CH-Smote Algorithm: A Novel Approach to Improve Random Forest Classification on Class Imbalanced Datasets

Authors

  • Jiao Wang, School of Mathematical Sciences, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia
  • Norhashidah Awang, School of Mathematical Sciences, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia

Keywords:

Random Forest, Imbalanced data, SMOTE algorithm, Classification

Abstract

The Random Forest algorithm is widely recognized across various fields for its high prediction accuracy, robustness to noise, flexibility in parameter tuning, adaptability, and ability to mitigate over-fitting. However, its performance degrades significantly when applied to imbalanced datasets, where it often fails to achieve adequate classification accuracy. Although numerous techniques have been proposed in previous research to address this problem, many are computationally complex and tend to introduce additional noise. Sample generation techniques, in contrast, are more widely employed than direct modifications to the classification algorithm. Therefore, this study proposes a novel hybrid sampling technique, termed the CH-SMOTE algorithm, which integrates the center of gravity principle with the SMOTE algorithm and combines both over-sampling and under-sampling methods. The algorithm is designed to be both computationally straightforward and highly effective. The CH-SMOTE algorithm addresses key limitations of the original SMOTE algorithm, such as blind synthesis and marginalization, while simultaneously mitigating over-fitting and effectively handling class imbalance. To demonstrate its effectiveness, the CH-SMOTE algorithm was evaluated on seventeen datasets exhibiting varying degrees of class imbalance. The results indicate that the CH-SMOTE algorithm significantly enhances the classification performance of the Random Forest on imbalanced datasets.
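
For orientation, the two ingredients named in the abstract can be sketched in a few lines of Python. This is not the CH-SMOTE algorithm itself, since the abstract does not specify how the center of gravity and SMOTE are combined. It shows only the standard SMOTE interpolation step (a synthetic point placed on the segment between a minority sample and one of its nearest minority neighbours) and a plain centroid computation. The function names, the parameter `k`, and the toy data are illustrative assumptions.

```python
import math
import random


def centroid(points):
    """Center of gravity (coordinate-wise mean) of a list of points."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def smote_oversample(minority, n_new, k=5, seed=0):
    """Standard SMOTE interpolation (illustrative, not CH-SMOTE).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Toy minority class (hypothetical data, for illustration only)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_oversample(minority, n_new=6, k=2)
center = centroid(minority)
```

Because each synthetic point is a convex combination of two existing minority samples, plain SMOTE can generate points near the class boundary regardless of where the bulk of the minority class lies. This is presumably the kind of "blind synthesis" and "marginalization" that the abstract says CH-SMOTE is designed to address.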

Published

2024-09-20

How to Cite

Jiao Wang, & Norhashidah Awang. (2024). CH-Smote Algorithm: A Novel Approach to Improve Random Forest Classification on Class Imbalanced Datasets. Journal of Computational Analysis and Applications (JoCAAA), 33(07), 1331–1349. Retrieved from http://eudoxuspress.com/index.php/pub/article/view/1234

Section

Articles
