CH-SMOTE Algorithm: A Novel Approach to Improve Random Forest Classification on Class-Imbalanced Datasets
Keywords:
Random Forest, Imbalanced data, SMOTE algorithm, Classification
Abstract
The Random Forest algorithm is widely recognized across various fields for its high prediction accuracy, robustness to noise, flexibility in parameter tuning, adaptability, and ability to mitigate over-fitting. However, its performance degrades significantly when applied to imbalanced datasets, where it often fails to achieve adequate classification accuracy. Although numerous techniques have been proposed in previous research to address this problem, many are computationally complex and tend to introduce additional noise. Sample generation techniques, by contrast, are more widely employed than direct modifications to the classification algorithm. Therefore, this study proposes a novel hybrid sampling technique, termed the CH-SMOTE algorithm, which integrates the center of gravity principle with the SMOTE algorithm and combines both over-sampling and under-sampling methods. The algorithm is designed to be both computationally straightforward and highly effective. It addresses key limitations of the original SMOTE algorithm, such as blind synthesis and marginalization, while simultaneously mitigating over-fitting and effectively handling class imbalance. To demonstrate its effectiveness, the CH-SMOTE algorithm was evaluated on seventeen datasets exhibiting varying degrees of class imbalance. The results indicate that the CH-SMOTE algorithm significantly enhances the classification performance of the Random Forest on imbalanced datasets.
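For orientation, the two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the CH-SMOTE procedure itself, whose details are not given here. The first function shows the standard SMOTE interpolation step between a minority sample and one of its nearest minority neighbors. The second shows one possible reading of a center-of-gravity variant, in which synthetic points are drawn on the segment between a minority sample and the minority-class centroid; this reading is an assumption made for illustration. The function names `smote_sample` and `centroid_sample` and all parameters are hypothetical.

```python
import numpy as np

def smote_sample(X_min, k=5, n_new=10, rng=None):
    """Standard SMOTE step: interpolate between a minority point and one of
    its k nearest minority-class neighbors (brute-force distances)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other points
    # Pairwise Euclidean distances; exclude each point from its own neighbors.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)            # seed samples
    pick = nn[base, rng.integers(0, k, size=n_new)]  # one neighbor per seed
    gap = rng.random((n_new, 1))                     # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])

def centroid_sample(X_min, n_new=10, rng=None):
    """Assumed center-of-gravity variant (illustrative only): interpolate
    between a minority point and the minority-class centroid."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    center = X_min.mean(axis=0)                      # center of gravity
    base = rng.integers(0, len(X_min), size=n_new)
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (center - X_min[base])

if __name__ == "__main__":
    X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(smote_sample(X_min, k=2, n_new=3, rng=0))
    print(centroid_sample(X_min, n_new=3, rng=0))
```

In this sketch, interpolating toward the centroid keeps synthetic points inside the minority region, which is one way the marginalization issue mentioned above could be avoided. The under-sampling component of the hybrid technique is not shown.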