Article
Open Access
Imbalanced data handling techniques for classification: a state-of-the-art review
1 Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia
2 Forman Christian College (A Chartered university), Lahore, Pakistan
3 Kinnaird College for Women, Lahore, Pakistan
  • Volume
  • Citation
    Basharat A, Ali A, Mughal H, Mohamad MMB. Imbalanced data handling techniques for classification: a state-of-the-art review. Proc. Comput. Sci. 2023(1):0010, https://doi.org/10.55092/pcs2023020010. 
  • DOI
    10.55092/pcs2023020010
  • Copyright
    Copyright 2023 by the authors. Published by ELSP.
Abstract

Imbalanced data is one of the major problems faced by machine learning and deep learning classifiers. Skewness in the data distribution limits classifier performance and leads to overfitting of the model and misclassification of minority classes. Researchers have focused on new techniques that balance the data by oversampling minority classes, undersampling majority classes, or creating a hybrid of oversampling and undersampling. Over the years, researchers have also explored algorithmic techniques that adjust weights, create bags of classes, and optimally enhance the data. This paper provides a state-of-the-art review of the latest contributions to resolving the imbalanced data problem. The major focus of this paper is on hybrid techniques, ensemble methods, and GAN-based data augmentation techniques.
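
To make the two basic resampling ideas mentioned above concrete, the following is a minimal sketch of random oversampling and random undersampling in plain Python. It is an illustration only and is not taken from the paper: the function names `random_oversample` and `random_undersample`, the `seed` parameter, and the toy data are all assumptions introduced here.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class
    until every class is as large as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        # Draw with replacement from this class to fill the gap.
        extra = [rng.choice(pool) for _ in range(target - n)]
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class
    is as small as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        # Draw without replacement; the rest of the class is discarded.
        keep = rng.sample(pool, target)
        X_out.extend(keep)
        y_out.extend([label] * target)
    return X_out, y_out


# Hypothetical toy dataset: 8 majority samples (class 0), 2 minority (class 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```

Oversampling keeps all original samples but repeats minority ones, which can encourage overfitting to the duplicates; undersampling avoids duplication at the cost of discarding majority data. A hybrid scheme would apply both, each to a lesser degree.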

Keywords

Imbalance data; ensemble methods; data augmentation; generative adversarial networks
