Imbalanced data handling techniques for classification: a state-of-the-art review

Asma Basharat; Amna Ali; Huma Mughal; Mohd Murtadha Bin Mohamad

doi:10.55092/pcs2023020010

Article

Open Access

Asma Basharat¹,²,*, Amna Ali³, Huma Mughal³, Mohd Murtadha Bin Mohamad¹

¹ Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia

² Forman Christian College (A Chartered university), Lahore, Pakistan

³ Kinnaird College for Women, Lahore, Pakistan

* asmabasharat@fccollege.edu.pk

Volume
Volume 1, 2023
Citation
Basharat A, Ali A, Mughal H, Mohamad MMB. Imbalanced data handling techniques for classification: a state-of-the-art review. Proc. Comput. Sci. 2023(1):0010, https://doi.org/10.55092/pcs2023020010.
DOI
10.55092/pcs2023020010
Copyright
Copyright 2023 by the authors. Published by ELSP.

Abstract

Imbalanced data is one of the major problems faced by machine learning and deep learning classifiers. Skewness in the data distribution limits classifier performance: the model tends to overfit and to misclassify minority classes. Researchers have focused on new techniques that balance data by oversampling minority classes, undersampling majority classes, or combining the two in a hybrid approach. Over the years, researchers have also explored algorithmic techniques that adjust weights, create bags of classes, and optimally enhance the data. This paper provides a state-of-the-art review of the latest contributions toward resolving the imbalanced data problem. Its major focus is on hybrid techniques, ensemble methods, and GAN-based data augmentation techniques.
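
As a rough illustration of the data-level and weight-based ideas named in the abstract, the following is a minimal sketch, not a method taken from the paper. It assumes NumPy arrays `X` (features) and `y` (integer class labels); the function names `random_oversample`, `random_undersample`, and `inverse_frequency_weights` are hypothetical helpers introduced only for this example. The weighting formula is one common inverse-frequency heuristic, chosen here as an assumption.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority samples until every class
    matches the size of the largest class (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        # Draw the missing samples with replacement from the same class.
        extra = rng.choice(idx, size=target - n, replace=True)
        parts.append(np.concatenate([idx, extra]))
    keep = np.concatenate(parts)
    return X[keep], y[keep]


def random_undersample(X, y, seed=0):
    """Randomly drop samples until every class matches the size of the
    smallest class (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    parts = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        # Keep a random subset, sampled without replacement.
        parts.append(rng.choice(idx, size=target, replace=False))
    keep = np.concatenate(parts)
    return X[keep], y[keep]


def inverse_frequency_weights(y):
    """Per-class weights n_samples / (n_classes * class_count), so that
    rarer classes receive larger weights (an assumed heuristic)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}
```

Plain random resampling is only the simplest baseline: oversampling repeats existing points and undersampling discards information, which is part of the motivation for the hybrid, ensemble, and GAN-based approaches that the review focuses on.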

Keywords

Imbalanced data; ensemble methods; data augmentation; generative adversarial networks
