Research Article

A Novel Approach for Handling Imbalanced Data in Medical Diagnosis using Undersampling Technique

by Varsha Babar, Roshani Ade
Communications on Applied Electronics
Foundation of Computer Science (FCS), NY, USA
Volume 5 - Number 7
Year of Publication: 2016
Authors: Varsha Babar, Roshani Ade
10.5120/cae2016652323

Varsha Babar, Roshani Ade. A Novel Approach for Handling Imbalanced Data in Medical Diagnosis using Undersampling Technique. Communications on Applied Electronics. 5, 7 (Jul 2016), 36-42. DOI=10.5120/cae2016652323

@article{ 10.5120/cae2016652323,
author = { Varsha Babar, Roshani Ade },
title = { A Novel Approach for Handling Imbalanced Data in Medical Diagnosis using Undersampling Technique },
journal = { Communications on Applied Electronics },
issue_date = { Jul 2016 },
volume = { 5 },
number = { 7 },
month = { Jul },
year = { 2016 },
issn = { 2394-4714 },
pages = { 36-42 },
numpages = {9},
url = { https://www.caeaccess.org/archives/volume5/number7/635-2016652323/ },
doi = { 10.5120/cae2016652323 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

The imbalanced learning problem is becoming ubiquitous in many data mining applications. Data sets with an unequal distribution of samples among classes are known as imbalanced data sets. When such a highly imbalanced data set is given to a classifier, the classifier may misclassify the rare samples from the minority class. To deal with this type of imbalance, several undersampling and oversampling methods have been proposed. Many undersampling techniques do not consider the distribution of information among the classes, while some oversampling techniques lead to overfitting or may cause an overgeneralization problem. This paper proposes an MLP-based undersampling technique (MLPUS) that preserves the distribution of information while undersampling. The technique uses stochastic measure evaluation to identify important samples from both the majority and minority classes. Experiments are performed on five real-world data sets to evaluate the performance of the proposed work.
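The abstract's core idea, undersampling the majority class while keeping its informative samples rather than discarding at random, can be illustrated with a minimal sketch. This is not the authors' MLPUS algorithm (the paper's stochastic measure evaluation is not reproduced here); it is an assumed, simplified variant that uses an MLP's prediction confidence to retain the majority samples closest to the class boundary:

```python
# Hedged sketch: MLP-confidence-guided undersampling (NOT the paper's MLPUS).
# Majority samples the MLP is least certain about lie near the class boundary,
# so keeping them preserves more discriminative information than random removal.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic imbalanced data set: roughly 90% class 0, 10% class 1.
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
maj, mino = X[y == 0], X[y == 1]

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                    random_state=0).fit(X, y)

# Confidence that each majority sample belongs to the majority class;
# low values mean the sample sits near the decision boundary.
conf = clf.predict_proba(maj)[:, 0]
keep = np.argsort(conf)[: len(mino)]  # retain as many as minority samples

X_bal = np.vstack([maj[keep], mino])
y_bal = np.hstack([np.zeros(len(keep)), np.ones(len(mino))])
print(np.bincount(y_bal.astype(int)))  # both classes now equally represented
```

By construction the result is exactly balanced; a production version would also need the paper's safeguards for preserving the overall information distribution, which this sketch omits.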

References
  1. H. He and E. A. Garcia, "Learning from Imbalanced Data," IEEE Trans. Knowledge and Data Eng., vol. 21, no. 9, pp. 1263-1284, Sept. 2009.
  2. X.-Y. Liu, J. Wu, and Z.-H. Zhou, "Exploratory Undersampling for Class-Imbalance Learning," Proc. Int'l Conf. Data Mining, pp. 965-969, 2006.
  3. J. Zhang and I. Mani, "kNN Approach to Unbalanced Data Distributions: A Case Study Involving Information Extraction," Proc. Int'l Conf. Machine Learning, Workshop on Learning from Imbalanced Data Sets, 2003.
  4. M. Kubat and S. Matwin, "Addressing the Curse of Imbalanced Training Sets: One-Sided Selection," Proc. Int'l Conf. Machine Learning, pp. 179-186, 1997.
  5. V. H. Barella, E. P. Costa, and A. C. P. L. F. de Carvalho, "ClusterOSS: A New Undersampling Method for Imbalanced Learning."
  6. W. W. Y. Ng, J. Hu, D. S. Yeung, S. Yin, and F. Roli, "Diversified Sensitivity-Based Undersampling for Imbalance Classification Problems," IEEE Trans. Cybernetics, vol. 45, no. 11, Nov. 2015.
  7. H. He, Self-Adaptive Systems for Machine Intelligence, Wiley, Aug. 2011.
  8. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique," J. Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
  9. H. Han, W.-Y. Wang, and B.-H. Mao, "Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning," Proc. Int'l Conf. Intelligent Computing, pp. 878-887, 2005.
  10. C. Bunkhumpornpat, K. Sinapiromsaran, and C. Lursinsap, "Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling Technique for Handling the Class Imbalanced Problem," in Advances in Knowledge Discovery and Data Mining. Berlin, Germany: Springer, pp. 475-482, 2009.
  11. T. Maciejewski and J. Stefanowski, "Local Neighbourhood Extension of SMOTE for Mining Imbalanced Data," in Proc. IEEE Symp. Comput. Intell. Data Min. (CIDM), Paris, France, pp. 104-111, 2011.
  12. E. Ramentol, Y. Caballero, R. Bello, and F. Herrera, "SMOTE-RSB*: A Hybrid Preprocessing Approach Based on Oversampling and Undersampling for High Imbalanced Data-Sets Using SMOTE and Rough Sets Theory," Knowl. Inf. Syst., vol. 33, no. 2, pp. 245-265, 2012.
  13. R. C. Bhagat and S. S. Patil, "Enhanced SMOTE Algorithm for Classification of Imbalanced Big-Data Using Random Forest," IEEE Int'l Advance Computing Conference (IACC), 2015.
  14. H. He, Y. Bai, E. A. Garcia, and S. Li, "ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning," Proc. Int'l Joint Conf. Neural Networks, pp. 1322-1328, 2008.
  15. S. Chen, H. He, and E. A. Garcia, "RAMOBoost: Ranked Minority Oversampling in Boosting," IEEE Trans. Neural Networks, vol. 21, no. 10, pp. 1624-1642, Oct. 2010.
  16. Ade, Roshani, and P. R. Deshmukh. "Instance-based vs Batch-based Incremental Learning Approach for Students Classification." International Journal of Computer Applications 106.3 (2014).
  17. Ade, Roshani, and Prashant Deshmukh. "Efficient knowledge transformation for incremental learning and detection of new concept class in student’s classification system." Information Systems Design and Intelligent Applications. Springer India, 2015. 757-766.
  18. Kulkarni, Pallavi Digambarrao and Roshani Ade. "Learning from Unbalanced Stream Data in Non-Stationary Environments Using Logistic Regression Model: A Novel Approach Using Machine Learning for Assessment of Credit Card Frauds." Handbook of Research on Natural Computing for Optimization Problems. IGI Global, 2016. 561-582. Web. 9 Jun. 2016. doi:10.4018/978-1-5225-0058-2.ch023
  19. D. S. Yeung, W. W. Y. Ng, D. Wang, E. C. Tsang, and X.-Z. Wang, “Localized generalization error model and its application to architecture selection for radial basis function neural network,” IEEE Trans. Neural Netw., vol. 18, no. 5, pp. 1294–1305, Sep. 2007.
  20. B. Sun, W. W. Y. Ng, D. S. Yeung, and P. P. K. Chan, “Hyper-parameter selection for sparse LS-SVM via minimization of its localized generalization error,” Int. J. Wavelets Multiresolut. Inf. Process., vol. 11, no. 3, 2013, Art. ID 1350.
Index Terms

Computer Science
Information Sciences

Keywords

Imbalanced Learning, Undersampling, Oversampling, Clustering