A Novel Approach for Handling Imbalanced Data in Medical Diagnosis using Undersampling Technique

Varsha Babar, Roshani Ade. Published in Biomedical.

Communications on Applied Electronics
Year of Publication: 2016
Publisher: Foundation of Computer Science (FCS), NY, USA
Authors: Varsha Babar, Roshani Ade
10.5120/cae2016652323

Varsha Babar and Roshani Ade. A Novel Approach for Handling Imbalanced Data in Medical Diagnosis using Undersampling Technique. Communications on Applied Electronics 5(7):36-42, July 2016.

@article{10.5120/cae2016652323,
	author = {Varsha Babar and Roshani Ade},
	title = {A Novel Approach for Handling Imbalanced Data in Medical Diagnosis using Undersampling Technique},
	journal = {Communications on Applied Electronics},
	issue_date = {July 2016},
	volume = {5},
	number = {7},
	month = {Jul},
	year = {2016},
	issn = {2394-4714},
	pages = {36-42},
	numpages = {7},
	url = {http://www.caeaccess.org/archives/volume5/number7/635-2016652323},
	doi = {10.5120/cae2016652323},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

The imbalanced learning problem is becoming ubiquitous in many data mining applications. A data set is called imbalanced when its samples are unequally distributed among the classes. When a highly imbalanced data set is given to a classifier, the classifier may misclassify the rare samples of the minority class. Several undersampling and oversampling methods have been proposed to deal with this kind of imbalance. However, many undersampling techniques do not consider the distribution of information among the classes, and some oversampling techniques lead to overfitting or overgeneralization. This paper proposes a multilayer perceptron (MLP)-based undersampling technique (MLPUS) that preserves the distribution of information while undersampling. The technique uses a stochastic measure evaluation to identify important samples from both the majority and the minority classes. Experiments on five real-world data sets evaluate the performance of the proposed technique.
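The abstract does not spell out the stochastic measure. The Python below is a minimal sketch, assuming it is a stochastic sensitivity measure: the mean squared change in a trained model's output when an input is perturbed uniformly within a neighbourhood of half-width `q`. It is not the authors' MLPUS. The names `stochastic_sensitivity`, `sensitivity_undersample`, `q`, and `n_perturb` are hypothetical. The model is any callable returning a minority-class probability; a fixed logistic function stands in for a trained MLP. As a simplification, every minority sample is kept and only the most sensitive majority samples are retained until the classes are balanced.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical model interface: returns P(minority class | x).
Model = Callable[[Vector], float]


def stochastic_sensitivity(model: Model, x: Vector, q: float = 0.1,
                           n_perturb: int = 50,
                           rng: Optional[random.Random] = None) -> float:
    """Monte Carlo estimate of E[(model(x + dx) - model(x))^2],
    with each component of dx drawn uniformly from [-q, q]."""
    rng = rng if rng is not None else random.Random(0)
    base = model(x)
    total = 0.0
    for _ in range(n_perturb):
        perturbed = [xi + rng.uniform(-q, q) for xi in x]
        total += (model(perturbed) - base) ** 2
    return total / n_perturb


def sensitivity_undersample(X: List[List[float]], y: List[int], model: Model,
                            majority_label: int, q: float = 0.1,
                            n_perturb: int = 50, seed: int = 0
                            ) -> Tuple[List[List[float]], List[int]]:
    """Keep every minority sample plus the len(minority) most sensitive
    majority samples, returned in their original order."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label != majority_label]
    majority = [i for i, label in enumerate(y) if label == majority_label]
    # Score each majority sample once; a higher score usually means the
    # sample lies closer to the model's decision boundary.
    scores = {i: stochastic_sensitivity(model, X[i], q, n_perturb, rng)
              for i in majority}
    ranked = sorted(majority, key=lambda i: scores[i], reverse=True)
    kept = sorted(minority + ranked[:len(minority)])
    return [X[i] for i in kept], [y[i] for i in kept]


def boundary_model(x: Vector) -> float:
    """Stand-in for a trained MLP: a steep logistic boundary at x[0] = 0."""
    return 1.0 / (1.0 + math.exp(-10.0 * x[0]))


if __name__ == "__main__":
    X = [[-5.0], [-0.1], [1.0], [-4.0], [-0.05], [2.0], [-3.0]]
    y = [0, 0, 1, 0, 0, 1, 0]
    X_bal, y_bal = sensitivity_undersample(X, y, boundary_model, majority_label=0)
    print(X_bal, y_bal)
```

In this toy run, the majority samples at `x[0] = -0.1` and `-0.05` sit next to the boundary, so small perturbations move the output noticeably and they are kept. The samples at `-3.0`, `-4.0`, and `-5.0` barely change the output and are discarded. Ranking by sensitivity rather than sampling at random is one way to keep the informative majority samples an undersampler would otherwise risk losing.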

Keywords

Imbalanced Learning, Undersampling, Oversampling, Clustering.