A New Machine Learning based Approach for Text Spam Filtering Technique

Dipmalya Sen, Chandan Das, Sarit Chakraborty. Published in Information Systems.

Communications on Applied Electronics
Year of Publication: 2017
Publisher: Foundation of Computer Science (FCS), NY, USA
Authors: Dipmalya Sen, Chandan Das, Sarit Chakraborty
DOI: 10.5120/cae2017652572

Dipmalya Sen, Chandan Das and Sarit Chakraborty. A New Machine Learning based Approach for Text Spam Filtering Technique. Communications on Applied Electronics 6(10):28-34, April 2017. BibTeX

@article{10.5120/cae2017652572,
	author = {Dipmalya Sen and Chandan Das and Sarit Chakraborty},
	title = {A New Machine Learning based Approach for Text Spam Filtering Technique},
	journal = {Communications on Applied Electronics},
	issue_date = {April 2017},
	volume = {6},
	number = {10},
	month = {Apr},
	year = {2017},
	issn = {2394-4714},
	pages = {28-34},
	numpages = {7},
	url = {http://www.caeaccess.org/archives/volume6/number10/727-2017652572},
	doi = {10.5120/cae2017652572},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Electronic mail (e-mail) has become an essential part of our daily activities in the recent past. The volume of e-mail traffic has grown manyfold over the last couple of decades. Of all such e-mails, around 80% are unwanted mails, known as unsolicited bulk e-mail (UBE) or spam. With the drastic increase in the use of electronic mail, the problem of dealing with spam has escalated as well. Despite the availability of many commercial text-based spam filters, users still suffer from spam that accumulates unnecessarily in their inboxes.

In this work, we propose a spam detection algorithm based on a machine learning approach. We use the concept of a Cumulative Weighted Sum (CWS) to achieve higher accuracy in detecting spam mails. Three different techniques are also proposed for calculating the CWS value. Our method is able to detect most spam and provides accurate and dynamic filtering of such mails. Experimental results of our technique on different benchmark datasets are significant and show much improved performance over the available text spam filters.

Keywords

E-mail, Spam, Ham, Machine learning, Naïve Bayes, Cumulative Weighted Sum