Research Article

Data Deduplication: Its Significant Effect on Network Intrusion Dataset

by Aladesote O. Isaiah, Adetunji A. Ademola
Communications on Applied Electronics
Foundation of Computer Science (FCS), NY, USA
Volume 7 - Number 32
Year of Publication: 2019
Authors: Aladesote O. Isaiah, Adetunji A. Ademola

Aladesote O. Isaiah, Adetunji A. Ademola. Data Deduplication: Its Significant Effect on Network Intrusion Dataset. Communications on Applied Electronics. 7, 32 (Dec 2019), 21-26. DOI=10.5120/cae2019652845

@article{ 10.5120/cae2019652845,
author = { Aladesote O. Isaiah and Adetunji A. Ademola },
title = { Data Deduplication: Its Significant Effect on Network Intrusion Dataset },
journal = { Communications on Applied Electronics },
issue_date = { Dec 2019 },
volume = { 7 },
number = { 32 },
month = { Dec },
year = { 2019 },
issn = { 2394-4714 },
pages = { 21-26 },
numpages = {9},
url = { },
doi = { 10.5120/cae2019652845 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-09-04T20:02:40.587907+05:30
%A Aladesote O. Isaiah
%A Adetunji A. Ademola
%T Data Deduplication: Its Significant Effect on Network Intrusion Dataset
%J Communications on Applied Electronics
%@ 2394-4714
%V 7
%N 32
%P 21-26
%D 2019
%I Foundation of Computer Science (FCS), NY, USA

This research applied feature extraction techniques to the NSL-KDD data set. Using deduplication software written in C++, duplicated records of four attack types (DoS, R2L, Probing and U2R) were removed. Among the DoS attacks, Mailbomb had the highest reduction rate (98.63%) and Apache2 the lowest (40.30%). For R2L, Snmpgetattack had the highest reduction rate (92.70%), while Ftp_write showed no reduction. Under Probing, Nmap had the highest reduction rate (93.15%) and Mscan the lowest (60.84%). For U2R, the highest reduction rate was 50%, for Sqlattack. The Wilcoxon sign test was used to test the significance of the deduplication, and the results revealed that all attack types except U2R had a significant reduction rate at the 5% level.
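The abstract does not show the deduplication software itself, so the following is only a minimal C++ sketch of the general idea, not the authors' implementation. It assumes that a duplicate is an exact, character-for-character repeat of a whole record line, and that the reduction rate is the percentage of records removed relative to the original count. The names `DedupResult`, `deduplicate` and `reduction_rate_percent` are hypothetical and introduced here for illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical container for the outcome of deduplicating one group of
// records (for example, all records of a single attack type).
struct DedupResult {
    std::vector<std::string> unique_records;  // first occurrence of each record, in input order
    std::size_t original_count = 0;           // number of records before deduplication
    std::size_t removed_count = 0;            // number of duplicate records dropped
};

// Assumption: two records are duplicates only if their lines are identical.
// Keeps the first occurrence of every record and drops later repeats.
DedupResult deduplicate(const std::vector<std::string>& records) {
    DedupResult result;
    result.original_count = records.size();

    std::unordered_set<std::string> seen;
    for (const std::string& record : records) {
        // insert() reports whether the record was new to the set.
        if (seen.insert(record).second) {
            result.unique_records.push_back(record);
        }
    }

    result.removed_count = result.original_count - result.unique_records.size();
    return result;
}

// Assumed definition of reduction rate: removed / original * 100.
// Returns 0 for an empty input to avoid division by zero.
double reduction_rate_percent(const DedupResult& r) {
    if (r.original_count == 0) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(r.removed_count) /
           static_cast<double>(r.original_count);
}
```

Under these assumptions, running `deduplicate` once per attack name and passing the result to `reduction_rate_percent` would give one percentage per attack, which is the kind of figure the abstract reports.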

Index Terms

Computer Science
Information Sciences


Deduplication, extraction techniques, attack types, Wilcoxon sign test, NSL-KDD