Data Deduplication: Its Significant Effect on Network Intrusion Dataset
Aladesote O Isaiah and Adetunji A Ademola. Data Deduplication: Its Significant Effect on Network Intrusion Dataset. Communications on Applied Electronics 7(32):21-26, December 2019. BibTeX
@article{10.5120/cae2019652845,
  author = {Aladesote O. Isaiah and Adetunji A. Ademola},
  title = {Data Deduplication: Its Significant Effect on Network Intrusion Dataset},
  journal = {Communications on Applied Electronics},
  issue_date = {December 2019},
  volume = {7},
  number = {32},
  month = {Dec},
  year = {2019},
  issn = {2394-4714},
  pages = {21-26},
  numpages = {6},
  url = {http://www.caeaccess.org/archives/volume7/number32/864-2019652845},
  doi = {10.5120/cae2019652845},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}
Abstract
This research applied feature extraction techniques to the NSL-KDD data set. Using deduplication software written in C++, duplicated records of four attack types (DoS, R2L, Probing and U2R) were removed. Among the DoS attacks, Mailbomb had the highest reduction rate (98.63%) and Apache2 the lowest (40.30%). For R2L, Snmpgetattack had the highest reduction rate (92.70%), while Ftp_write showed no reduction. Among the Probing attacks, Nmap had the highest reduction rate (93.15%) and Mscan the lowest (60.84%). For U2R, Sqlattack had the highest reduction rate (50%). The Wilcoxon sign test was used to test the significance of the deduplication, and the results showed that all attack types except U2R had a significant reduction at the 5% level.
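The abstract does not show the deduplication software itself, so the following is only a minimal C++ sketch of the idea it reports: remove repeated records and compute the percentage reduction. It assumes a duplicate is an exact match of the whole record line; the function names `deduplicate` and `reductionRate` are hypothetical and are not taken from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Keep the first occurrence of each record, preserving input order.
// Assumption: a "record" is one raw line of the data set, and two
// records are duplicates only if the lines match exactly.
std::vector<std::string> deduplicate(const std::vector<std::string>& records) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> unique;
    for (const auto& record : records) {
        // insert() returns {iterator, bool}; the bool is true for a new key.
        if (seen.insert(record).second) {
            unique.push_back(record);
        }
    }
    return unique;
}

// Percentage reduction: 100 * (original - unique) / original.
// Returns 0 for an empty input so that no division by zero occurs.
double reductionRate(std::size_t originalCount, std::size_t uniqueCount) {
    assert(uniqueCount <= originalCount);
    if (originalCount == 0) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(originalCount - uniqueCount)
                 / static_cast<double>(originalCount);
}
```

Under these assumptions, an attack type with 4 records of which 2 are distinct would give `reductionRate(4, 2)`, a 50% reduction. A type with no repeated records gives 0%, which matches the kind of result reported for Ftp_write.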
Keywords
Deduplication, extraction techniques, attack types, Wilcoxon sign test, NSL-KDD