Research Article

Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays

by Henry Sanmi Makinde, Akindeji Ibrahim Makinde, Mutiyat Adeola Usman, Hope Adegoke
Communications on Applied Electronics
Foundation of Computer Science (FCS), NY, USA
Volume 8 - Number 1
Year of Publication: 2026
Authors: Henry Sanmi Makinde, Akindeji Ibrahim Makinde, Mutiyat Adeola Usman, Hope Adegoke
10.5120/cae2026652920

Henry Sanmi Makinde, Akindeji Ibrahim Makinde, Mutiyat Adeola Usman, Hope Adegoke. Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays. Communications on Applied Electronics. 8, 1 (Jan 2026), 73-85. DOI=10.5120/cae2026652920

@article{ 10.5120/cae2026652920,
author = { Henry Sanmi Makinde and Akindeji Ibrahim Makinde and Mutiyat Adeola Usman and Hope Adegoke },
title = { Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays },
journal = { Communications on Applied Electronics },
issue_date = { Jan 2026 },
volume = { 8 },
number = { 1 },
month = { Jan },
year = { 2026 },
issn = { 2394-4714 },
pages = { 73-85 },
numpages = {9},
url = { https://www.caeaccess.org/archives/volume8/number1/analyzing-word-frequency-and-predictive-patterns-in-ai-generated-essays/ },
doi = { 10.5120/cae2026652920 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2026-02-01T01:49:15.733366+05:30
%A Henry Sanmi Makinde
%A Akindeji Ibrahim Makinde
%A Mutiyat Adeola Usman
%A Hope Adegoke
%T Analyzing Word Frequency and Predictive Patterns in AI-Generated Essays
%J Communications on Applied Electronics
%@ 2394-4714
%V 8
%N 1
%P 73-85
%D 2026
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Artificial Intelligence (AI) has transformed many aspects of human life and activity, including the composition of essays and other texts. AI technologies now enable computers to generate text that closely resembles human writing, which raises concerns for academic integrity, creative authenticity, and professional communication. This study investigates the linguistic characteristics and predictive mechanisms underlying AI-generated essays in order to identify markers that distinguish them from human-authored texts. A total of 1,000 essays covering diverse topics and writing styles were generated using ChatGPT, DeepSeek, and Gemini, and a comparable corpus of human-written essays was collected from publicly available sources. Natural language processing (NLP) techniques and machine learning models were used to analyze word frequency, next-word prediction patterns, and stylistic elements in both corpora. The results show that the temperature setting of an AI model significantly influences word selection, with higher temperatures increasing randomness and reducing the likelihood of predictable word choices. Support Vector Machine (SVM) and Random Forest classifiers achieved accuracies of 98% and 95.75%, respectively, in differentiating AI-generated from human-written essays, highlighting the effectiveness of linguistic features for automated detection. The study concludes that AI-generated content can be reliably distinguished from human writing using stylistic and lexical features, contributing to the development of more reliable AI assessment tools and a better understanding of NLP model behavior.
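
The abstract's point about temperature can be illustrated with a minimal sketch. It assumes the standard temperature-scaled softmax commonly used when sampling from language models; this page does not show the authors' own code, and the function names and candidate scores below are hypothetical, chosen only for illustration. The sketch compares how concentrated the next-word distribution is at a low and a high temperature.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw next-word scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    """Shannon entropy in bits; higher means a less predictable choice."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical scores for four candidate next words (illustrative only,
# not taken from the study).
candidate_logits = [4.0, 2.0, 1.0, 0.5]

low_t = softmax_with_temperature(candidate_logits, 0.5)
high_t = softmax_with_temperature(candidate_logits, 1.5)

print("top-word probability at T=0.5:", round(low_t[0], 3))
print("top-word probability at T=1.5:", round(high_t[0], 3))
print("entropy at T=0.5 (bits):", round(entropy_bits(low_t), 3))
print("entropy at T=1.5 (bits):", round(entropy_bits(high_t), 3))
```

Under these assumptions, the higher temperature spreads probability mass away from the most likely word and raises the entropy of the distribution, which is consistent with the reduced predictability the abstract reports.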

Index Terms

Computer Science
Information Sciences

Keywords

Predictive Patterns, AI-generated essays, DeepSeek, ChatGPT, Gemini, Machine learning, Analyzing word frequency