COMPARISON OF THE EFFICIENCY OF VARIOUS MACHINE LEARNING ALGORITHMS IN MULTI-CLASS TEXT CLASSIFICATION

Authors
  • Muhamediyeva D. T.

    National Research University "Tashkent Institute of Irrigation and Agricultural Mechanization Engineers", Namangan State University

  • Mamatov A. A.

    National Research University "Tashkent Institute of Irrigation and Agricultural Mechanization Engineers", Namangan State University

Keywords:
Text classification, TF-IDF, machine learning, NLP, Logistic Regression, Support Vector Machine, Naive Bayes, Random Forest, model efficiency, algorithm comparison.
Abstract

This article addresses the automatic classification of text data. Using machine learning and natural language processing methods, documents were mapped into a numerical feature space with TF-IDF vectorization. The accuracy and the training and testing times of the Logistic Regression, Ridge Classifier, k-Nearest Neighbors, Random Forest, Linear Support Vector Machine, Stochastic Gradient Descent, Nearest Centroid, and Complement Naive Bayes algorithms were then analyzed. The experiments were carried out on four categories of the 20 Newsgroups text collection. The results showed significant differences in accuracy and computational speed between the algorithms and demonstrated the importance of choosing an optimal model for real-time text analysis systems.
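The comparison the abstract describes can be sketched with scikit-learn. This is a minimal illustration, not the authors' exact setup: the paper used four categories of 20 Newsgroups, while here a tiny inline toy corpus stands in so the example runs offline, and all hyperparameters are assumptions.

```python
# Sketch of the abstract's pipeline: TF-IDF features fed to the eight
# classifiers named in the paper, recording accuracy and train/test time.
import time
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, RidgeClassifier, SGDClassifier
from sklearn.naive_bayes import ComplementNB
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier

# Toy three-class corpus (stand-in for the 20 Newsgroups categories).
train_texts = ["the gpu renders graphics", "opengl draws polygons fast",
               "the team scored a goal", "hockey players skate on ice",
               "rockets launch into orbit", "the satellite orbits earth"]
train_labels = [0, 0, 1, 1, 2, 2]
test_texts = ["shaders run on the gpu", "the goal was scored late",
              "orbit decay pulled the satellite down"]
test_labels = [0, 1, 2]

# Map documents into a numerical feature space with TF-IDF.
vec = TfidfVectorizer()
X_train = vec.fit_transform(train_texts)
X_test = vec.transform(test_texts)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Ridge Classifier": RidgeClassifier(),
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=1),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Linear SVM": LinearSVC(),
    "SGD": SGDClassifier(random_state=0),
    "Nearest Centroid": NearestCentroid(),
    "Complement NB": ComplementNB(),
}

# Fit each model, timing training and evaluation separately.
results = {}
for name, model in models.items():
    t0 = time.perf_counter()
    model.fit(X_train, train_labels)
    train_time = time.perf_counter() - t0
    t0 = time.perf_counter()
    acc = model.score(X_test, test_labels)
    test_time = time.perf_counter() - t0
    results[name] = (acc, train_time, test_time)
    print(f"{name:20s} acc={acc:.2f} "
          f"train={train_time * 1e3:.1f}ms test={test_time * 1e3:.1f}ms")
```

For a closer reproduction one would replace the toy corpus with `sklearn.datasets.fetch_20newsgroups` restricted to four categories; the structure of the comparison loop stays the same.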

References

1. Liu W., Quan X., Feng M., Qiu B. A short text modeling method combining semantic and statistical information // Information Sciences. – 2010. – Vol. 180, No. 20. – P. 4031–4041.

2. Kalchbrenner N., Grefenstette E., Blunsom P. A convolutional neural network for modelling sentences // Proceedings of ACL. – 2014. – P. 655–665.

3. Conneau A., Schwenk H., Barrault L., LeCun Y. Very deep convolutional networks for text classification // Proceedings of EACL. – 2017. – P. 1107–1116.

4. Lee J.Y., Dernoncourt F. Sequential short-text classification with recurrent and convolutional neural networks // Proceedings of NAACL-HLT. – 2016. – P. 515–520.

5. Zhang D., Tian L., Hong M., Han F., Ren Y., Chen Y. Combining convolution neural network and bidirectional gated recurrent unit for sentence semantic classification // IEEE Access. – 2018. – Vol. 6. – P. 73750–73759.

6. Qiu X.P., Sun T.X., Xu Y.G., Shao Y.F., Dai N., Huang X.J. Pre-trained models for natural language processing: a survey // Science China Technological Sciences. – 2020. – Vol. 63, No. 10. – P. 1872–1897.

7. Mikolov T., Chen K., Corrado G., Dean J. Efficient estimation of word representations in vector space // Proceedings of ICLR. – 2013. – P. 1–12.

8. Mikolov T., Sutskever I., Chen K., Corrado G., Dean J. Distributed representations of words and phrases and their compositionality // Advances in Neural Information Processing Systems (NIPS). – 2013. – P. 3111–3119.

9. Alsmadi I., Gan K.H. Review of short-text classification // International Journal of Web Information Systems. – 2019. – Vol. 15, No. 2. – P. 155–182.

16. Song G., Ye Y., Du X., Huang X., Bie S. Short text classification: a survey // Journal of Multimedia. – 2014. – Vol. 9, No. 5. – P. 635–643.

10. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I. Attention is all you need // Advances in Neural Information Processing Systems (NIPS). – 2017. – P. 5998–6008.

11. Devlin J., Chang M.W., Lee K., Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding // Proceedings of NAACL-HLT. – 2019. – P. 4171–4186.

12. Peters M., Neumann M., Iyyer M., Gardner M., Clark C., Lee K., Zettlemoyer L. Deep contextualized word representations // Proceedings of NAACL-HLT. – 2018. – P. 2227–2237.

13. Kim Y. Convolutional neural networks for sentence classification // Proceedings of EMNLP. – 2014. – P. 1746–1751.

14. Hochreiter S., Schmidhuber J. Long short-term memory // Neural Computation. – 1997. – Vol. 9, No. 8. – P. 1735–1780.

Published
2026-03-24
Section
Articles
License
This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

COMPARISON OF THE EFFICIENCY OF VARIOUS MACHINE LEARNING ALGORITHMS IN MULTI-CLASS TEXT CLASSIFICATION. (2026). Eureka Journal of Education & Learning Technologies, 2(3), 123-133. https://eurekaoa.com/index.php/2/article/view/643
