COMPARATIVE ANALYSIS OF THE EFFICIENCY OF CLASSIC MACHINE LEARNING MODELS USING DISTILBERT-BASED TEXT VECTORIZATION METHODS

Authors
  • Muhamediyeva D. T.

    National Research University "Tashkent Institute of Irrigation and Agricultural Mechanization Engineers", Namangan State University

  • Mamatov A. A.

    National Research University "Tashkent Institute of Irrigation and Agricultural Mechanization Engineers", Namangan State University

Keywords:
Text classification, DistilBERT, transformer model, embedding, machine learning, classification, ensemble methods, artificial intelligence.
Abstract

This study compares how effectively classical machine learning algorithms perform automatic text classification when they are trained on embeddings generated by DistilBERT, a deep-learning transformer model. Logistic Regression, Ridge Classifier, Linear SVC, SGD Classifier, and Random Forest models were tested on selected categories of the 20 Newsgroups dataset. The texts were first converted into contextual vectors with the transformer model, and these vectors were then passed to the classical classification algorithms. The models were compared by accuracy, F1 score, and training and testing time. The results show that transformer-based embeddings increase the effectiveness of classical machine learning models.
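The abstract describes a two-stage pipeline: each text is turned into a fixed-length vector by DistilBERT, and that vector is then fed to a classical classifier scored by accuracy, F1, and training/testing time. The following is a minimal sketch of those mechanics under stated assumptions, not the authors' code. It assumes mean pooling over non-padding tokens (the abstract does not say whether mean pooling or the first-token vector was used), uses hypothetical 3-dimensional token vectors in place of real 768-dimensional DistilBERT outputs, swaps in a simple nearest-centroid classifier for the five classifiers named above so that it runs on the Python standard library alone, and assumes macro-averaged F1. The labels `sci.space` and `rec.autos` are real 20 Newsgroups category names chosen only for illustration; the abstract does not list which categories were selected.

```python
import time
from collections import Counter


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1, skipping padding positions.
    Assumption: mean pooling; the paper may use a different pooling strategy."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def fit_centroids(X, y):
    """Nearest-centroid 'training' (a stand-in classifier): store each class's mean vector."""
    counts = Counter(y)
    sums = {}
    for vec, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, X):
    """Assign each vector to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda label: sq_dist(vec, centroids[label])) for vec in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)


def evaluate(X_train, y_train, X_test, y_test):
    """Train, predict, and report accuracy, macro-F1, and wall-clock train/test time."""
    start = time.perf_counter()
    model = fit_centroids(X_train, y_train)
    train_s = time.perf_counter() - start
    start = time.perf_counter()
    y_pred = predict(model, X_test)
    test_s = time.perf_counter() - start
    return {"accuracy": accuracy(y_test, y_pred), "macro_f1": macro_f1(y_test, y_pred),
            "train_s": train_s, "test_s": test_s}


# Hypothetical toy data: (token vectors, attention mask, label).
# Real DistilBERT token vectors are 768-dimensional; 3 dimensions keep this readable.
train_docs = [
    ([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.0, 0.0, 0.0]], [1, 1, 0], "sci.space"),
    ([[0.9, 0.1, 0.0], [1.0, 0.0, 0.2]], [1, 1], "sci.space"),
    ([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]], [1, 1], "rec.autos"),
    ([[0.0, 0.8, 0.2], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]], [1, 1, 0], "rec.autos"),
]
test_docs = [
    ([[0.7, 0.3, 0.0]], [1], "sci.space"),
    ([[0.2, 0.6, 0.2], [0.0, 0.0, 0.0]], [1, 0], "rec.autos"),
]

# Stage 1: pool token vectors into one fixed-length vector per document.
X_train = [mean_pool(toks, mask) for toks, mask, _ in train_docs]
y_train = [label for _, _, label in train_docs]
X_test = [mean_pool(toks, mask) for toks, mask, _ in test_docs]
y_test = [label for _, _, label in test_docs]

# Stage 2: train and score the classical classifier on those vectors.
results = evaluate(X_train, y_train, X_test, y_test)
print(f"accuracy={results['accuracy']:.2f} macro_f1={results['macro_f1']:.2f}")
# prints accuracy=1.00 macro_f1=1.00
```

In a real run, the token vectors and attention mask would come from a DistilBERT model (for example through the Hugging Face transformers library), and `fit_centroids`/`predict` would be replaced by scikit-learn estimators such as `LogisticRegression`, `RidgeClassifier`, `LinearSVC`, `SGDClassifier`, and `RandomForestClassifier`, timed the same way around their `fit` and `predict` calls. Separating pooling from classification is what lets the same embeddings be reused across all five classifiers.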

Published
2026-03-30
Section
Articles
License
This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

COMPARATIVE ANALYSIS OF THE EFFICIENCY OF CLASSIC MACHINE LEARNING MODELS USING DISTILBERT-BASED TEXT VECTORIZATION METHODS. (2026). Eureka Journal of Computing Science & Digital Innovation, 2(3), 21-31. https://eurekaoa.com/index.php/10/article/view/691
