BERT-BASED CNN + BIGRU COMBINED MODEL FOR SENTIMENT ANALYSIS OF UZBEK TEXTS

Authors
  • Muhamediyeva D. T.

    National Research University “Tashkent Institute of Irrigation and Agricultural Mechanization Engineers”


  • Mamatov A. A.

    National Research University “Tashkent Institute of Irrigation and Agricultural Mechanization Engineers”


Keywords:
Uzbek language, sentiment analysis, embedding vectors, Convolutional neural network (CNN), BiGRU, fusion model, natural language processing (NLP), text classification
Abstract

This article presents a modern approach to sentiment analysis (negative, neutral, positive) of Uzbek texts. In the study, text embedding vectors are obtained using a BERT model (editor/editor-bert-base) adapted to the Uzbek language, and classification is performed with a Convolutional Neural Network (CNN) combined with a bidirectional gated recurrent unit (BiGRU). The model is tested on a small Uzbek sentiment analysis dataset and evaluated using the weighted F1-score, a confusion matrix, and classification reports. The experimental results show that the combined architecture of BERT-based embedding vectors + CNN + BiGRU classifies Uzbek texts effectively. This approach can be especially useful for the resource-limited Uzbek language.
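The architecture described in the abstract can be sketched as follows. This is a minimal illustrative PyTorch implementation, not the authors' code: the layer sizes, kernel width, and pooling choice are assumptions, and the random tensor stands in for pre-computed BERT token embeddings.

```python
import torch
import torch.nn as nn

class CnnBiGruClassifier(nn.Module):
    """Sketch of the combined pipeline: BERT token embeddings -> 1D CNN
    -> BiGRU -> 3-way sentiment logits (negative, neutral, positive).
    All dimensions here are illustrative, not the paper's settings."""

    def __init__(self, embed_dim=768, conv_channels=128, gru_hidden=64, num_classes=3):
        super().__init__()
        # Convolution over the sequence axis extracts local n-gram features
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        # BiGRU reads the CNN feature sequence in both directions
        self.bigru = nn.GRU(conv_channels, gru_hidden,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * gru_hidden, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim) BERT embeddings
        h = self.relu(self.conv(x.transpose(1, 2)))  # (batch, channels, seq_len)
        out, _ = self.bigru(h.transpose(1, 2))       # (batch, seq_len, 2*gru_hidden)
        return self.fc(out[:, -1, :])                # logits for 3 classes

# Random tensor standing in for BERT output: batch of 2, 16 tokens, 768 dims
model = CnnBiGruClassifier()
logits = model(torch.randn(2, 16, 768))
print(logits.shape)  # torch.Size([2, 3])
```

Training would add a cross-entropy loss over the three sentiment labels; the weighted F1-score mentioned in the abstract can then be computed on held-out predictions (e.g. with `sklearn.metrics.f1_score(average="weighted")`).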

References

1. Devlin J., Chang M.W., Lee K., Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding // Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). – 2019. – P. 4171–4186.

2. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I. Attention is all you need // Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS). – 2017. – P. 5998–6008.

3. Peters M., Neumann M., Iyyer M., Gardner M., Clark C., Lee K., Zettlemoyer L. Deep contextualized word representations // Proceedings of NAACL-HLT. – 2018. – P. 2227–2237.

4. Mikolov T., Chen K., Corrado G., Dean J. Efficient estimation of word representations in vector space // Proceedings of ICLR. – 2013. – P. 1–12.

5. Mikolov T., Sutskever I., Chen K., Corrado G., Dean J. Distributed representations of words and phrases and their compositionality // Proceedings of NIPS. – 2013. – P. 3111–3119.

6. Kim Y. Convolutional neural networks for sentence classification // Proceedings of EMNLP. – 2014. – P. 1746–1751.

7. LeCun Y., Bottou L., Bengio Y., Haffner P. Gradient-based learning applied to document recognition // Proceedings of the IEEE. – 1998. – Vol. 86, No. 11. – P. 2278–2324.

8. Hochreiter S., Schmidhuber J. Long short-term memory // Neural Computation. – 1997. – Vol. 9, No. 8. – P. 1735–1780.

9. Cho K., van Merriënboer B., Gulcehre C., Bahdanau D., Bougares F., Schwenk H., Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation // Proceedings of EMNLP. – 2014. – P. 1724–1734.

10. Chung J., Gulcehre C., Cho K., Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. – 2014. – URL: https://arxiv.org/pdf/1412.3555

11. Liu G., Guo J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification // Neurocomputing. – 2019. – Vol. 337. – P. 325–338.

12. Lee J.Y., Dernoncourt F. Sequential short-text classification with recurrent and convolutional neural networks // Proceedings of NAACL-HLT. – 2016. – P. 515–520.

13. Zhang D., Tian L., Hong M., Han F., Ren Y., Chen Y. Combining convolution neural network and bidirectional gated recurrent unit for sentence semantic classification // IEEE Access. – 2018. – Vol. 6. – P. 73750–73759.

14. Qiu X.P., Sun T.X., Xu Y.G., Shao Y.F., Dai N., Huang X.J. Pre-trained models for natural language processing: a survey // Science China Technological Sciences. – 2020. – Vol. 63, No. 10. – P. 1872–1897.

Published
2026-03-06
Section
Articles
License
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

BERT-BASED CNN + BIGRU COMBINED MODEL FOR SENTIMENT ANALYSIS OF UZBEK TEXTS. (2026). Eureka Journal of Artificial Intelligence and Data Innovation, 2(3), 1-10. https://eurekaoa.com/index.php/11/article/view/564
