A COMPARATIVE STUDY OF DIMENSIONALITY REDUCTION TECHNIQUES FOR HIGH-DIMENSIONAL STATISTICAL DATA

Dhuha Salim Waheed; Zahraa Saad Jasim; Mohammed Guraibawi

Authors

Dhuha Salim Waheed

AL-Furat AL-Awsat Technical University, Al-Qadisiyah Polytechnic College, Iraq

Zahraa Saad Jasim

AL-Furat AL-Awsat Technical University, Al-Qadisiyah Polytechnic College, Iraq

Mohammed Guraibawi

AL-Furat AL-Awsat Technical University, Al-Qadisiyah Polytechnic College, Iraq

Keywords:

Dimensionality Reduction, PCA, t-SNE, UMAP, High-Dimensional Data, Visualization, Clustering.

Abstract

Dimensionality reduction is a necessary preprocessing step for analyzing large datasets with many variables: it makes data structures easier to visualize, reduces computational complexity, and mitigates the curse of dimensionality. This study compares three popular dimensionality reduction techniques on a high-dimensional dataset: Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). The data are derived from the classic Iris dataset, augmented with 50 additional random features obtained separately. PCA produces linear projections that retain maximum variance, whereas t-SNE and UMAP produce non-linear embeddings that can capture both local and global structure. Our experiments show that all three methods preserve the underlying class structure, while t-SNE and UMAP yield more sharply separated clusters. Silhouette analysis confirms the quality of the clusters. These results indicate a trade-off between linear and non-linear methods for reducing the dimensionality of high-dimensional data.
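
The kind of pipeline the abstract describes can be illustrated with a minimal sketch. It is not the authors' code, and several choices are assumptions made only for illustration: a synthetic three-class Gaussian dataset stands in for Iris, the 50 added features are drawn from a standard normal distribution, the silhouette is computed against the known class labels, and the helper names (`make_augmented_data`, `pca_project`, `silhouette_score`) are hypothetical. Only PCA is shown, since t-SNE and UMAP would need additional libraries.

```python
import numpy as np


def make_augmented_data(n_per_class=50, n_noise=50, seed=0):
    """Three Gaussian classes in 4 informative dimensions plus pure-noise columns.

    Assumption: a synthetic stand-in for the Iris data used in the study.
    """
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0, 0.0, 0.0],
                        [4.0, 4.0, 0.0, 0.0],
                        [0.0, 4.0, 4.0, 4.0]])
    informative = np.vstack(
        [c + rng.normal(scale=0.5, size=(n_per_class, 4)) for c in centers]
    )
    labels = np.repeat(np.arange(3), n_per_class)
    # Assumption: the extra random features are i.i.d. standard normal.
    noise = rng.normal(size=(informative.shape[0], n_noise))
    return np.hstack([informative, noise]), labels


def pca_project(X, n_components=2):
    """Project centered data onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # explained variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


def silhouette_score(Z, labels):
    """Mean silhouette coefficient using Euclidean distances."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    classes = np.unique(labels)
    scores = np.zeros(len(Z))
    for i in range(len(Z)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same < 2:
            continue  # by convention the silhouette of a singleton cluster is 0
        # Mean distance to the other members of the same cluster (D[i, i] is 0).
        a = D[i, same].sum() / (n_same - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in classes if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


X, y = make_augmented_data()
Z, explained = pca_project(X)
s_full = silhouette_score(X, y)
s_pca = silhouette_score(Z, y)

print("data shape:", X.shape, "-> projected shape:", Z.shape)
print("explained variance ratio:", explained)
print("silhouette on all features:", s_full)
print("silhouette on 2 PCA components:", s_pca)
```

Comparing the two silhouette values gives a rough sense of how much the noise features blur the class structure before projection. The same `silhouette_score` helper could be applied to a t-SNE or UMAP embedding to compare the methods on an equal footing.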

Published

2026-02-18

Issue

Vol. 2 No. 2 (2026)

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

A COMPARATIVE STUDY OF DIMENSIONALITY REDUCTION TECHNIQUES FOR HIGH-DIMENSIONAL STATISTICAL DATA. (2026). Eureka Journal of Artificial Intelligence and Data Innovation, 2(2), 14-22. https://eurekaoa.com/index.php/11/article/view/450
