Clustering Countries
This project demonstrates data science and statistical analysis skills through a clustering analysis of country-level socio-economic and health data, using both hierarchical and k-means clustering. The workflow covers data preprocessing, exploratory data analysis with visualisations, Z-score standardisation, PCA for dimensionality reduction, and interpretation of the resulting cluster structures. The number of clusters is selected using silhouette and Calinski-Harabasz scores, both Manhattan and Euclidean distances are used as distance metrics, and cluster-based inference identifies the countries in need of development aid. Finally, the model is applied to new data, and PCA projections are combined with quantitative scoring to produce a prioritised aid strategy.
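As a minimal sketch of the Z-score standardisation and the two distance metrics mentioned above, the following Python snippet uses only the standard library. The toy values, the country labels "A" to "C", and the helper names (`zscore_columns`, `manhattan`, `euclidean`) are hypothetical illustrations, not taken from the project's data or code.

```python
import math
import statistics

# Hypothetical toy values (NOT from the project's dataset): one row per
# country, two made-up indicators measured on very different scales.
toy_data = {
    "A": [10.0, 1000.0],
    "B": [20.0, 3000.0],
    "C": [30.0, 2000.0],
}


def zscore_columns(rows):
    """Standardise each column to mean 0 and population std 1.

    Assumes no column is constant (a zero std would divide by zero).
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def manhattan(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Square root of summed squared differences (L2 distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


names = list(toy_data)
scaled = dict(zip(names, zscore_columns(list(toy_data.values()))))

# Before scaling, the second indicator dominates the distance;
# after scaling, both indicators contribute on a comparable footing.
raw_distance = euclidean(toy_data["A"], toy_data["B"])
scaled_euclidean = euclidean(scaled["A"], scaled["B"])
scaled_manhattan = manhattan(scaled["A"], scaled["B"])
print(raw_distance, scaled_euclidean, scaled_manhattan)
```

Standardising first matters because both distance metrics are scale-sensitive: without it, the indicator with the largest numeric range would effectively decide the clustering on its own.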