Kaggle Competition - Flood Prediction EDA
Rigorous exploratory analysis of a large-scale (>1,000,000 training datapoints) Kaggle flood prediction dataset. It highlights strong skills in handling high-dimensional structured data, performing scalable EDA with efficient visualisations, and applying both statistical and machine learning techniques for insight generation. Key competencies include dimensionality reduction (PCA, UMAP, t-SNE), correlation analysis, feature distribution comparison, and model interpretation using scikit-learn and statsmodels. The project also showcases custom cross-validation tooling, effective use of pipelines for reproducible modelling, and the derivation of a simplified additive model, reflecting a deep understanding of linear structures in high-volume data.