Health Risk Prediction using Wearable Data

Machine learning system for predicting health risks using wearable time-series data with interpretable outputs.

Tech Stack

Python, Pandas, Scikit-Learn, XGBoost, Random Forest, LIME

Problem Statement

Wearable health data is noisy and high-dimensional, making it difficult to extract meaningful predictors and generate reliable risk predictions.

System Architecture

Built an end-to-end ML pipeline covering data preprocessing, feature engineering, model training, evaluation, and an interpretability layer.
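
The stages above could be wired together roughly as follows. This is a minimal sketch, not the project's actual code: the data is synthetic, the step names are made up for illustration, and median imputation is an assumed preprocessing choice. It only shows how Scikit-Learn's Pipeline keeps preprocessing and the model in one object, so the same transformations are applied at training and prediction time.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for engineered wearable features (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate sensor dropouts

# Hold out a stratified test split for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing and model live in one object, so the imputer is
# fitted on the training split only and reused at prediction time.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipeline.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, pipeline.predict(X_test))
print(f"held-out accuracy: {test_accuracy:.3f}")
```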

Approach

Performed extensive exploratory data analysis (EDA) and preprocessing on the time-series data. Applied SMOTE (Synthetic Minority Over-sampling Technique) to balance the classes, trained ensemble models including XGBoost and Random Forest, and integrated LIME for model explainability.
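
To make the class-balancing step concrete, here is a simplified NumPy sketch of the interpolation idea behind SMOTE. It is an illustration under assumptions rather than the implementation used in this project: the function name smote_oversample and the toy data are hypothetical, and in practice a library implementation such as the SMOTE class in imbalanced-learn would be the more likely choice. Oversampling is normally applied to the training split only, so that synthetic rows do not leak into evaluation.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    sampled minority row and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                    # rows to start from
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                             # position along the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy imbalanced training set: 90 low-risk rows, 10 high-risk rows.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (90, 3)), rng.normal(2, 1, (10, 3))])
y_train = np.array([0] * 90 + [1] * 10)

X_min = X_train[y_train == 1]
X_syn = smote_oversample(X_min, n_new=80)
X_bal = np.vstack([X_train, X_syn])
y_bal = np.concatenate([y_train, np.ones(len(X_syn), dtype=int)])
print(np.bincount(y_bal))
```

Because every synthetic row lies on a line segment between two real minority rows, the new samples stay inside the region the minority class already occupies.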

Implementation Details

Used Pandas for data cleaning and transformation, Scikit-Learn for model training and validation, and LIME to generate local explanations for predictions.
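
The idea behind a local explanation can be sketched without the LIME package itself. The example below is a simplified, hand-rolled local surrogate in the spirit of LIME, not the library's algorithm and not this project's code: it perturbs one instance, weights the perturbed points by proximity, and fits a weighted linear model to the classifier's predicted probabilities. The feature names, data, helper name explain_locally, and kernel settings are all assumptions chosen for illustration. The lime package's LimeTabularExplainer handles sampling, discretization, and feature selection with more care.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Hypothetical features; step_count is pure noise in this toy setup.
feature_names = ["resting_hr", "sleep_hours", "step_count"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def explain_locally(model, x, X_train, n_samples=2000, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around x and return
    one coefficient per feature."""
    rng = np.random.default_rng(seed)
    scale = X_train.std(axis=0)
    # Perturb the instance with noise scaled per feature.
    Z = x + rng.normal(size=(n_samples, x.size)) * scale
    target = model.predict_proba(Z)[:, 1]
    # Weight perturbed samples by closeness to x (exponential kernel).
    dist = np.linalg.norm((Z - x) / scale, axis=1)
    width = kernel_width * np.sqrt(x.size)
    weights = np.exp(-(dist ** 2) / width ** 2)
    surrogate = Ridge(alpha=1.0).fit(Z, target, sample_weight=weights)
    return surrogate.coef_

x = np.array([0.1, -0.1, 0.0])  # an instance near the decision boundary
coefs = explain_locally(model, x, X)
explanation = sorted(zip(feature_names, coefs), key=lambda t: abs(t[1]), reverse=True)
for name, weight in explanation:
    print(f"{name}: {weight:+.3f}")
```

The sign of each coefficient indicates whether the feature pushes the predicted risk up or down near this particular instance. The explanation is local, so a different instance can yield different weights.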

Challenges & Solutions

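The main challenges were handling missing values in the time-series data, balancing the classes, and preserving interpretability without sacrificing performance. SMOTE addressed the class imbalance, and LIME supplied explanations on top of the ensemble models.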
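
One possible way to handle sensor dropouts with Pandas is sketched below. The write-up does not say which strategy was actually used, so this is an assumption: the heart-rate series, the two-reading gap threshold, and the column names are all hypothetical. The sketch fills short gaps by time-based interpolation, leaves long gaps as missing, and records which readings were imputed so a model can see that information.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level heart-rate stream with sensor dropouts.
idx = pd.date_range("2024-01-01", periods=12, freq="min")
hr = pd.Series(
    [62, 63, np.nan, 65, np.nan, np.nan, np.nan, np.nan, 70, 71, np.nan, 72],
    index=idx, name="heart_rate",
)

max_gap = 2  # assumed threshold: only bridge gaps of up to 2 readings

# Label each run of consecutive missing / present values, then measure
# how many missing readings each run contains.
is_missing = hr.isna()
run_id = is_missing.astype(int).diff().ne(0).cumsum()
gap_len = is_missing.groupby(run_id).transform("sum")
short_gap = is_missing & (gap_len <= max_gap)

# Interpolate along the time index, but keep the result only for short
# gaps; long gaps stay NaN for a later decision (drop, model, or flag).
interpolated = hr.interpolate(method="time", limit_area="inside")
filled = hr.where(~short_gap, interpolated)

features = pd.DataFrame({
    "heart_rate": filled,
    "was_imputed": short_gap.astype(int),
})
print(features)
```

Masking by gap length is deliberate: the limit argument of interpolate on its own would still fill the first readings of a long gap, which can invent values far from any real measurement.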

Results & Impact

Achieved 87.4% accuracy with improved model reliability and interpretable outputs highlighting key health indicators.

Key Learnings

Developed a strong understanding of feature engineering and model evaluation, and of why explainability matters in real-world ML systems.