Abstract:
Polycystic Ovary Syndrome (PCOS) remains a highly prevalent yet underdiagnosed
endocrine disorder affecting women of reproductive age. This study proposes
PCAUSE, a web-based, machine learning-powered pre-assessment tool designed
to evaluate PCOS risk using only noninvasive clinical and lifestyle features.
After preprocessing (data cleaning, multicollinearity reduction, and class imbalance
correction via BorderlineSMOTE), the study compares the predictive performance
of nine individual classifiers and two ensemble techniques across four
methodological configurations. Mutual Information Feature
Selection (MIFS) was employed to retain the most informative features, and
hyperparameter tuning via GridSearchCV optimized model performance. Among
all configurations, K-Nearest Neighbors (KNN), enhanced with both BorderlineSMOTE
and MIFS, emerged as the most effective classifier, achieving the highest
sensitivity (0.86), the metric most relevant to early detection because higher
sensitivity means fewer false negatives. The final
deployed system integrates this best-performing model and incorporates LIME
for local explainability, showing which input features drove each individual
prediction. Positioned as a clinically supportive and user-friendly screening tool,
PCAUSE helps bridge the diagnostic gap by empowering women and aiding healthcare
providers in early risk identification, particularly in resource-constrained
environments.
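
Because the abstract selects its final model on sensitivity, a minimal sketch of how that metric is computed may help: sensitivity is TP / (TP + FN), the share of actual positive cases that the classifier flags. The function name `sensitivity` and the label lists below are hypothetical and chosen only for illustration; they are not the study's code or data.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Sensitivity (recall) for the positive class: TP / (TP + FN)."""
    # True positives: actual positives that were predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    # False negatives: actual positives that the model missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive cases")
    return tp / (tp + fn)


# Hypothetical labels for illustration only (1 = at risk, 0 = not at risk).
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1]

# Four of the five actual positives are flagged, so this prints 0.8.
print(sensitivity(y_true, y_pred))
```

Note that the false positive in the example (the sixth case) does not affect the result: sensitivity only counts missed positives, which is why it suits a screening setting where an overlooked case is the costlier error.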