Please use this identifier to cite or link to this item:
http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/2682
Title: | SLEvival: Predicting Mortality Risk in Systemic Lupus Erythematosus Patients with Explainable Machine Learning |
Authors: | Baluyut, Ivan R. |
Keywords: | Systemic Lupus Erythematosus (SLE); Lupus; Mortality risk; Prediction; Patient health data; Machine learning; Asia Pacific Lupus Collaboration (APLC); Multiple Imputation by Chained Equations (MICE); Synthetic Minority Oversampling Technique (SMOTE); Recursive Feature Elimination with Cross-Validation (RFECV); Standard scaler; XGBoost; Local Interpretable Model-agnostic Explanations (LIME) |
Issue Date: | Jun-2023 |
Abstract: | Systemic Lupus Erythematosus (SLE) is an autoimmune disease with unknown causes and no current cure. While Lupus Low Disease Activity State (LLDAS), an attainable treat-to-target goal in SLE, has been associated with reduced damage accrual and decreased mortality risk, the number of deaths remains significantly high. Some of these deaths have been found to be influenced by demographic and clinical factors such as race, sex, infection, and disease activity. Most studies conducted in SLE have been statistical analyses, and machine learning approaches to the topic appear to be very limited. On the other hand, machine learning has been widely utilized in modern healthcare for various disease prediction studies. Additionally, the Asia Pacific Lupus Collaboration (APLC) cohort provides a dataset that has been commonly included in SLE works. Hence, this study proposes the use of machine learning in creating a prediction system for mortality risk in SLE patients. Label Encoder, Ordinal Encoder, One Hot Encoder, Single Imputation, and Multiple Imputation by Chained Equations (MICE) were applied to create the imputed dataset. Synthetic Minority Oversampling Technique (SMOTE), Recursive Feature Elimination with Cross-Validation (RFECV), and Standard Scaler were further applied to produce 15 more dataset variations. Random Forest, XGBoost, Support Vector Machine, and Logistic Regression were trained on the 16 datasets, developing a total of 64 models. Using AUROC as the main metric, results showed that the XGBoost model configured on the SMOTE dataset was the best performing model, with an AUROC of 85.1%. Integrating Local Interpretable Model-agnostic Explanations (LIME) with the best XGBoost model, a web application was built that allows a user to input real patient health data and view the mortality risk prediction outcome with explanations firsthand. |
URI: | http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/2682 |
Appears in Collections: | Computer Science SP |
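The abstract names AUROC as the main metric for comparing the 64 models. As a rough illustration of what that number measures, the sketch below computes AUROC from scratch using the rank-sum (Mann-Whitney U) identity. It is not the author's evaluation code, which this record does not include; the function name `auroc` and the toy labels and scores are assumptions made up for the example.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 0.5.
    Labels equal to 1 are positives, everything else is treated as negative.
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    # Sort indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalised by the number of pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data: 1 = died, 0 = survived; scores are made-up risk estimates.
labels = [0, 0, 1, 0, 1, 1, 0, 0]
scores = [0.10, 0.35, 0.80, 0.40, 0.30, 0.90, 0.20, 0.05]
print(f"AUROC = {auroc(labels, scores):.3f}")
```

Read this way, the reported AUROC of 85.1% means that, for a randomly drawn pair of one patient who died and one who survived, the model assigns the higher risk score to the patient who died about 85% of the time. Because it depends only on the ranking of scores, AUROC is often preferred over plain accuracy when one outcome class is rare.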
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
CD-CS106.pdf | | 1.47 MB | Adobe PDF | View/Open |