Please use this identifier to cite or link to this item: http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/3128
Full metadata record
DC Field | Value | Language
dc.contributor.author | Dorado, Abram C. | -
dc.date.accessioned | 2025-08-15T01:08:39Z | -
dc.date.available | 2025-08-15T01:08:39Z | -
dc.date.issued | 2025-07 | -
dc.identifier.uri | http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/3128 | -
dc.description.abstract | The progression of HIV-1 among individuals receiving antiretroviral therapy (ART) remains highly variable, driven by a complex interplay of viral genetic diversity and patient-specific clinical factors. This study presents a machine learning framework that integrates HIV-1 genomic features, specifically 8-mer nucleotide patterns derived from the protease (PR) and reverse transcriptase (RT) regions, with clinical laboratory markers such as CD4 count and viral load to predict disease progression outcomes in ART-treated patients. Using a curated dataset, we implemented a comprehensive preprocessing pipeline involving k-mer extraction, CountVectorizer-based vectorization, SMOTE for addressing class imbalance, standard scaling, and Principal Component Analysis (PCA) for dimensionality reduction. Six machine learning models were systematically evaluated across 216 configurations: Support Vector Machine (SVM), Random Forest, Logistic Regression, XGBoost, K-Nearest Neighbors, and Multi-Layer Perceptron. After extensive hyperparameter tuning, the SVM model combined with SMOTE and PCA consistently outperformed the other models. Notably, it achieved an F1-score of 0.96 (selected as the primary evaluation metric due to the dataset’s original imbalance), alongside high scores in accuracy (0.96), precision (0.98), recall (0.95), and AUC-ROC (0.99). These results highlight SVM’s robustness and suitability for high-dimensional genomic classification tasks. To enhance model transparency, LIME was used to identify influential k-mer features contributing to each prediction. These patterns may correspond to biologically meaningful mutations linked to ART resistance or viral fitness. The final, tuned SVM model was deployed in thrHIVe, a web-based application designed to deliver real-time predictions and explainable insights for clinicians and researchers. This study showcases the potential of integrating genomic and clinical data with interpretable machine learning to advance precision HIV care. | en_US
dc.subject | HIV-1 Progression | en_US
dc.subject | Antiretroviral Therapy (ART) | en_US
dc.subject | Machine Learning Framework | en_US
dc.subject | Genomic Features | en_US
dc.subject | K-Mer Extraction | en_US
dc.subject | Disease progression | en_US
dc.subject | Genomic data | en_US
dc.subject | Support Vector Machine (SVM) | en_US
dc.title | thrHIVe: Predicting HIV-1 Progression of Patients in ART - A Genomic and Laboratory Data-Driven Approach | en_US
dc.type | Thesis | en_US
Appears in Collections: BS Computer Science SP
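Illustrative note on the k-mer extraction step named in the abstract: the record does not include the thesis code, so the following is only a minimal sketch of how overlapping 8-mers could be pulled from a nucleotide sequence and turned into whitespace-separated "words" that a bag-of-words vectorizer can count. The function names (`extract_kmers`, `kmer_counts`, `kmer_sentence`), the sliding-window stride of 1, the upper-casing, and the toy sequence are all assumptions made for illustration, not details taken from the thesis.

```python
from collections import Counter


def extract_kmers(sequence: str, k: int = 8) -> list[str]:
    """Return every overlapping k-mer of the sequence, left to right.

    Assumption: a sliding window with stride 1; the thesis may differ.
    """
    seq = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(sequence: str, k: int = 8) -> Counter:
    """Count how often each k-mer occurs in the sequence."""
    return Counter(extract_kmers(sequence, k))


def kmer_sentence(sequence: str, k: int = 8) -> str:
    """Join k-mers with spaces so a word-level vectorizer can tokenize them."""
    return " ".join(extract_kmers(sequence, k))


# Hypothetical toy fragment for demonstration only, not real PR/RT data.
toy = "CCTCAGATCACTCTTTGG"

if __name__ == "__main__":
    print(len(extract_kmers(toy)), "8-mers extracted")
    print(kmer_counts(toy).most_common(3))
    print(kmer_sentence(toy))
```

In a full pipeline of the kind the abstract describes, strings like the one returned by `kmer_sentence` would typically be passed to a count-based vectorizer, with the resulting matrix then resampled, scaled, and reduced before classification. Those later stages are omitted here because their exact settings are not given in this record.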


