Please use this identifier to cite or link to this item:
http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/3128
Title: | thrHIVe: Predicting HIV-1 Progression of Patients in ART - A Genomic and Laboratory Data-Driven Approach |
Authors: | Dorado, Abram C. |
Keywords: | HIV-1 Progression; Antiretroviral Therapy (ART); Machine Learning Framework; Genomic Features; K-Mer Extraction; Disease progression; Genomic data; Support Vector Machine (SVM) |
Issue Date: | Jul-2025 |
Abstract: | The progression of HIV-1 among individuals receiving antiretroviral therapy (ART) remains highly variable, driven by a complex interplay of viral genetic diversity and patient-specific clinical factors. This study presents a machine learning framework that integrates HIV-1 genomic features, specifically 8-mer nucleotide patterns derived from the protease (PR) and reverse transcriptase (RT) regions, with clinical laboratory markers such as CD4 count and viral load to predict disease progression outcomes in ART-treated patients. Using a curated dataset, we implemented a preprocessing pipeline consisting of k-mer extraction, CountVectorizer-based vectorization, SMOTE to address class imbalance, standard scaling, and Principal Component Analysis (PCA) for dimensionality reduction. Six machine learning models were systematically evaluated across 216 configurations: Support Vector Machine (SVM), Random Forest, Logistic Regression, XGBoost, K-Nearest Neighbors, and Multi-Layer Perceptron. After extensive hyperparameter tuning, the SVM model combined with SMOTE and PCA consistently outperformed the other models. It achieved an F1-score of 0.96 (the primary evaluation metric, chosen because of the dataset's original imbalance), alongside an accuracy of 0.96, precision of 0.98, recall of 0.95, and AUC-ROC of 0.99. These results highlight SVM's robustness and suitability for high-dimensional genomic classification tasks. To enhance model transparency, Local Interpretable Model-agnostic Explanations (LIME) was used to identify the influential k-mer features contributing to each prediction. These patterns may correspond to biologically meaningful mutations linked to ART resistance or viral fitness. The final, tuned SVM model was deployed in thrHIVe, a web-based application designed to deliver real-time predictions and explainable insights for clinicians and researchers.
This study showcases the potential of integrating genomic and clinical data with interpretable machine learning to advance precision HIV care. |
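The k-mer extraction and count-vectorization step named in the abstract can be illustrated with a minimal sketch. This is not the author's code: the abstract states that CountVectorizer was used, whereas the sketch below substitutes the standard library's `Counter` so that it runs without dependencies. The function names `extract_kmers` and `kmer_count_matrix` and the toy sequences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def extract_kmers(sequence, k=8):
    """Return every overlapping k-mer in a nucleotide sequence, in order."""
    seq = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_count_matrix(sequences, k=8):
    """Build a sorted k-mer vocabulary and a count matrix (one row per sequence).

    Stand-in for a CountVectorizer-style bag-of-k-mers representation.
    """
    counts = [Counter(extract_kmers(s, k)) for s in sequences]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[kmer] for kmer in vocabulary] for c in counts]
    return vocabulary, matrix


# Toy sequences made up for illustration; these are not real PR/RT data.
toy_sequences = ["ACGTACGTAC", "ACGTACGTTT"]
vocabulary, matrix = kmer_count_matrix(toy_sequences, k=8)

for kmer, column in zip(vocabulary, zip(*matrix)):
    print(kmer, column)
```

Each row of `matrix` is the kind of feature vector that, per the abstract, would then pass through SMOTE, standard scaling, and PCA before reaching the classifier; those later steps are not reproduced here.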
URI: | http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/3128 |
Appears in Collections: | BS Computer Science SP |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
2025_Dorado AC_ThrHIVe Predicting HIV-1 Progression of patients in Art - A Genomic and Laboratory Data-Driven Approach.pdf Until 9999-01-01 | | 2.72 MB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.