Please use this identifier to cite or link to this item: http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/3128
Title: thrHIVe: Predicting HIV-1 Progression of Patients in ART - A Genomic and Laboratory Data-Driven Approach
Authors: Dorado, Abram C.
Keywords: HIV-1 Progression; Antiretroviral Therapy (ART); Machine Learning Framework; Genomic Features; K-Mer Extraction; Disease Progression; Genomic Data; Support Vector Machine (SVM)
Issue Date: Jul-2025
Abstract: The progression of HIV-1 among individuals receiving antiretroviral therapy (ART) remains highly variable, driven by a complex interplay of viral genetic diversity and patient-specific clinical factors. This study presents a machine learning framework that integrates HIV-1 genomic features, specifically 8-mer nucleotide patterns derived from the protease (PR) and reverse transcriptase (RT) regions, with clinical laboratory markers such as CD4 count and viral load to predict disease progression outcomes in ART-treated patients. Using a curated dataset, we implemented a preprocessing pipeline consisting of k-mer extraction, CountVectorizer-based vectorization, SMOTE to address class imbalance, standard scaling, and Principal Component Analysis (PCA) for dimensionality reduction. Six machine learning models (Support Vector Machine (SVM), Random Forest, Logistic Regression, XGBoost, K-Nearest Neighbors, and Multilayer Perceptron) were systematically evaluated across 216 configurations. After extensive hyperparameter tuning, the SVM model combined with SMOTE and PCA consistently outperformed the other models. It achieved an F1-score of 0.96, the primary evaluation metric, chosen because of the dataset’s original class imbalance, alongside high accuracy (0.96), precision (0.98), recall (0.95), and AUC-ROC (0.99). These results highlight SVM’s robustness and suitability for high-dimensional genomic classification tasks. To enhance model transparency, LIME was used to identify the influential k-mer features contributing to each prediction. These patterns may correspond to biologically meaningful mutations linked to ART resistance or viral fitness. The final, tuned SVM model was deployed in thrHIVe, a web-based application designed to deliver real-time predictions and explainable insights for clinicians and researchers. This study showcases the potential of integrating genomic and clinical data with interpretable machine learning to advance precision HIV care.
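As a rough illustration of the k-mer extraction step named in the abstract, the sketch below counts overlapping 8-mers in a nucleotide string using only the Python standard library. It is not the study's code. The function names `extract_kmers` and `kmer_counts`, the toy fragment, and the choice to skip windows containing ambiguous bases are all assumptions made for this example. The abstract states that vectorization was done with CountVectorizer, which this sketch does not reproduce.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def extract_kmers(sequence: str, k: int = 8) -> list[str]:
    """Return all overlapping k-mers of a nucleotide sequence, in order."""
    seq = sequence.upper()
    # Slide a window of width k along the sequence one base at a time.
    # Assumption: windows containing anything other than A, C, G, T
    # (for example the ambiguity code N) are skipped.
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID_BASES
    ]


def kmer_counts(sequence: str, k: int = 8) -> Counter:
    """Count how often each k-mer occurs in the sequence."""
    return Counter(extract_kmers(sequence, k))


# Hypothetical 12-base fragment, not taken from the study's dataset.
toy_fragment = "CCTCAGATCACT"
counts = kmer_counts(toy_fragment)
```

A sequence of length n yields at most n - k + 1 overlapping k-mers, so the 12-base toy fragment produces five 8-mers. In a full pipeline, per-sequence counts like these would be assembled into a feature matrix before resampling, scaling, and dimensionality reduction.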
URI: http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/3128
Appears in Collections: BS Computer Science SP


