thrHIVe: Predicting HIV-1 Progression of Patients in ART - A Genomic and Laboratory Data-Driven Approach

Dorado, Abram C.

URI: http://dspace.cas.upm.edu.ph:8080/xmlui/handle/123456789/3128

Date: 2025-07

Abstract:

The progression of HIV-1 among individuals receiving antiretroviral therapy (ART) remains highly variable, driven by a complex interplay of viral genetic diversity and patient-specific clinical factors. This study presents a machine learning framework that integrates HIV-1 genomic features, specifically 8-mer nucleotide patterns derived from the protease (PR) and reverse transcriptase (RT) regions, with clinical laboratory markers such as CD4 count and viral load to predict disease progression outcomes in ART-treated patients. Using a curated dataset, we implemented a preprocessing pipeline involving k-mer extraction, CountVectorizer-based vectorization, SMOTE to address class imbalance, standard scaling, and Principal Component Analysis (PCA) for dimensionality reduction. Six machine learning models were systematically evaluated across 216 configurations: Support Vector Machine (SVM), Random Forest, Logistic Regression, XGBoost, K-Nearest Neighbors, and Multi-Layer Perceptron. After extensive hyperparameter tuning, the SVM model combined with SMOTE and PCA consistently outperformed the other models. It achieved an F1-score of 0.96, which was selected as the primary evaluation metric because of the dataset’s original imbalance, alongside high scores in accuracy (0.96), precision (0.98), recall (0.95), and AUC-ROC (0.99). These results highlight SVM’s robustness and suitability for high-dimensional genomic classification tasks. To enhance model transparency, LIME was used to identify the influential k-mer features contributing to each prediction. These patterns may correspond to biologically meaningful mutations linked to ART resistance or viral fitness. The final, tuned SVM model was deployed in thrHIVe, a web-based application designed to deliver real-time predictions and explainable insights for clinicians and researchers. This study showcases the potential of integrating genomic and clinical data with interpretable machine learning to advance precision HIV care.
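
The k-mer extraction step named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy sequence, and the choice to skip windows that contain ambiguous bases are all assumptions made for illustration. It uses only the Python standard library and shows how a nucleotide sequence is turned into overlapping 8-mers and their counts, which is the kind of input a count-based vectorizer would then convert into a feature matrix.

```python
from collections import Counter
from typing import List


def extract_kmers(sequence: str, k: int = 8) -> List[str]:
    """Return all overlapping k-mers of a nucleotide sequence.

    Assumption (not from the abstract): windows containing anything
    other than A, C, G, or T are skipped rather than imputed.
    """
    seq = sequence.upper()
    valid_bases = set("ACGT")
    kmers = []
    # Slide a window of width k one base at a time.
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if set(window) <= valid_bases:
            kmers.append(window)
    return kmers


def kmer_counts(sequence: str, k: int = 8) -> Counter:
    """Count how often each k-mer occurs in the sequence."""
    return Counter(extract_kmers(sequence, k))


# Hypothetical toy sequence, for illustration only (not study data).
toy_sequence = "CCTCAGATCACTCTTTGGCAACGACC"
counts = kmer_counts(toy_sequence)
for kmer, n in counts.most_common(3):
    print(kmer, n)
```

A sequence of length n yields at most n - k + 1 overlapping k-mers, so the 26-base toy sequence produces 19 8-mers. In a full pipeline these counts would be assembled into one row per patient sequence before resampling, scaling, and dimensionality reduction.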

Files in this item

Name: 2025_Dorado AC_ThrHIVe ...

Size: 2.657Mb

Format: PDF

This item appears in the following Collection(s)

BS Computer Science SP
Special Project documents of BS Computer Science students
