Abstract:
Chronic kidney disease (CKD) is a progressive decline in kidney function and is a
significant public health concern because it is difficult to diagnose. Artificial
intelligence (AI) and machine learning methods have been applied in the medical field,
particularly in disease prediction, which has improved healthcare outcomes for patients
worldwide. This study aims to evaluate the performance of various machine learning
classifiers for predicting CKD by analyzing patient data from a tertiary hospital in Metro
Manila. Two publicly available online databases from India and Bangladesh were
combined, resulting in 600 instances of patient data with 14 features. Models were
trained using five algorithms: (1) k-nearest neighbors, (2) logistic regression with
L1 and L2 regularization, (3) linear support vector machine (SVM) with L1 and L2
regularization, (4) random forest, and (5) gradient boosting. Validation was then
performed using 200 instances of patient
data from the tertiary hospital. All trained models achieved more than 80% accuracy
in predicting the occurrence of CKD in the tertiary hospital patient data. The linear
SVM with L1 regularization was the most accurate (85.5%), closely followed by the
linear SVM with L2 regularization (84.5%). Hemoglobin was the top predictor of CKD.
In conclusion, machine learning can be an effective tool for binary classification
tasks such as the prediction of disease occurrence.
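
As a quick arithmetic check on the reported accuracies, the sketch below converts each percentage into the number of correctly classified validation instances. It assumes that accuracy is the fraction of all 200 validation instances classified correctly; the helper name `implied_correct` is illustrative and does not come from the study.

```python
from fractions import Fraction

N_VALIDATION = 200  # validation instances reported in the abstract


def implied_correct(accuracy_percent: str, n: int = N_VALIDATION) -> Fraction:
    """Number of correct predictions implied by an accuracy given in percent.

    Assumption: accuracy = correct predictions / total validation instances.
    Exact fractions are used to avoid floating-point rounding.
    """
    return Fraction(accuracy_percent) / 100 * n


svm_l1 = implied_correct("85.5")   # linear SVM (L1), reported as most accurate
svm_l2 = implied_correct("84.5")   # linear SVM (L2), reported runner-up
threshold = implied_correct("80")  # the 80% floor reported for all models

print(svm_l1, svm_l2, svm_l1 - svm_l2, threshold)
```

Under that assumption, the two SVM variants correspond to 171 and 169 correct predictions, a gap of two patients out of 200, and the 80% floor corresponds to 160 correct predictions.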