Abstract:
Machine learning algorithms have been used to predict whether a person is likely to have a heart
attack, a stroke, or neither. However, predicting which of the two conditions, heart attack or
stroke, a person is likely to have has not yet been done. This study determined which machine
learning algorithm is best suited to predicting these outcomes using data available online, and
whether the Synthetic Minority Over-sampling Technique (SMOTE) affects the results. To this end,
a program was created that uses the following machine learning
algorithms: K Nearest Neighbors, Logistic Regression with Ridge Regularization, Logistic
Regression with LASSO Regularization, Support Vector Machines with Ridge Regularization,
Support Vector Machines with LASSO Regularization, Random Forest and Gradient Boosting
to determine which model best predicts heart attack and stroke. SMOTE improved the overall
performance of the models. The models for heart attack and for stroke performed similarly to
those for the combined data set. However, FBS had the highest correlation in the combined data
set, which differs from the models for heart attack and stroke.
Overall, the best machine learning classifier in terms of accuracy (0.89) and F score (0.93)
was Logistic Regression with Ridge Regularization, while in terms of ROC AUC score (0.67) it
was SVM with LASSO Regularization.
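
For readers unfamiliar with SMOTE, the following is a minimal sketch of its core idea: each synthetic minority-class record is placed at a random point on the line segment between an existing minority record and one of its nearest minority-class neighbours. This is an illustration only and not the program used in the study; the function name `smote_oversample`, its parameters, and the toy data are all hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    Each synthetic point lies on the segment between a randomly chosen sample
    and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two features per record, minority class only.
minority_class = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_samples = smote_oversample(minority_class, n_new=6, k=2, seed=42)
```

Because every synthetic record is a convex combination of two real minority records, the new points stay inside the region already occupied by the minority class. In practice a maintained library implementation would normally be preferred over a hand-written one.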