Research on Diabetes Prediction Based on Multiple Machine Learning Methods
DOI:
https://doi.org/10.54097/8sxsyh04
Keywords:
Diabetes; Logistic Regression; Decision Tree; Random Forest; Extreme Gradient Boosting.
Abstract
Diabetes is a chronic disease, and its prevention and control remain a global health concern. This study constructs diabetes prediction models using four machine learning methods: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The five most important features are first selected from the feature importance distribution of the LR model. The other algorithms are then used to build models on this basis, which reduces interference from irrelevant features, and the performance of the models built by the different algorithms is compared. The dataset contains 768 samples with 8 features, including the number of pregnancies and plasma glucose concentration. The models built with LR, DT, RF, and XGBoost achieved accuracies of 75.97%, 78.57%, 74.03%, and 74.68%, respectively. Taking into account the area under the receiver operating characteristic curve (AUC) together with the precision, recall, and F1 score of the positive class, the study concludes that DT gives the best predictive performance. However, more data and resources are still needed before diabetes prediction models can be integrated into practical applications.
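The evaluation measures named in the abstract (accuracy, positive-class precision, recall, F1 score, and AUC) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `positive_class_metrics` and `roc_auc` and the toy labels and scores are assumptions made purely for illustration, and the AUC here uses the pairwise-ranking definition, which is one standard way to compute it.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def positive_class_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall, and F1 for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (no predicted or no actual positives) fall back to 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the sample count,
    which is acceptable for a dataset of a few hundred rows.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels and predictions (hypothetical, not taken from the study).
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(positive_class_metrics(toy_true, toy_pred))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Reporting the positive-class measures alongside accuracy matters for a screening task such as this one, because a model can reach a reasonable accuracy while still missing many diabetic cases, which shows up as low recall.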
References
[1] Qu K. Diabetes prediction model based on machine learning [D]. Tianjin University, 2019.
[2] Zhou B., Rayner A. W., Gregg E. W., Sheffer K. E., Carrillo-Larco R. M., Bennett J. E., et al. Worldwide trends in diabetes prevalence and treatment from 1990 to 2022: a pooled analysis of 1108 population-representative studies with 141 million participants. The Lancet, 2024, 404(10467): 2077-2093. DOI: https://doi.org/10.1016/S0140-6736(24)02317-1
[3] Ma W., Wang K., Yu B., et al. The diabetes risk prediction model based on physical examination data contrast research. Journal of Modern Information Technology, 2020, 4(23): 72-75.
[4] Ouyang P., Li X., Leng F., et al. Application of machine learning algorithms in diabetes risk prediction of physical examination population. Chinese Journal of Disease Control and Prevention, 2021, 25(7): 849-853+868.
[5] Zhao X., Ji J., Wang L. Diabetes risk prediction model based on machine learning and empirical research. Journal of Huzhou Normal University, 2022, 44(8): 55-62.
[6] Xiang J. Application of machine learning in predicting diabetes. China Science and Technology Information, 2025(14): 77-80.
[7] Rahman M. H. Diabetes dataset. Kaggle, 2024-11-01. Accessed: 2025-12-08. Available: https://www.kaggle.com/datasets/hasibur013/diabetes-dataset
[8] Zou Q., Zhang Y., Wan Y., et al. Machine learning methods for constructing diabetes-related predictive models. Chinese Journal of Health Statistics, 2023, 40(4): 631-635+640.
[9] Liu Y., Jiang M., Li D., et al. Construction and validation of a prediction model for dysphagia in elderly stroke patients based on interpretable machine learning methods. Chinese Journal of Geriatric Cardiovascular and Cerebrovascular Diseases, 2020, 27(6): 698-704.
[10] Huang Y., Wu X., Yang J. Research and application progress of predictive model for diabetic kidney disease based on machine learning. Chinese Health Standard Management, 2025, 16(9): 194-198.
License
Copyright (c) 2025 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







