Predicting Olympic Medal Counts by Bootstrap-Enhanced Gradient Boosting Machine with Log-Transformed K-Means Clustering
DOI: https://doi.org/10.54097/mqgvt685
Keywords: Gradient Boosting Machine, Bootstrap Sampling Algorithm, Log-transformed K-means Clustering, Prediction Model.
Abstract
Olympic medal counts serve as a key indicator of a nation's overall sporting strength, and accurate predictions can inform strategic resource allocation. To handle the strongly right-skewed, long-tailed distribution of medal data and to capture complex nonlinear relationships among variables, this study proposes a hybrid methodology that combines log-transformed K-means clustering with a Bootstrap-Enhanced Gradient Boosting Machine (BEGBM). Countries are stratified into five strength tiers (from elite to bottom) by K-means clustering on log-transformed medal features, which reduces the bias introduced by skewness. A robust BEGBM model trained on B = 150 bootstrap samples incorporates dynamic sports variables and the clustered tiers, with missing values imputed by random forests. This work is the first to integrate tier stratification and an uncertainty-aware GBM into medal prediction. The model achieves RMSE = 1.16 (gold) / 1.91 (total), MAE = 0.33 / 0.81, and R² = 0.50 / 0.61. It further provides 95% confidence intervals for the 2028 Los Angeles Olympics (e.g., USA: [35.8, 42.7] gold medals) and identifies nations whose medal potential is changing significantly. This data-driven method provides sports authorities with a dependable tool for optimizing resource investment.
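The pipeline in the abstract rests on three computable ideas: clustering on log-transformed medal counts, bootstrap ensembles that yield percentile intervals, and the RMSE/MAE/R² scores used for evaluation. The following is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation. It clusters a single medal count (the paper uses several medal features), uses a deterministic quantile initialization, and swaps in an ordinary least-squares line as a stand-in for the gradient boosting model so the bootstrap mechanics stay visible. Random-forest imputation is not shown. The function names (`log_kmeans_tiers`, `fit_line`, `bootstrap_interval`, `regression_metrics`) and the toy data are hypothetical.

```python
import math
import random


def log_kmeans_tiers(medals, k=5, iters=100):
    """1-D K-means on log(1 + medals); returns tier labels with 0 = strongest.

    Assumptions (hypothetical, not from the paper): one feature, k >= 2,
    and deterministic quantile initialization of the centers.
    """
    x = [math.log1p(m) for m in medals]  # log1p compresses the long right tail
    xs = sorted(x)
    centers = [xs[int(i * (len(xs) - 1) / (k - 1))] for i in range(k)]
    labels = [0] * len(x)
    for _ in range(iters):
        # Assignment step: each country goes to its nearest center in log space.
        labels = [min(range(k), key=lambda j: (xi - centers[j]) ** 2) for xi in x]
        # Update step: mean of members; an empty cluster keeps its old center.
        new_centers = []
        for j in range(k):
            members = [xi for xi, lab in zip(x, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # Relabel so that tier 0 is the cluster with the largest center.
    order = sorted(range(k), key=lambda j: -centers[j])
    rank = {j: r for r, j in enumerate(order)}
    return [rank[lab] for lab in labels]


def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x; a stand-in for the paper's GBM."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((v - mx) ** 2 for v in xs)
    b = sum((u - mx) * (w - my) for u, w in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda v: a + b * v


def bootstrap_interval(xs, ys, x_new, B=150, alpha=0.05, seed=0):
    """Fit B models on bootstrap resamples; return (mean, lo, hi) of predictions."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(model(x_new))
    preds.sort()
    # Percentile interval over the ensemble's predictions (model uncertainty only).
    lo = preds[math.floor(alpha / 2 * B)]
    hi = preds[math.ceil((1 - alpha / 2) * B) - 1]
    return sum(preds) / B, lo, hi


def regression_metrics(y_true, y_pred):
    """RMSE, MAE and R^2, the three scores reported in the abstract."""
    n = len(y_true)
    err = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in err)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in err) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return rmse, mae, 1 - sse / ss_tot


if __name__ == "__main__":
    print(log_kmeans_tiers([0, 0, 1, 2, 3, 8, 10, 40, 45, 120]))
    # -> [4, 4, 3, 2, 2, 1, 1, 0, 0, 0]
```

Working in log space keeps the few very large medal totals from claiming most of the clusters, so the smaller nations are separated into distinct tiers. The bootstrap interval reflects only the spread of the refitted models, which matches the confidence-interval reading of the abstract's 95% CIs. A prediction interval for a single future outcome would also need residual noise.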
Copyright (c) 2025 Highlights in Science, Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.