Background: Sudden cardiac death (SCD) is a major contributor to cardiovascular mortality, but reliable long-term SCD risk prediction in community-based populations remains limited. Machine learning (ML) offers potential advantages, yet its application to SCD prediction in such populations remains comparatively scarce.
Methods: We used data from the Chin-Shan Community Cardiovascular Cohort (CCCC), a prospective community-based cohort in Taiwan that enrolled adults aged ≥ 35 years from 14 villages. Participants were geographically partitioned into a training/internal validation cohort and an independent external validation cohort. After feature preselection with the Boruta algorithm, six ML models were developed (Light Gradient Boosting Machine, Random Forest [RF], Logistic Regression, Support Vector Machine, Multilayer Perceptron, and K-Nearest Neighbours), and techniques to handle class imbalance were applied. Model performance was evaluated with discrimination and classification metrics, including the area under the curve (AUC), positive predictive value (PPV), and negative predictive value (NPV). The best-performing model was interpreted using SHapley Additive exPlanations (SHAP) and implemented as a web-based risk calculator.
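The geographic partition can be pictured as a group-level split: whole villages, rather than individual participants, are assigned to one cohort, so no village contributes people to both. The following is a minimal Python sketch under stated assumptions. The `village` field, the `split_by_village` helper, and the village IDs in the toy example are hypothetical, and the abstract does not say which villages formed the external cohort.

```python
def split_by_village(participants, external_villages):
    """Split participants into development and external cohorts by village.

    participants: iterable of dicts with a hypothetical "village" key.
    external_villages: set of village IDs held out for external validation.
    Whole villages go to one cohort, so no village appears in both.
    """
    development, external = [], []
    for person in participants:
        target = external if person["village"] in external_villages else development
        target.append(person)
    return development, external


# Toy example with hypothetical village IDs (not the CCCC villages).
people = [{"id": i, "village": v} for i, v in enumerate(["A", "A", "B", "C", "C", "D"])]
dev, ext = split_by_village(people, external_villages={"C"})
print([p["id"] for p in dev], [p["id"] for p in ext])  # prints: [0, 1, 2, 5] [3, 4]
```

Splitting on villages rather than on individuals keeps participants from the same community out of both cohorts, which is one reason a geographically held-out cohort can serve as an external check.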
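The discrimination and classification metrics named above can be computed from predicted risks and observed outcomes. This is a minimal standard-library sketch, assuming binary outcomes (1 = SCD event) and ignoring censoring, so the study's exact AUC definition may differ. Here AUC is computed as the Mann-Whitney probability that a randomly chosen event case scores higher than a non-event case. PPV and NPV depend on a risk threshold that the abstract does not report, so `threshold` and the toy data are hypothetical.

```python
import math


def auc(y_true, scores):
    """ROC AUC via the Mann-Whitney statistic (tied scores count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_npv(y_true, scores, threshold):
    """PPV and NPV when scores >= threshold are classified as high risk."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        if s >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if tp + fp else math.nan  # undefined if nobody is flagged
    npv = tn / (tn + fn) if tn + fn else math.nan  # undefined if everyone is flagged
    return ppv, npv


# Hypothetical toy data: outcomes and predicted risks.
y = [0, 0, 1, 1]
risk = [0.10, 0.40, 0.35, 0.80]
print(auc(y, risk))                      # prints: 0.75
print(ppv_npv(y, risk, threshold=0.30))  # prints: (0.6666666666666666, 1.0)
```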
Results: A total of 3,172 participants were included (median age [IQR]: 54.9 [45.8-64.0] years; 47.4% male), with 74 SCD events observed over a median follow-up of 15.9 years (IQR, 13.1-16.9 years). Ten non-collinear predictors were preselected, and six ML models were developed and validated. The RF model showed the highest discrimination in both internal (AUC 0.824) and external (AUC 0.815) validation and was the only model to significantly outperform the Framingham Risk Score. The RF model had consistently high NPVs (> 99%) in both the internal and external validation cohorts and was implemented as a web-based risk calculator.
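High NPVs are expected in part from the low event rate alone. With 74 events among 3,172 participants (a crude proportion of about 2.3%), Bayes' theorem gives an NPV above 99% even at modest sensitivity and specificity. The sketch below illustrates this. The sensitivity and specificity values are hypothetical and are not the RF model's reported operating point, and the crude proportion ignores censoring and the cohort split.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) via Bayes' theorem for a given event proportion."""
    tp = sensitivity * prevalence              # expected true-positive fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Crude event proportion from the abstract: 74 SCD events / 3,172 participants.
# This ignores censoring and the split into cohorts.
event_rate = 74 / 3172

# HYPOTHETICAL operating point, not reported in the abstract.
sens, spec = 0.70, 0.75

ppv, npv = predictive_values(sens, spec, event_rate)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")  # prints: PPV = 0.063, NPV = 0.991

# Same hypothetical operating point at a 50% event rate, for contrast.
_, npv_balanced = predictive_values(sens, spec, 0.5)
print(f"NPV at 50% event rate = {npv_balanced:.3f}")  # prints: NPV at 50% event rate = 0.714
```

The same operating point gives an NPV of roughly 0.71 at a 50% event rate, so an NPV is best read alongside the event rate of the population it was measured in.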
Conclusions: We developed and externally validated an RF model for long-term SCD risk prediction with a high NPV, supporting its potential utility for identifying low-risk individuals in community settings. The model has been implemented as an online risk calculator; further validation in larger and more diverse populations is warranted.
https://link.springer.com/article/10.1186/s12944-026-02984-5