Background: Risk prediction in dilated cardiomyopathy (DCM) remains suboptimal, and there is uncertainty about how newer machine-learning (ML) methods compare with conventional regression for clinically useful prognostic modelling. Advanced three-dimensional (3D) echocardiographic measures, particularly of right ventricular function, may improve model performance when combined with routinely collected clinical data. We aimed to compare conventional Cox regression, penalised Cox regression, and ML approaches for prognostic modelling in DCM and to identify models that offer the best balance of discrimination, calibration, and interpretability for risk stratification.
Methods: We conducted a retrospective cohort study of 196 adults with DCM attending a tertiary cardiology centre between 2021 and 2023. Participants were followed for a composite outcome of all-cause mortality, heart failure rehospitalisation, or left ventricular assist device (LVAD) implantation. We considered 41 candidate predictors, including demographic and clinical variables and 3D echocardiographic parameters (e.g. 4D right ventricular ejection fraction [4D-RVEF], tricuspid annular plane systolic excursion [TAPSE], right ventricular global longitudinal strain [RVGLS], left atrial volume index [LAVI], and pulmonary artery systolic pressure [PASP]). Twelve prognostic models were developed, including conventional Cox regression, penalised Cox regression (Lasso-Cox), and several ML models, and were evaluated using internal validation and performance assessment at different prediction horizons (up to 24 months). Discrimination was assessed using the area under the receiver operating characteristic curve (AUC) and calibration using calibration plots; predictor contributions were examined with SHAP (SHapley Additive exPlanations)-based feature importance.
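To illustrate what discrimination at a fixed prediction horizon can mean in practice, the following is a minimal sketch and not the study's actual pipeline. It assumes a simplified approach: participants with the event by the horizon are treated as cases, participants followed event-free beyond the horizon as controls, and participants censored before the horizon are excluded (no inverse-probability-of-censoring weighting is applied). The function names `horizon_labels` and `auc`, and the toy data, are hypothetical.

```python
from typing import List, Optional, Sequence


def horizon_labels(times: Sequence[float], events: Sequence[int],
                   horizon: float) -> List[Optional[int]]:
    """Binary status at a horizon (same time unit as `times`, e.g. months).

    1    -> event occurred at or before the horizon
    0    -> still under follow-up (event-free) beyond the horizon
    None -> censored at or before the horizon, status unknown (excluded)
    """
    labels: List[Optional[int]] = []
    for t, e in zip(times, events):
        if e == 1 and t <= horizon:
            labels.append(1)
        elif t > horizon:
            labels.append(0)
        else:
            labels.append(None)
    return labels


def auc(scores: Sequence[float], labels: Sequence[Optional[int]]) -> float:
    """AUC as the probability that a case has a higher score than a control.

    Ties count as 0.5; entries with label None are ignored.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): follow-up in months, event indicator, predicted risk.
follow_up = [3, 8, 14, 20, 24, 6]
event = [1, 1, 1, 0, 0, 0]
risk = [0.9, 0.7, 0.8, 0.2, 0.1, 0.5]

labels_12m = horizon_labels(follow_up, event, horizon=12)
auc_12m = auc(risk, labels_12m)
```

Excluding early-censored participants is the simplest choice but can bias the estimate when censoring is common; time-dependent AUC estimators that weight for censoring are the usual alternative.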
Results: At 12 months, the best-performing ML model achieved the highest discrimination (AUC 0.990), followed by gradient-boosted decision trees (GBDT) and Lasso-Cox (AUC 0.825). Model discrimination attenuated at longer prediction horizons, with the Lasso-Cox model maintaining acceptable performance at 24 months (AUC 0.729). Although random forest (RF) and GBDT demonstrated excellent discrimination, calibration analyses revealed systematic under- and over-prediction at the extremes of risk. By contrast, Lasso-Cox showed more stable and favourable calibration across risk deciles. Across models, key predictors consistently included 4D-RVEF, LAVI, PASP, and TAPSE.
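The comparison of calibration across risk deciles can be sketched as follows. This is an illustrative simplification, not the authors' analysis: it assumes a binary outcome known for every participant at the horizon (censoring is ignored), and the function name `calibration_by_group` and the toy data are hypothetical. Participants are sorted by predicted risk, split into equal-sized groups, and the mean predicted risk in each group is compared with the observed event proportion.

```python
from typing import List, Sequence, Tuple


def calibration_by_group(predicted: Sequence[float], observed: Sequence[int],
                         n_groups: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean predicted risk, observed event proportion, group size)
    for each risk group, ordered from lowest to highest predicted risk."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows: List[Tuple[float, float, int]] = []
    for g in range(n_groups):
        lo = g * n // n_groups
        hi = (g + 1) * n // n_groups
        chunk = pairs[lo:hi]
        if not chunk:
            continue  # fewer participants than groups: skip empty groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate, len(chunk)))
    return rows


# Toy data (hypothetical): predicted 12-month risk and observed outcome.
pred = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
obs = [0, 0, 0, 1, 0, 1, 1, 1]

for mean_pred, obs_rate, size in calibration_by_group(pred, obs, n_groups=2):
    # A well-calibrated model has mean_pred close to obs_rate in every group.
    print(f"n={size}  predicted={mean_pred:.2f}  observed={obs_rate:.2f}")
```

Under- or over-prediction at the extremes of risk would appear as a gap between the predicted and observed values in the lowest and highest groups.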
Conclusions: In this DCM cohort, ML models, particularly RF, maximised discrimination but were poorly calibrated at the extremes of risk. A penalised regression model (Lasso-Cox) provided the best overall trade-off between discrimination, calibration, and interpretability, and is therefore recommended as the preferred approach for clinical risk stratification and future public health-oriented implementation studies in DCM.