Machine Learning-Based Prediction Model for Infectious Complications in Trauma and Its Association With In-Hospital Mortality

Background: Infectious complications, such as sepsis or catheter-related infections, are common and serious sequelae after trauma. Despite their clinical significance, existing risk-prediction models are limited by their reliance on in-hospital data, which fail to capture complex physiological interactions. This study therefore aimed to develop and validate an interpretable ensemble machine learning (ML) model that integrates prehospital and in-hospital clinical data to predict infectious complications after trauma.

Methods: We used data from the Korean Trauma Data Bank, comprising patients admitted to all 19 trauma centers in South Korea from 2017 to 2022 (discovery; n = 227,567) and to four additional centers added in 2023 for external validation (n = 8,867). Trauma cases were defined using S or T diagnostic codes from the 7th Korean Standard Classification of Diseases, and infectious complications were defined as a composite outcome of pneumonia, urinary tract infection, catheter-related bloodstream infection, surgical site infection (deep, organ, and superficial), osteomyelitis, or severe sepsis. A total of 33 prehospital and in-hospital features were used to train the ML models, and the top-performing models were ensembled to construct the final model. Model performance was evaluated through five-fold cross-validation, internal testing, and external validation. Shapley Additive Explanations (SHAP) were applied to assess predictor importance. Predicted risks were categorized into tertiles (T1-T3) to examine their association with in-hospital mortality, reported as adjusted odds ratios (aORs) with 95% confidence intervals (CIs).
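
As a rough illustration of two steps named above, the sketch below averages per-model predicted probabilities (soft voting) and then bins the averaged risks into tertiles. It is a minimal sketch under stated assumptions, not the study's implementation: the function names `soft_vote` and `tertile_labels`, the equal default weights, and the quantile method used for the cut points are all hypothetical choices, and the abstract does not state how the authors weighted the models or defined the tertile boundaries.

```python
from statistics import quantiles


def soft_vote(prob_lists, weights=None):
    """Return the weighted mean of per-model predicted probabilities.

    prob_lists: one list of probabilities per model, all of equal length.
    weights: optional per-model weights (assumed equal when omitted).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_patients = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_patients)
    ]


def tertile_labels(risks):
    """Label each risk T1, T2, or T3 using the one-third and two-thirds cut points."""
    low_cut, high_cut = quantiles(risks, n=3)  # two cut points split the data in three
    return [
        "T1" if r <= low_cut else "T2" if r <= high_cut else "T3"
        for r in risks
    ]
```

For example, `soft_vote([[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]])` averages three hypothetical models' outputs for two patients, and `tertile_labels` can then be applied to the averaged risks for a whole cohort.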

Results: Among 88,899 eligible patients with trauma in the discovery cohort, the soft-voting ensemble model integrating logistic regression, categorical boosting, and extreme gradient boosting achieved the best discrimination, with an area under the receiver operating characteristic curve of 0.796 in the discovery cohort and 0.717 in the external validation cohort. SHAP analysis identified age, accident type, Glasgow Coma Scale verbal response, and sex as the most influential variables. Higher tertiles of predicted infection risk were strongly associated with mortality, with aORs of 2.52 (95% CI, 2.12-2.99) for T1, 4.65 (3.96-5.47) for T2, and 6.19 (5.02-7.62) for T3.
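
The discrimination metric reported above, the area under the receiver operating characteristic curve, can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. The sketch below computes it by direct pairwise comparison. The function name `auroc` is a hypothetical helper, and the quadratic-time loop is chosen for readability; it is an illustration of the metric, not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for patients with the outcome, 0 otherwise.
    scores: predicted risks in the same order as labels.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of patients with and without the outcome.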

Conclusion: This interpretable model, which integrates prehospital and in-hospital data available within the first 24 h of admission, showed robust predictive performance for post-traumatic infectious complications. The graded association between predicted infection risk and mortality highlights its clinical relevance, as even modest increases in predicted risk may carry meaningful implications for patient outcomes and early intervention strategies.

https://onlinelibrary.wiley.com/doi/10.1002/wjs.70389