Causal analysis identifies small HDL particles and physical activity as key determinants of longevity of older adults


The hard endpoint of death is one of the most significant outcomes in both clinical practice and research settings. Our goal was to discover direct causes of longevity from medically accessible data.


Using a framework that combines local causal discovery algorithms with discovery of maximally predictive and compact feature sets (the “Markov boundaries” of the response) and their equivalence classes, we examined 186 variables and their relationships with survival over 27 years in 1507 participants, aged ≥71 years, of the longitudinal, community-based D-EPESE study.
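To make the idea of a Markov boundary concrete, the sketch below implements IAMB (Incremental Association Markov Boundary), one simple, well-known Markov boundary algorithm. It is a swapped-in illustration, not necessarily the framework used in this study, and it does not attempt equivalence-class discovery. Several assumptions are made purely for illustration: variables are treated as jointly Gaussian, conditional independence is tested with a Fisher-z partial-correlation test, the outcome is a continuous stand-in for survival, and the variable names (`age`, `function`, `smoking`, `noise`, `outcome`) and simulated data are hypothetical, not D-EPESE data.

```python
import math
import random


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _residualize(v, basis):
    # Subtract projections onto an orthonormal basis: the least-squares residual of v.
    r = list(v)
    for q in basis:
        c = _dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def _orthonormal_basis(cols):
    # Modified Gram-Schmidt on the centered conditioning variables (intercept via centering).
    basis = []
    for col in cols:
        r = _residualize(_center(col), basis)
        norm = math.sqrt(_dot(r, r))
        if norm > 1e-12:  # skip (near-)collinear columns
            basis.append([x / norm for x in r])
    return basis


def partial_corr_test(data, x, y, z):
    """Fisher-z test of corr(x, y | z), assuming jointly Gaussian variables.
    Returns (z statistic, two-sided p-value)."""
    n = len(data[x])
    basis = _orthonormal_basis([data[k] for k in z])
    rx = _residualize(_center(data[x]), basis)
    ry = _residualize(_center(data[y]), basis)
    r = _dot(rx, ry) / math.sqrt(_dot(rx, rx) * _dot(ry, ry))
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    stat = math.atanh(r) * math.sqrt(n - len(z) - 3)
    return stat, math.erfc(abs(stat) / math.sqrt(2))


def iamb(data, target, alpha=0.05):
    """Incremental Association Markov Boundary (IAMB) with Gaussian CI tests."""
    mb = []
    # Forward phase: repeatedly add the candidate most strongly associated with the
    # target given the current boundary, as long as that association is significant.
    while True:
        best, best_stat = None, 0.0
        for v in data:
            if v == target or v in mb:
                continue
            stat, p = partial_corr_test(data, v, target, mb)
            if p < alpha and abs(stat) > best_stat:
                best, best_stat = v, abs(stat)
        if best is None:
            break
        mb.append(best)
    # Backward phase: drop members that are independent of the target given the rest.
    for v in list(mb):
        rest = [u for u in mb if u != v]
        if partial_corr_test(data, v, target, rest)[1] >= alpha:
            mb.remove(v)
    return mb


# Hypothetical simulated data (not D-EPESE): age affects the outcome only through
# physical function, so age is marginally associated but outside the Markov boundary.
random.seed(0)
n = 2000
age = [random.gauss(0, 1) for _ in range(n)]
function = [-0.8 * a + random.gauss(0, 1) for a in age]
smoking = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
outcome = [0.9 * f - 0.6 * s + random.gauss(0, 1) for f, s in zip(function, smoking)]
data = {"age": age, "function": function, "smoking": smoking,
        "noise": noise, "outcome": outcome}

print(partial_corr_test(data, "age", "outcome", []))            # marginal test of age
print(partial_corr_test(data, "age", "outcome", ["function"]))  # age given function
print(iamb(data, "outcome", alpha=0.001))
```

In this toy setting the marginal test flags age strongly, while conditioning on physical function removes most of that association. This is the sense in which a variable can be significant in univariable analysis yet fall outside the Markov boundary of the response.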


As few as 8–15 variables predicted longevity at 2, 5 and 10 years, with predictive performance (area under the receiver operating characteristic curve) of 0·76 (95% CI 0·69–0·83), 0·76 (0·72–0·81) and 0·66 (0·61–0·71), respectively. The number of small high-density lipoprotein particles, younger age, and fewer pack-years of cigarette smoking were the strongest determinants of longevity at 2, 5 and 10 years, respectively. Physical function was a prominent predictor of longevity at all time horizons, and age and cognitive function contributed to predictions at 5 and 10 years. Although age was significant in univariable analysis, it was not among the variables selected by the local causal analysis for 2-year prediction; this establishes that, in the context of the measured factors in our data that determine longevity, age is not a direct cause of 2-year longevity.
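The area under the ROC curve reported above can be read as the probability that a randomly chosen participant who reached the horizon receives a higher predicted score than one who did not. The sketch below computes it that way (the Mann–Whitney formulation, with ties counted as one half) and attaches a percentile bootstrap confidence interval. The bootstrap method, the function names `auc` and `bootstrap_ci`, and the toy scores and labels are assumptions for illustration only; the source does not state how its confidence intervals were obtained.

```python
import random


def auc(scores, labels):
    """ROC AUC as P(random positive outranks random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the AUC, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Hypothetical toy data: label 1 = alive at the horizon, score = predicted probability.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc(scores, labels))  # 0.75 (12 of 16 positive-negative pairs correctly ordered)
print(bootstrap_ci(scores, labels))
```

The pairwise loop is quadratic in cohort size, which is acceptable for a cohort of about 1500 participants; a rank-based computation would scale better for larger data.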


The discoveries in this study proceed from causal data science analyses of deep clinical and molecular phenotyping data in a community-based cohort of older adults with known lifespan.