Artificial Intelligence–Enabled Prediction of Heart Failure Risk From Single-Lead Electrocardiograms
Findings In this multinational cohort study examining a noise-adapted artificial intelligence (AI) algorithm for single-lead ECGs, a positive AI-ECG screening result was associated with 3- to 7-fold higher HF risk, independent of age, sex, and comorbidities in the US, the UK, and Brazil. The AI model demonstrated similar or improved performance compared with 2 established clinical risk scores for HF prediction.
Meaning A noise-adapted AI model for single-lead ECG predicted new-onset HF risk and may provide a potentially scalable HF risk-stratification strategy.
Importance Despite the availability of disease-modifying therapies, scalable strategies for heart failure (HF) risk stratification remain elusive. Portable devices capable of recording single-lead electrocardiograms (ECGs) may enable large-scale community-based risk assessment.
Objective To evaluate whether an artificial intelligence (AI) algorithm can predict HF risk from noisy single-lead ECGs.
Design, Setting, and Participants A retrospective cohort study of individuals without HF at baseline was conducted among individuals with conventionally obtained outpatient ECGs in the integrated Yale New Haven Health System (YNHHS) and prospective population-based cohorts of the UK Biobank (UKB) and the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil). Data analysis was performed from September 2023 to February 2025.
Exposure AI-ECG–defined risk of left ventricular systolic dysfunction (LVSD).
Main Outcomes and Measures Among individuals with ECGs, lead I ECGs were isolated and a noise-adapted AI-ECG model (to simulate ECG signals from wearable devices) trained to identify LVSD was deployed. The association of the model probability with new-onset HF, defined as the first HF hospitalization, was evaluated. The discrimination of AI-ECG was compared against 2 risk scores for new-onset HF (Pooled Cohort Equations to Prevent Heart Failure [PCP-HF] and Predicting Risk of Cardiovascular Disease Events [PREVENT] equations) using the Harrell C statistic, integrated discrimination improvement, and net reclassification improvement.
Results There were 192 667 YNHHS patients (median [IQR] age, 56 [41-69] years; 111 181 women [57.7%]), 42 141 UKB participants (median [IQR] age, 65 [59-71] years; 21 795 women [51.7%]), and 13 454 ELSA-Brasil participants (median [IQR] age, 51 [45-58] years; 7348 women [54.6%]) with baseline ECGs. A total of 3697 (1.9%) developed HF in YNHHS over a median (IQR) of 4.6 (2.8-6.6) years, 46 (0.1%) in UKB over a median (IQR) of 3.1 (2.1-4.5) years, and 31 (0.2%) in ELSA-Brasil over a median (IQR) of 4.2 (3.7-4.5) years. A positive AI-ECG screening result for LVSD was associated with a 3- to 7-fold higher risk for HF, and each 0.1 increment in the model probability was associated with a 27% to 65% higher hazard across cohorts, independent of age, sex, comorbidities, and competing risk of death. AI-ECG’s discrimination for new-onset HF was 0.723 (95% CI, 0.694-0.752) in YNHHS, 0.736 (95% CI, 0.606-0.867) in UKB, and 0.828 (95% CI, 0.692-0.964) in ELSA-Brasil. Across cohorts, incorporating AI-ECG predictions alongside PCP-HF and PREVENT equations was associated with a higher Harrell C statistic (difference in addition to PCP-HF, 0.080-0.107; difference in addition to PREVENT, 0.069-0.094). AI-ECG had an integrated discrimination improvement of 0.091 to 0.205 vs PCP-HF and 0.068 to 0.192 vs PREVENT; it had a net reclassification improvement of 18.2% to 47.2% vs PCP-HF and 11.8% to 47.5% vs PREVENT.
Conclusions and Relevance Across multinational cohorts, a noise-adapted AI-ECG model estimated HF risk using lead I ECGs, suggesting a potential HF risk-stratification strategy requiring prospective study using wearable and portable ECG devices.
Accessible strategies for heart failure (HF) risk stratification remain elusive despite the availability of evidence-based therapies that can effectively modify the disease trajectory.1,2 Clinical scores to predict HF risk, such as the Pooled Cohort Equations to Prevent Heart Failure (PCP-HF), the Predicting Risk of Cardiovascular Disease Events (PREVENT) equations, and the Health ABC score,3–5 require clinical evaluation, including a detailed history, physical examination, electrocardiogram (ECG), and laboratory testing.3–9 These inputs limit the use of the equations, systematically excluding those without health care access.8–10 Similarly, serum-based biomarkers, such as N-terminal pro–B-type natriuretic peptide (NT-proBNP) and high-sensitivity cardiac troponin, which are associated with a higher HF risk when elevated, are limited by the need for phlebotomy and sample storage and by frequent inaccessibility at the point of contact.11–16 Thus, there is an unmet need for a simple and efficient strategy for HF risk stratification in the community.
Given their increasing utility and ubiquity, portable devices capable of recording single-lead ECGs have been proposed as a platform for cardiovascular monitoring and screening.17–20 Further, artificial intelligence (AI)–enhanced interpretation of ECGs (AI-ECG) has been shown to detect hidden cardiovascular disease signatures from single-lead ECGs.21–26 However, these portable ECGs are prone to noise introduction during acquisition, which can limit the AI model performance unless specialized measures are taken to ensure they are resilient to such noise.21,27 Recently, we reported a novel approach for single-lead ECGs that incorporates the introduction of random noise during model development, enabling consistent diagnostic performance across varying levels of real-world noises.21 Our initial model development focused on detecting reduced left ventricular (LV) ejection fraction (LVEF) on single-lead ECG based on information from a concurrent echocardiogram, with the potential application of identifying subclinical LV systolic dysfunction (LVSD). Recent studies also suggest that the AI-ECG signature identifies other markers of LV dysfunction, including abnormal LV strain and diastolic function, especially among those with a positive AI-ECG screening result but preserved LVEF.28–30
Given the increasing accessibility of single-lead ECGs, we hypothesized that an AI model developed to detect the cross-sectional signature of LVSD from single-lead ECGs can predict future HF risk. We evaluated our approach in individuals undergoing conventional outpatient ECGs within a diverse US health system and 2 large population-based cohorts in the UK and Brazil.
We included 3 large cohorts spanning different countries and settings who had undergone a conventional ECG: (1) individuals seeking outpatient care in the Yale New Haven Health System (YNHHS), a large health care system in the northeastern US, including 5 independent hospitals and an outpatient network, (2) participants in the UK Biobank (UKB), a nationwide UK-based cohort study, and (3) participants in the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil), the largest community-based cohort study from Brazil (Figure). While the YNHHS cohort included testing and follow-up as a part of routine clinical care in an integrated health system, participants in the UKB and ELSA-Brasil cohorts had protocolized evaluation at baseline and comprehensive longitudinal follow-up (eMethods in Supplement 1).
Race and ethnicity were self-reported and categorized as Asian, Black, Brazilian Pardo, Hispanic, White, and other (encompassing Native American, Pacific Islander, and multiracial). AI-ECG indicates artificial intelligence–enhanced interpretation of electrocardiogram; BMI, body mass index; CNN, convolutional neural network; ECG, electrocardiogram; EF, ejection fraction; EHR, electronic health record; ELSA-Brasil, Brazilian Longitudinal Study of Adult Health; GFR, glomerular filtration rate; HDL, high-density lipoprotein; HF, heart failure; HTN, hypertension; LV, left ventricular; LVDD, LV diastolic dysfunction; NT-proBNP, N-terminal pro–B-type natriuretic peptide; SBP, systolic blood pressure; T2D, type 2 diabetes; TTE, transthoracic echocardiogram; YNHH, Yale New Haven Hospital; YNHHS, Yale New Haven Health System. To convert NT-proBNP to nanograms per liter, multiply by 1.
Our study follows the TRIPOD+AI reporting guidelines.31 Owing to the use of deidentified data, the Yale institutional review board deemed the study exempt from review and did not require the patients to provide informed consent. Data analysis was performed from September 2023 to February 2025.
In YNHHS, to approximate a screening setting, we identified patients undergoing an outpatient 12-lead ECG in 2014 through 2023 without HF before the ECG. To account for the ECGs potentially being obtained as a part of HF workup, we included a 1-year blanking period from the first recorded encounter in the electronic health record (EHR) to identify those with prevalent HF (eMethods and eFigure 1 in Supplement 1). A total of 255 604 individuals had at least 1 outpatient ECG after the blanking period. We excluded 47 720 patients who were included in the model development population and 11 954 with prevalent HF. Additionally, we excluded 1590 patients with LV dysfunction (LVEF <50% or moderate to severe LV diastolic dysfunction per echocardiography laboratory report) and 1673 patients with an NT-proBNP level greater than 300 pg/mL (to convert to nanograms per liter, multiply by 1) before the index ECG (eFigure 2 in Supplement 1).
To avoid selection bias in UKB and ELSA-Brasil, we identified all participants who received a 12-lead ECG. In UKB, 42 366 participants underwent a 12-lead ECG during imaging visits (2014-2020). We used the linkage with the UK National Health Service EHR to exclude 225 participants who had been hospitalized with a principal or secondary discharge diagnosis of HF before the ECG. In ELSA-Brasil, we included 13 739 participants who had undergone a 12-lead ECG in 2008 through 2010, excluding those with HF (n = 227) or an LVEF less than 50% (n = 58) on their baseline echocardiogram (eFigure 2 in Supplement 1).
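The cohort counts above can be checked with simple arithmetic. The sketch below assumes the reported exclusions were applied one after another and did not overlap, which is how the counts are presented; the helper name `apply_exclusions` is hypothetical and is not taken from the study code.

```python
def apply_exclusions(start, exclusions):
    """Subtract each (reason, count) exclusion in order and return who remains."""
    remaining = start
    for _reason, count in exclusions:
        remaining -= count
    return remaining


# Counts are copied from the cohort descriptions in the text.
ynhhs = apply_exclusions(255_604, [
    ("included in model development", 47_720),
    ("prevalent HF", 11_954),
    ("LV dysfunction on echocardiography", 1_590),
    ("NT-proBNP > 300 pg/mL before index ECG", 1_673),
])
ukb = apply_exclusions(42_366, [("prior HF hospitalization", 225)])
elsa_brasil = apply_exclusions(13_739, [
    ("HF at baseline", 227),
    ("LVEF < 50% on baseline echocardiogram", 58),
])

print(ynhhs, ukb, elsa_brasil)  # 192667 42141 13454
```

The three results match the analytic cohort sizes reported in Table 1.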
We defined the outcome as new-onset HF, characterized by HF hospitalizations. In YNHHS, this was defined as a hospitalization with an International Statistical Classification of Diseases, Tenth Revision, Clinical Modification code for HF as the principal discharge diagnosis (eTable 1 in Supplement 1). This approach was guided by the diagnosis codes’ specificity of greater than 95%, especially as the principal discharge diagnosis, for HF diagnosis.32 Similarly, in UKB, we used the linked EHR to identify hospitalizations with HF as the principal diagnosis code. In ELSA-Brasil, HF was identified by in-person interview or telephonic surveillance for all hospitalizations, followed by independent medical record review and adjudication of HF hospitalizations by 2 cardiologists (eMethods in Supplement 1).33
We further evaluated the association of AI-ECG probabilities with alternate definitions of HF and with composite outcomes, including (1) any hospitalization with a principal or secondary HF diagnosis code, (2) a subsequent echocardiogram with LVEF less than 50%, and (3) a composite outcome of HF or all-cause death (eMethods in Supplement 1). To evaluate the specificity of AI-ECG–defined HF risk, we examined the risk of other cardiovascular conditions, including acute myocardial infarction, stroke hospitalizations, and all-cause mortality (eTable 1 in Supplement 1). A composite outcome of major adverse cardiovascular events (MACE) was defined as HF, acute myocardial infarction, stroke, or death.
We defined the exposure as the output of an AI-ECG model trained to detect concurrent LVSD on lead I of a 12-lead ECG, representing the lead commonly captured by portable ECG devices.21 This was developed at the Yale New Haven Hospital using a novel approach of augmenting training data with random gaussian noise (eMethods in Supplement 1). The model achieved excellent discrimination (area under the receiver operating characteristic curve of 0.899 [95% CI, 0.889-0.909]) for detecting concurrent LVSD in the Yale New Haven Hospital held-out test set and performed consistently across clinical and population-based external validation cohorts (eTable 2 and eFigure 3 in Supplement 1).
We deployed this established model without further development to lead I ECG signals to obtain the LVSD probability, representing a continuous HF risk score. We defined a positive AI-ECG screening result as an output probability greater than 0.08, representing the model threshold for 90% sensitivity for detecting LVSD during internal validation.21
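As a rough illustration of the two ideas above (noise augmentation and a fixed screening threshold), the following sketch adds zero-mean gaussian noise to a toy signal and applies the 0.08 cutoff. Scaling the noise to a target signal to noise ratio, the 10 dB value, the sine-wave stand-in for a lead I trace, and the function names are all assumptions made for illustration; the actual augmentation procedure is described in the eMethods and the cited model development report.

```python
import math
import random

POSITIVE_THRESHOLD = 0.08  # from the text: probability > 0.08 is a positive screen


def add_gaussian_noise(signal, snr_db, rng):
    """Return a copy of `signal` with zero-mean gaussian noise at about `snr_db` dB SNR."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def is_positive_screen(probability, threshold=POSITIVE_THRESHOLD):
    """Apply the strict 'greater than' rule stated in the text."""
    return probability > threshold


rng = random.Random(0)  # fixed seed so the example is reproducible
# Toy 1-second trace: a 5 Hz sine sampled at 500 Hz (assumption; not a real ECG).
clean = [math.sin(2 * math.pi * 5 * t / 500) for t in range(500)]
noisy = add_gaussian_noise(clean, snr_db=10, rng=rng)

print(is_positive_screen(0.09), is_positive_screen(0.08))  # True False
```

Because the rule is a strict inequality, a probability of exactly 0.08 counts as a negative screen in this sketch.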
We compared the performance of the AI-ECG algorithm with 2 established risk scores for predicting HF risk, the PCP-HF and PREVENT equations. The PCP-HF score was developed and validated in 7 community-based cohorts.4 It uses 12 input features, including demographic characteristics (age, sex, and race and ethnicity [self-reported; categorized as Asian, Black, Brazilian Pardo, Hispanic, White, and other, encompassing Native American, Pacific Islander, and multiracial]), physical examination–based features (smoking status, body mass index [BMI; calculated as weight in kilograms divided by height in meters squared], systolic blood pressure), laboratory measurements (total cholesterol, high-density lipoprotein cholesterol, fasting blood glucose), medication history (use of antihypertensive or antihyperglycemic medications), and electrocardiographically defined QRS duration. The PREVENT equations were recently developed using data from more than 3.2 million individuals and were validated in 21 datasets.5,34 The PREVENT equations for HF risk prediction use 8 inputs, entailing demographic characteristics (age, sex), medical history (type 2 diabetes), physical examination–based features (smoking status, BMI, systolic blood pressure), laboratory measurements (estimated glomerular filtration rate), and medication history (antihypertensive medication use). Across cohorts, these features were determined using the EHR and/or study visits (eMethods in Supplement 1).35–38
We used age-, sex-, and comorbidity-adjusted Cox proportional hazards models with time to first HF event as the dependent variable and the AI-ECG–based screening results (positive or negative) or continuous model probability as the independent variable to evaluate the association of the model output with HF risk. Multioutcome Fine-Gray subdistribution hazards models were used to account for the competing risk of death.39
The incremental discrimination of AI-ECG over the PCP-HF and PREVENT equations for predicting time to HF hospitalization was reported as the difference in Harrell C statistics using a 1-shot nonparametric approach.40 We calculated integrated discrimination improvement (IDI) and categorical and continuous time to event net reclassification improvement (NRI).41 We further compared the net benefit of the AI-ECG model with the PCP-HF and PREVENT equations across probability thresholds (eMethods in Supplement 1).42 The code for statistical analyses is publicly available at https://github.com/CarDS-Yale/AI-ECG-HF-Pred.43
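For readers unfamiliar with the concordance statistic used here, a minimal pure-Python sketch is given below. It is not the study's implementation (that is in the public repository cited above) and it omits the 1-shot nonparametric comparison and confidence intervals. Pairs with tied follow-up times are simply skipped, which is a simplifying assumption, and the toy data are invented.

```python
def harrell_c(times, events, scores):
    """Concordance statistic for right-censored data (illustrative, O(n^2)).

    A pair is comparable when the person with the shorter follow-up time had
    an event. It is concordant when that person also has the higher score.
    Tied scores count as half; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored person cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented): follow-up in years, event indicator, model probability.
times = [2.0, 4.0, 5.0, 6.0, 7.0]
events = [1, 0, 1, 0, 0]
scores = [0.30, 0.05, 0.02, 0.10, 0.04]

c_stat = harrell_c(times, events, scores)
print(round(c_stat, 3))  # 0.667
```

In the toy data, the first event is ranked above all 4 people followed longer, while the second event is ranked below both of its comparators, giving 4 concordant pairs out of 6.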
All analyses were conducted using a combination of Python version 3.11.2 (Python Software Foundation) and R version 4.2.0 (R Foundation) software. All statistical tests were 2-sided with a level of significance set at .05.
From YNHHS, we included 192 667 individuals with a median age of 56 years (IQR, 41-69 years), comprising 111 181 women (57.7%); 33 256 individuals (17.3%) were Hispanic, 30 623 (15.9%) were non-Hispanic Black, and 117 857 (61.2%) were non-Hispanic White. Over a median 4.6-year follow-up (IQR, 2.8-6.6 years), 3697 individuals (1.9%) had an HF hospitalization, 7514 (3.9%) had an HF hospitalization or an LVEF less than 50% on a subsequent echocardiogram, and 10 381 (5.4%) died (Table 1; eTable 3 in Supplement 1).
| Characteristic | YNHHS (n = 192 667) | UKB (n = 42 141) | ELSA-Brasil (n = 13 454) |
|---|---|---|---|
| Age at ECG, median (IQR), y | 56 (41-69) | 65 (59-71) | 51 (45-58) |
| Sex, No. (%) | |||
| Female | 111 181 (57.7) | 21 795 (51.7) | 7348 (54.6) |
| Male | 81 486 (42.3) | 20 346 (48.3) | 6106 (45.4) |
| Race and ethnicity, No. (%)a | |||
| Asian | 3553 (1.8) | 600 (1.4) | 0 |
| Black | 30 623 (15.9) | 304 (0.7) | 2130 (15.8) |
| Brazilian Pardo | 0 | 0 | 3767 (28.0) |
| Hispanic | 33 256 (17.3) | 0 | 0 |
| White | 117 857 (61.2) | 40 691 (96.6) | 6920 (51.4) |
| Otherb | 2159 (1.1) | 546 (1.3) | 637 (4.7) |
| Missing | 5219 (2.7) | 0 | 0 |
| Deaths, No. (%) | 10 381 (5.4) | 346 (0.8) | 229 (1.7) |
| Follow-up time, median (IQR), y | 4.6 (2.8-6.6) | 3.1 (2.1-4.5) | 4.2 (3.7-4.5) |
| Positive screening results, No. (%) | 42 775 (22.2) | 5513 (13.1) | 1928 (14.3) |
| At baseline, No. (%) | |||
| Hypertension | 88 215 (45.8) | 6126 (14.5) | 4739 (35.3) |
| Type 2 diabetes | 35 522 (18.4) | 1258 (3.0) | 2105 (15.6) |
| Obesity | 30 493 (15.8) | 7535 (17.9) | 3045 (22.6) |
| Atrial fibrillation | 4746 (2.5) | 637 (1.5) | NAc |
| Left bundle branch block | 2397 (1.2) | 383 (0.9) | NAc |
| Use of antihypertensive drugs | 47 611 (24.7) | 9936 (23.9) | 3640 (27.1) |
| Use of antihyperglycemic drugs | 30 520 (15.8) | 321 (0.8) | 1072 (8.0) |
| End-stage kidney disease | 547 (0.3) | 0 | 10 (0.1) |
| During follow-up, No. (%) | |||
| Primary HF hospitalization | 3697 (1.9) | 46 (0.1) | 31 (0.2) |
| Primary HF hospitalization or echocardiogram with LVEF <50% | 7514 (3.9) | NAd | NAd |
| Any HF hospitalization | 13 705 (7.1) | 231 (0.5) | NAd |
| Any HF hospitalization or echocardiogram with LVEF <50% | 15 705 (8.2) | NAd | NAd |
| Primary AMI hospitalization | 366 (0.2) | 208 (0.5) | 60 (0.4) |
| Primary stroke hospitalization | 3281 (1.7) | 210 (0.5) | 59 (0.4) |
| Major adverse cardiovascular events | 16 039 (8.3) | 768 (1.8) | 338 (2.5) |
The 42 141 UKB participants had a median age of 65 years (IQR, 59-71 years), including 21 795 women (51.7%); 304 individuals (0.7%) identified as Black and 40 691 (96.6%) as White. Over a median follow-up of 3.1 years (IQR, 2.1-4.5 years), 46 individuals (0.1%) had an HF hospitalization and 346 (0.8%) died (Table 1).
From ELSA-Brasil, the 13 454 participants had a median age of 51 years (IQR, 45-58 years), comprising 7348 women (54.6%); 2130 participants (15.8%) identified as Black, 3767 (28.0%) as Pardo, and 6920 (51.4%) as White. Over a median of 4.2 years (IQR, 3.7-4.5 years), 31 individuals (0.2%) developed HF and 229 (1.7%) died.
In YNHHS, 42 775 patients (22.2%) screened positive on the AI model applied to the baseline single-lead ECG. A positive screening result was associated with more than a 5-fold higher risk of developing HF (hazard ratio [HR], 5.05 [95% CI, 4.73-5.39]) (Table 2). After accounting for differences in age and sex, a positive AI-ECG screening result was associated with a 3.3-fold higher risk of HF compared with a negative screening result (adjusted HR [aHR], 3.31 [95% CI, 3.10-3.54]). The association remained statistically significant after accounting for differences in HF risk factors, including prior ischemic heart disease, hypertension, type 2 diabetes, and obesity (aHR, 2.81 [95% CI, 2.63-3.01]) and after additionally accounting for the competing risk of death (aHR, 2.73 [95% CI, 2.55-2.93]). The association of a positive screening result with an elevated HF risk was noted across YNHHS sites (eTable 4 in Supplement 1), demographic subgroups (eTable 5 in Supplement 1), and different HF definitions (eTables 6 and 7 in Supplement 1).
| Predictive model inputs | YNHHS: positive screen, HR (95% CI) | YNHHS: per 0.1 increment, HR (95% CI) | UKB: positive screen, HR (95% CI) | UKB: per 0.1 increment, HR (95% CI) | ELSA-Brasil: positive screen, HR (95% CI) | ELSA-Brasil: per 0.1 increment, HR (95% CI) |
|---|---|---|---|---|---|---|
| Cox proportional hazards model | ||||||
| AI-ECG probability | 5.05 (4.73-5.39) | 1.45 (1.44-1.47) | 7.52 (4.21-13.41) | 1.55 (1.40-1.71) | 11.11 (5.32-23.19) | 1.83 (1.64-2.05) |
| AI-ECG probability, age, sex | 3.31 (3.10-3.54) | 1.32 (1.30-1.34) | 5.96 (3.32-10.68) | 1.52 (1.37-1.68) | 8.74 (4.13-18.48) | 1.75 (1.56-1.97) |
| AI-ECG probability, age, sex, IHD, HTN, T2D, obesity | 2.81 (2.63-3.01) | 1.28 (1.26-1.30) | 5.02 (2.77-9.09) | 1.49 (1.33-1.66) | 7.71 (3.62-16.46) | 1.72 (1.52-1.93) |
| Fine-Gray subdistribution hazards model | ||||||
| AI-ECG probability, age, sex, accounting for competing risk of death | 3.22 (3.01-3.45) | 1.30 (1.29-1.32) | 5.91 (3.33-10.50) | 1.51 (1.38-1.66) | 8.67 (4.02-18.70) | 1.74 (1.55-1.96) |
| AI-ECG probability, age, sex, IHD, HTN, T2D, obesity, accounting for competing risk of death | 2.73 (2.55-2.93) | 1.27 (1.25-1.28) | 4.99 (2.81-8.87) | 1.49 (1.36-1.63) | 6.53 (2.91-14.67) | 1.65 (1.46-1.87) |
In UKB, 5513 participants (13.1%) screened positive with the AI-ECG model. A positive AI-ECG screening result was associated with a 7.5-fold higher risk of developing HF (HR, 7.52 [95% CI, 4.21-13.41]). After accounting for age, sex, HF risk factors, and the competing risk of death, screen-positive participants had a 5-fold higher risk of HF (aHR, 4.99 [95% CI, 2.81-8.87]) (Table 2).
In the ELSA-Brasil cohort, 1928 participants (14.3%) had a positive AI-ECG screen, with a 9-fold higher HF risk (age- and sex-adjusted HR, 8.74 [95% CI, 4.13-18.48]) compared with screen-negative participants. This association was consistent even after accounting for comorbidities and the competing risk of death (aHR, 6.53 [95% CI, 2.91-14.67]) (Table 2).
Across the YNHHS network, each 0.1 increment in the model output probability was associated with a 27% higher risk of developing HF, adjusted for age, sex, and comorbidities and accounting for the competing risk of death (aHR, 1.27 [95% CI, 1.25-1.28]) (Table 2). Higher model probabilities were associated with progressively higher HF risk across probability bins, with consistent patterns across hospitals and the outpatient network (eFigure 4 and eTables 4, 8, and 9 in Supplement 1).
Across the UKB and ELSA-Brasil cohorts, a 0.1 increment in model probability was associated with 49% and 65% higher adjusted HF risk, respectively (aHR, 1.49 [95% CI, 1.36-1.63] for the UKB cohort and 1.65 [95% CI, 1.46-1.87] for the ELSA-Brasil cohort) (Table 2).
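To help interpret a hazard ratio reported per 0.1 increment, the sketch below rescales it to another increment. It assumes the log hazard is linear in the model probability, as in a standard proportional hazards specification with a continuous covariate; whether that holds across the full probability range is not something the text establishes. The function name is hypothetical, and the inputs are the fully adjusted Fine-Gray estimates from Table 2.

```python
import math


def rescale_hr(hr_per_step, step, new_step):
    """Convert a hazard ratio per `step` into one per `new_step`, assuming log-linearity."""
    beta = math.log(hr_per_step) / step  # log hazard ratio per unit of probability
    return math.exp(beta * new_step)


# Fully adjusted Fine-Gray hazard ratios per 0.1 increment (Table 2).
per_0_1 = {"YNHHS": 1.27, "UKB": 1.49, "ELSA-Brasil": 1.65}

per_0_2 = {cohort: rescale_hr(hr, 0.1, 0.2) for cohort, hr in per_0_1.items()}
for cohort, hr in per_0_2.items():
    print(f"{cohort}: implied HR per 0.2 increment = {hr:.2f}")
```

Under this assumption, doubling the increment squares the hazard ratio, so 1.27 per 0.1 corresponds to about 1.61 per 0.2.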
The AI-ECG model had a discrimination based on the Harrell C statistic of 0.723 (95% CI, 0.694-0.752) in YNHHS, compared with 0.640 (95% CI, 0.612-0.668) for PCP-HF (P < .001) and 0.674 (95% CI, 0.645-0.703) for PREVENT (P = .02) (Table 3). The AI-ECG model’s discrimination for HF was 0.736 (95% CI, 0.606-0.867) in UKB and 0.828 (95% CI, 0.692-0.964) in ELSA-Brasil, which were not significantly different from the clinical risk scores (marginal difference over Harrell C statistic: for AI-ECG vs PCP-HF, 0.004 [95% CI, −0.165 to 0.173; P = .96] in UKB and −0.023 [95% CI, −0.194 to 0.149; P = .80] in ELSA-Brasil; for AI-ECG vs PREVENT, −0.017 [95% CI, −0.197 to 0.164; P = .86] in UKB and −0.054 [95% CI, −0.218 to 0.111; P = .52] in ELSA-Brasil). Across cohorts, incorporating AI-ECG predictions in addition to PCP-HF and PREVENT resulted in improved Harrell C statistics (difference in addition to PCP-HF, 0.080-0.107; difference in addition to PREVENT, 0.069-0.094) compared with the use of the clinical risk equations alone. However, this increase was not statistically significant in the UKB cohort (Table 3). Further, in all cohorts, the AI-ECG discrimination for new-onset HF was similar to the base input features for the clinical risk scores (eTable 10 in Supplement 1). Incorporating AI-ECG predictions with the base features resulted in significantly higher Harrell C statistics for both PCP-HF and PREVENT input variables in YNHHS and for PREVENT input variables in ELSA-Brasil.
| Covariates | YNHHS: Harrell C statistic (95% CI) | YNHHS: marginal difference over Harrell C statistic for clinical risk score (95% CI) | YNHHS: P value | UKB: Harrell C statistic (95% CI) | UKB: marginal difference over Harrell C statistic for clinical risk score (95% CI) | UKB: P value | ELSA-Brasil: Harrell C statistic (95% CI) | ELSA-Brasil: marginal difference over Harrell C statistic for clinical risk score (95% CI) | ELSA-Brasil: P value |
|---|---|---|---|---|---|---|---|---|---|
| PCP-HF | |||||||||
| PCP-HF | 0.640 (0.612 to 0.668) | NA | NA | 0.732 (0.620 to 0.844) | NA | NA | 0.850 (0.789 to 0.912) | NA | NA |
| AI-ECG model output probability | 0.723 (0.694 to 0.752) | 0.083 (0.044 to 0.122) | <.001 | 0.736 (0.606 to 0.867) | 0.004 (−0.165 to 0.173) | .96 | 0.828 (0.692 to 0.964) | −0.023 (−0.194 to 0.149) | .80 |
| AI-ECG model output probability, age, sex | 0.720 (0.692 to 0.748) | 0.081 (0.049 to 0.112) | <.001 | 0.800 (0.707 to 0.894) | 0.068 (−0.060 to 0.196) | .30 | 0.897 (0.820 to 0.975) | 0.047 (−0.064 to 0.157) | .41 |
| AI-ECG model output probability, PCP-HF | 0.747 (0.721 to 0.773) | 0.107 (0.078 to 0.136) | <.001 | 0.812 (0.722 to 0.902) | 0.080 (−0.013 to 0.172) | .09 | 0.935 (0.898 to 0.971) | 0.084 (0.010 to 0.160) | .03 |
| PREVENT | |||||||||
| PREVENT | 0.674 (0.645 to 0.703) | NA | NA | 0.753 (0.635 to 0.871) | NA | NA | 0.882 (0.762 to 0.906) | NA | NA |
| AI-ECG model output probability | 0.723 (0.694 to 0.752) | 0.049 (0.009 to 0.088) | .02 | 0.736 (0.606 to 0.867) | −0.017 (−0.197 to 0.164) | .86 | 0.828 (0.692 to 0.964) | −0.054 (−0.218 to 0.111) | .52 |
| AI-ECG model output probability, age, sex | 0.720 (0.692 to 0.748) | 0.046 (0.012 to 0.080) | .007 | 0.800 (0.707 to 0.894) | 0.047 (−0.088 to 0.182) | .49 | 0.897 (0.820 to 0.975) | 0.016 (−0.088 to 0.119) | .77 |
| AI-ECG model output probability, PREVENT | 0.768 (0.742 to 0.793) | 0.094 (0.068 to 0.120) | <.001 | 0.822 (0.730 to 0.913) | 0.069 (−0.019 to 0.157) | .12 | 0.950 (0.927 to 0.974) | 0.069 (0.011 to 0.127) | .02 |
AI-ECG had an IDI of 0.091 to 0.205 vs PCP-HF and 0.068 to 0.192 vs PREVENT. The AI model was associated with a significant improvement in continuous NRI in the YNHHS cohort, but not in UKB or ELSA-Brasil (Table 4). The AI-ECG model also had a categorical NRI of 18.2% to 47.2% vs PCP-HF and 11.8% to 47.5% vs PREVENT, while this did not reach statistical significance for the comparison with PCP-HF in UKB. This improvement in categorical NRI was driven by improved event NRI, while nonevent NRI decreased (eTable 11 in Supplement 1). Despite the differential improvement in reclassifying cases and controls, the AI-ECG’s positive predictive value was comparable with PCP-HF and PREVENT across sites (eTable 12 in Supplement 1). The AI-ECG model demonstrated consistent superior net benefit over PCP-HF across probability thresholds greater than 0.06 across data sources, where the AI-ECG threshold was 0.08 (eFigure 5 in Supplement 1). A positive AI-ECG screening result was independently associated with HF risk after accounting for the clinical risk scores, with consistent patterns across racial groups (eFigures 6-9 in Supplement 1).
| Metric (95% CI) | YNHHS: vs PCP-HF | YNHHS: vs PREVENT | UKB: vs PCP-HF | UKB: vs PREVENT | ELSA-Brasil: vs PCP-HF | ELSA-Brasil: vs PREVENT |
|---|---|---|---|---|---|---|
| IDI | 0.091 (0.068 to 0.118) | 0.068 (0.044 to 0.098) | 0.103 (0.011 to 0.214) | 0.113 (0.024 to 0.211) | 0.205 (0.075 to 0.347) | 0.192 (0.064 to 0.339) |
| NRI | ||||||
| Categorical | 0.182 (0.100 to 0.263) | 0.118 (0.034 to 0.199) | 0.198 (−0.076 to 0.465) | 0.289 (0.017 to 0.537) | 0.472 (0.131 to 0.749) | 0.475 (0.173 to 0.809) |
| Continuous | 0.210 (0.094 to 0.325) | 0.207 (0.094 to 0.323) | 0.096 (−0.347 to 0.506) | 0.309 (−0.140 to 0.724) | 0.095 (−0.242 to 0.324) | 0.188 (−0.268 to 0.531) |
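The split of categorical NRI into event and nonevent components can be illustrated with a simplified two-category version. This sketch treats the outcome as binary and ignores censoring, so it is not the time to event NRI used in the study; the function name and the toy data are invented for illustration.

```python
def categorical_nri(old_high, new_high, events):
    """Two-category NRI, returned as (event NRI, nonevent NRI, total).

    `old_high` and `new_high` flag whether each person is in the high-risk
    category under the reference score and the new model, respectively.
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, event in zip(old_high, new_high, events):
        moved_up = new and not old
        moved_down = old and not new
        if event:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_ne += 1
            up_ne += moved_up
            down_ne += moved_down
    event_nri = (up_e - down_e) / n_e          # moving up is correct for events
    nonevent_nri = (down_ne - up_ne) / n_ne    # moving down is correct for nonevents
    return event_nri, nonevent_nri, event_nri + nonevent_nri


# Toy data (invented): 4 people with events, 6 without.
events = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
old_high = [False, False, True, True, False, False, False, True, False, False]
new_high = [True, True, True, False, True, False, False, False, True, False]

event_nri, nonevent_nri, total_nri = categorical_nri(old_high, new_high, events)
print(round(event_nri, 3), round(nonevent_nri, 3), round(total_nri, 3))
```

In this toy example the event NRI is positive (0.25) and the nonevent NRI is negative (about −0.167), the same direction of trade-off described for the AI-ECG model: more future cases are flagged, at the cost of flagging more people who do not develop HF.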
In YNHHS, a positive AI-ECG screening result was associated with a modestly elevated risk of stroke and MACE (age- and sex-adjusted HRs: stroke, 1.18 [95% CI, 1.09-1.27]; MACE, 1.76 [95% CI, 1.70-1.82]) (eTables 13 and 14 in Supplement 1) compared with a 3-fold increase in HF risk. In UKB and ELSA-Brasil, a positive screening result was associated with a 1.5- to 4-fold increased risk of stroke, death, and MACE compared with a 6- to 9-fold increase in HF risk.
Across clinically and geographically distinct cohorts, a noise-adapted AI model trained to detect concurrent LVSD from a lead I ECG identified individuals with a risk of future HF hospitalization among those seeking outpatient care and community-dwelling adults. Individuals with a positive AI-ECG screening result had a 3- to 7-fold higher risk of developing HF compared with those with a negative screening result, independent of demographic and clinical characteristics. Higher AI-ECG probabilities were associated with progressively higher HF risk, with each 0.1 increment in model probability associated with a 27% to 65% higher risk-adjusted hazard for HF across cohorts. Further, the AI-ECG model demonstrated incremental discrimination, improved reclassification, and superior net benefit over the PCP-HF and PREVENT equations, with some differences seen in the individual cohorts. Therefore, our AI-based approach demonstrates promising characteristics for use as a noninvasive digital biomarker for assessing HF risk using a single-lead ECG.
Applications of deep learning for ECGs have demonstrated the ability to identify subtle signatures of structural heart disorders previously considered electrically silent,29,44–53 with applications extending to detecting LVSD from single-lead tracings.21,22,54–56 Further, the US Food and Drug Administration recently approved an AI tool using electronic stethoscope–based single-lead ECGs for cross-sectional LVSD detection.57 Our study demonstrates that a noise-adapted AI-ECG model can predict new-onset HF risk using single-lead ECGs with variable performance across populations. Given the increasing accessibility of portable devices capable of acquiring ECGs outside a clinical setting,20,58,59 this approach could potentially be applied more widely to identify individuals at risk of HF, although prospective study is needed.59 While the ECGs acquired with these devices are often distorted by electrode movements or artifacts due to skeletal muscle contraction during acquisition,27,60 our unique noise-adapted training approach might enable reasonable inference from ECGs transmitted from portable devices.21
In this study, we opted for a definition of HF based on the principal discharge diagnosis code, a criterion with high specificity.32 Nonetheless, the association of a positive screening result with elevated HF risk was consistent across several sensitivity analyses defining the condition differently in the YNHHS and UKB cohorts and in the ELSA-Brasil cohort, where the outcomes were explicitly adjudicated. The performance across clinically and demographically distinct cohorts indicates that the model may capture a predictive HF signature independent of site-specific coding practices.61–63 Moreover, the dose-dependent association of higher AI-ECG scores with progressively elevated HF risk may enable more precise risk stratification and risk-informed management. Notably, while a positive screening result was also associated with a modestly elevated risk of other cardiovascular outcomes, including MACE, the predictive signature was more specific for HF.
Our study has important implications for defining HF risk. While several clinical risk scores have been proposed to identify individuals at high risk, these strategies often require clinical evaluation and blood testing.8,9,16 This limits their scope to patients with established access to health care services.9,16,64–66 In contrast, our AI-based approach using single-lead ECGs may offer a means for HF risk stratification outside clinical settings. Notably, the model demonstrated positive IDI, improved reclassification, and greater net benefit compared with the PCP-HF and PREVENT equations across sites, although with variable performance within cohorts. While the improvement of discrimination did not reach statistical significance in the UKB and ELSA-Brasil cohorts, consistent improvement in categorical NRI—relevant for clinical decision-making—indicates the clinical utility of the AI model.67 Furthermore, the model’s positive predictive value was comparable to that of the PCP-HF and PREVENT equations, suggesting that an AI-ECG–based strategy for screening may not lead to unnecessary additional testing. Despite these advancements, the AI-ECG model does not eliminate the need for clinical risk scores, but it does offer a potential resource-efficient strategy for use in community settings if validated prospectively using portable devices. The risk scores might represent an adjunct in these settings where the focus may be more on the identification of modifiable risk factors.
The ability to use a single portable device to record ECGs for multiple individuals could support the design of efficient community-based screening programs.68,69 Successful health promotion strategies, such as targeted hypertension management in barbershops and cancer screening in churches across the US,70,71 can be adapted to promote HF screening, especially among those who are less likely to seek preventive care.64 The ease of use and the brief time for ECG acquisition with portable devices can enable a non–laboratory-based strategy, potentially suitable for integration into noncommunicable disease screening programs globally, especially in low- and middle-income countries.68,69,72 This potential scalability and the possible community health benefits necessitate prospective clinical and cost-effectiveness assessments for AI-based HF risk stratification using portable devices.
Our study has certain limitations. First, waveforms extracted from lead I of clinical ECGs may not be identical to those from portable devices. While our noise-augmentation approach previously demonstrated sustained performance on ECGs with real-world noise,21 prospective validation of the model on ECGs acquired by a portable device is necessary before deployment for community HF screening. This includes evaluating device types, acquisition methods, and handling of ECG segments of longer durations. Second, despite YNHHS’s wide geographic coverage, out-of-hospital clinical outcomes may not have been captured, potentially underestimating HF risk compared with the protocolized follow-up in the UKB and ELSA-Brasil cohorts. Moreover, while we included only ECGs performed in an outpatient setting, the patients underwent clinically indicated ECGs, so those with a negative AI-ECG screening result may still carry an unmeasured risk profile. However, the control patients in this setting underwent an ECG as well. Third, the number of HF outcome events was very low in the UKB and ELSA-Brasil cohorts. However, the HF hospitalization rates were comparable to those of other population-based cohorts,73,74 and the outcome capture and adjudication in UKB and ELSA-Brasil have been extensively validated.75–82 Further, in UKB, the smaller subset of participants who underwent an ECG, the shorter follow-up period after the ECGs were performed, and our approach of excluding those with prevalent HF may have contributed to the lower absolute number of incident HF events in our study. Fourth, given the lack of NT-proBNP assessments in UKB and ELSA-Brasil, we could not evaluate NT-proBNP as a comparator in this study. In YNHHS, the use of NT-proBNP level could introduce substantial selection bias since it is typically ordered for evaluation of cardiopulmonary symptoms and rarely for primary prevention. Nevertheless, a future head-to-head assessment of AI-ECG and NT-proBNP level as predictors for HF is warranted. Further, while we performed an analysis that excluded individuals with elevated pre-ECG NT-proBNP levels in YNHHS, the lack of NT-proBNP testing precluded this analysis in UKB and ELSA-Brasil. Fifth, while the PCP-HF and PREVENT equations are used to estimate the 10-year risk of HF, we applied these risk scores to assess HF risk during the available follow-up period (<5 years). Nevertheless, we factored in varying follow-up durations for each individual for our comparison among risk stratification strategies. Sixth, while the AI-ECG approach identifies individuals at elevated HF risk, it is unclear from these data whether this risk is modifiable. Nonetheless, a robust screening strategy can enable targeted management of known HF risk factors.
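The fifth limitation turns on comparing risk strategies when follow-up differs across individuals. As a minimal sketch of how a concordance measure such as the Harrell C statistic accommodates censored follow-up, the hypothetical function below counts only pairs in which the individual with the shorter time had an observed event. This is an illustration under simplifying assumptions (tied times are ignored, and the data are invented), not the implementation used in this study.

```python
def harrell_c(times, events, scores):
    """Harrell C statistic for right-censored data (simplified; ignores tied times).

    times  : follow-up time for each individual
    events : 1 if HF was observed at that time, 0 if censored
    scores : predicted risk (higher means higher expected risk)
    """
    concordant = tied = usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair must be anchored on an observed event
        for j in range(n):
            # j is comparable only if still event-free after i's event time
            if times[j] > times[i]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / usable


# Hypothetical toy data with unequal follow-up (not study data).
times = [1.0, 2.5, 3.0, 4.0, 4.5]
events = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.5, 0.1]
c_stat = harrell_c(times, events, scores)
print(round(c_stat, 4))
```

Individuals censored early contribute to fewer comparable pairs, which is how unequal follow-up is reflected in the statistic without assuming a fixed prediction horizon.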
Across clinically and geographically distinct cohorts, we used a noise-resilient AI model with a conventional lead I ECG tracing as the sole input to define the risk of future HF hospitalization and demonstrated incremental improvement over clinical risk scores. With the increasing availability of single-lead ECGs transmitted from portable and wearable devices, future studies are required to determine whether this AI-ECG–based noninvasive digital biomarker can enable improved stratification of HF risk across communities.
Accepted for Publication: February 13, 2025.
Published Online: April 16, 2025. doi:10.1001/jamacardio.2025.0492
Corresponding Author: Rohan Khera, MD, MS, Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, 195 Church St, Sixth Fl, New Haven, CT 06510 (rohan.khera@yale.edu).
Author Contributions: Drs Dhingra and Khera had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Dhingra and Aminorroaya contributed equally as co–first authors.
Concept and design: Dhingra, Aminorroaya, Khunte, Krumholz, Khera.
Acquisition, analysis, or interpretation of data: Dhingra, Aminorroaya, Pedroso Camargos, Khunte, Sangha, McIntyre, Chow, Asselbergs, Brant, Barreto, Ribeiro, Oikonomou, Khera.
Drafting of the manuscript: Dhingra, Aminorroaya, Khunte, Oikonomou.
Critical review of the manuscript for important intellectual content: Pedroso Camargos, Sangha, McIntyre, Chow, Asselbergs, Brant, Barreto, Ribeiro, Krumholz, Khera.
Statistical analysis: Dhingra, Aminorroaya, Pedroso Camargos, Khunte, Sangha, Barreto, Oikonomou, Khera.
Obtained funding: Barreto, Ribeiro, Khera.
Administrative, technical, or material support: Pedroso Camargos, McIntyre, Ribeiro, Oikonomou, Khera.
Supervision: Asselbergs, Khera.
Conflict of Interest Disclosures: Mr Khunte reported having US provisional patent 63/428,569 pending outside the submitted work. Mr Sangha reported having patent 63/428,569 pending, patent 63/346,610 pending, patent 63/484,426 pending, and being a cofounder of Ensight-AI outside the submitted work. Dr Chow reported receiving a grant from the National Health and Medical Research Council during the conduct of the study. Dr Asselbergs reported receiving support from Heart4Data (funded by the Dutch Heart Foundation and ZonMw) and UCL Hospitals NIHR Biomedical Research Centre. Dr Brant reported receiving support from the Brazilian National Council for Scientific and Technological Development (CNPq). Dr Ribeiro reported receiving grants from CNPq. Dr Krumholz reported having equity in Ensight-AI during the conduct of the study; and working under contract with the Centers for Medicare & Medicaid Services, receiving research contracts through Yale University from Janssen, Kenvue, and Pfizer, receiving options for Element Science and Identifeye and payments from F-Prime for advisory roles, and being a cofounder of and holding equity in Hugo Health, Refactor Health, and Ensight-AI outside the submitted work. Dr Oikonomou reported being a cofounder of Evidence2Health, serving as a consultant to Caristo Diagnostics Ltd and Ensight-AI, having stock options in Caristo Diagnostics Ltd, receiving a grant from the National Heart, Lung, and Blood Institute of the National Institutes of Health, and having patents 63/508,315 and 63/177,117 outside the submitted work. 
Dr Khera reported receiving grants from the National Heart, Lung, and Blood Institute, National Institutes of Health, Doris Duke Charitable Foundation, Bristol Myers Squibb, Novo Nordisk, BridgeBio, and Blavatnik Foundation, being an academic cofounder of Ensight-AI and Evidence2Health, having patents 63/346,610, WO2023230345A1, US20220336048A1, 63/484,426, 63/508,315, 63/580,137, 63/606,203, 63/619,241, and 63/562,335 pending, and serving as associate editor of JAMA outside the submitted work. No other disclosures were reported.
Funding/Support: Dr Oikonomou was supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health (F32HL170592). Dr Khera was supported by the National Institutes of Health (R01AG089981, R01HL167858, and K23HL153775) and the Doris Duke Charitable Foundation (2022060).
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Disclaimer: The views expressed in this article are those of the authors and not necessarily any funders.
Data Sharing Statement: See Supplement 2.