The early detection of COVID-19 is helpful for resource allocation during the pandemic. Here, we proposed a Bayesian approach to identify individuals with probable COVID-19 on the basis of demographic information and self-reported symptoms on days 1, 2, and 3 after symptom onset. Using a unique, prospective dataset, we evaluated the proposed method by analysing its ability to predict COVID-19 in the early stages of infection, by identifying sets of symptoms that characterise early signs of infection in subgroups of the population, and by estimating the certainty of the model predictions so that they can be used to direct people to testing, self-isolation, or both. Such early diagnosis could lead to better allocation of medical resources when the health-care system is severely strained by the pandemic. The proposed approach was compared with the methods currently used by the NHS and by related studies.7, 9, 20 Our model was effective in identifying COVID-19 after 3 days of symptoms (AUC 0·80), with a mean sensitivity of 0·73 (SD 0·05) and a mean specificity of 0·72 (SD 0·02). When compared with other state-of-the-art diagnostic algorithms,7, 9 our proposed approach showed significantly better predictive accuracy and sensitivity. Nevertheless, the model was hampered by bias in data acquisition and requires further validation with external datasets. By analysing predictive AUC, sensitivity, and specificity in subgroups, we conclude that our model could be particularly useful for detecting early signs of COVID-19 in certain groups of the population, such as older patients. Conversely, our proposed model was less accurate in detecting COVID-19 among health-care workers than among non-health-care workers.
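The AUC values reported above have a simple probabilistic reading: AUC is the probability that a randomly chosen participant who tested positive receives a higher predicted score than a randomly chosen participant who tested negative (the Mann-Whitney formulation). The minimal pure-Python sketch below illustrates that reading; the function name and the toy labels and scores are hypothetical and are not taken from this study's data or code.

```python
from typing import Sequence


def auc_mann_whitney(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties in score count as half a correct ranking. This is O(n_pos * n_neg),
    fine for illustration but too slow for very large test sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example (hypothetical): 1 = tested positive, 0 = tested negative.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
print(round(auc_mann_whitney(labels, scores), 3))  # → 0.889 (8 of 9 pairs correct)
```

Because AUC does not depend on a decision threshold, it complements sensitivity and specificity, which change with the threshold chosen for referral.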
We identified loss of smell, chest pain, persistent cough, abdominal pain, blisters on the feet, eye soreness, and unusual muscle pain as the most relevant features indicating early signs of COVID-19.7, 9, 10, 17, 18, 19 A previous study7 of patients with symptomatic COVID-19 highlighted skipped meals and fever as relevant symptoms for identifying COVID-19.9, 10, 11, 20 However, our analysis showed that these features were not relevant to early disease in the unstratified population; skipped meals and fever should therefore not be considered first-line symptoms indicating that a person should have a COVID-19 test or self-isolate. In addition, among the comorbidities reported by participants in our study, heart disease was the most relevant to the predictions. Although comorbidities did not directly affect the model output, they were included in the model as conditional variables. Therefore, the COVID-19 symptoms reported by individuals with pre-existing heart conditions should be further investigated and differentiated from those reported by the general population.
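This section does not spell out how feature relevance is computed. One common approach in Gaussian process models, shown here purely as an assumption, is automatic relevance determination (ARD): each input feature has its own length scale, and a short length scale (the function changes quickly along that feature) is read as high relevance. The feature names and length-scale values below are hypothetical placeholders, not fitted values from this study.

```python
import math
from typing import List, Sequence


def ard_rbf(x: Sequence[float], z: Sequence[float],
            lengthscales: Sequence[float], variance: float = 1.0) -> float:
    """Squared-exponential kernel with one length scale per feature (ARD)."""
    sq = sum(((a - b) / ell) ** 2 for a, b, ell in zip(x, z, lengthscales))
    return variance * math.exp(-0.5 * sq)


def rank_by_relevance(names: Sequence[str],
                      lengthscales: Sequence[float]) -> List[str]:
    """Rank features by inverse length scale (an assumed relevance measure)."""
    relevance = {name: 1.0 / ell for name, ell in zip(names, lengthscales)}
    return sorted(relevance, key=relevance.get, reverse=True)


# Hypothetical fitted length scales for three binary symptom indicators.
names = ["loss_of_smell", "skipped_meals", "fever"]
lengthscales = [0.5, 8.0, 4.0]
print(rank_by_relevance(names, lengthscales))
# → ['loss_of_smell', 'fever', 'skipped_meals']

# Differing only in a short-length-scale feature lowers covariance far more.
print(ard_rbf([1, 0, 0], [0, 0, 0], lengthscales))  # loss of smell differs
print(ard_rbf([0, 1, 0], [0, 0, 0], lengthscales))  # skipped meals differs
```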
Using a hierarchical Gaussian process model, we further investigated the early signs of infection in subgroups of the population. Our initial results suggested that health-care workers showed distinctive features compared with non-health-care workers. For both groups, loss of smell was the most relevant feature for early diagnosis of COVID-19, but fatigue, headache, skipped meals, and unusual muscle pain were more relevant to health-care workers than to non-health-care workers.16 We believe that the workload faced by health-care workers during the pandemic increases both their exposure to the virus and their stress levels, which could explain the relevance of such symptoms; long-term stress could lead to psychological symptoms that are reported as fatigue.16, 21, 22 Similarly, unusual muscle pain could be explained by long working hours in health-care settings and the physical demands of caretaking during the pandemic.16, 22 Our model also had lower predictive power for health-care workers (AUC 0·76; 63% sensitivity) than for non-health-care workers (AUC 0·81; 76% sensitivity) after 3 days of self-reported symptoms. This result could be explained by the differences in feature relevance between these groups and by the possibility that health-care workers experience and report symptoms differently from non-health-care workers. We think that current studies investigating COVID-19 symptoms could benefit from personalised models that incorporate participants’ occupations and are trained using this information.
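A hierarchical Gaussian process can be built by adding a subgroup-specific covariance term to a covariance shared by the whole population, so that health-care workers and non-health-care workers share a common symptom-outcome function but can each deviate from it. The sketch below shows that generic construction, k((x, g), (x′, g′)) = k_shared(x, x′) + [g = g′] k_g(x, x′); the kernel choice, parameter values, and subgroup labels are illustrative assumptions, not the specification of the model used in this study.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[Sequence[float], str]  # (symptom vector, subgroup label)


def rbf(x: Sequence[float], z: Sequence[float],
        lengthscale: float, variance: float) -> float:
    """Isotropic squared-exponential kernel."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return variance * math.exp(-0.5 * sq / lengthscale ** 2)


def hierarchical_kernel(p: Point, q: Point,
                        shared: Tuple[float, float],
                        group: Dict[str, Tuple[float, float]]) -> float:
    """Shared population-level covariance plus a within-subgroup term."""
    (x, g), (z, h) = p, q
    k = rbf(x, z, *shared)
    if g == h:  # extra covariance only between members of the same subgroup
        k += rbf(x, z, *group[g])
    return k


def gram(points: Sequence[Point], shared, group) -> List[List[float]]:
    """Covariance matrix over all (symptom vector, subgroup) points."""
    return [[hierarchical_kernel(p, q, shared, group) for q in points]
            for p in points]


# Hypothetical parameters: (lengthscale, variance) for each component.
shared = (1.0, 1.0)
group = {"hcw": (1.0, 0.5), "non_hcw": (2.0, 0.5)}
points = [([1.0, 0.0], "hcw"), ([1.0, 0.0], "non_hcw"), ([1.0, 0.0], "hcw")]
K = gram(points, shared, group)
print(K[0][2], K[0][1])  # → 1.5 1.0 (same symptoms; same vs different subgroup)
```

Setting a subgroup's variance to zero recovers a single pooled model; larger subgroup variances let subgroup-specific symptom patterns carry more weight.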
By stratifying symptom relevance by age group, we showed that early symptoms varied between some age groups.23 We found that loss of smell, a symptom widely used to detect COVID-19, begins to lose relevance for people older than 60 years and is not a relevant feature for individuals aged 80 years or older. These new results suggest that the detection of early signs of COVID-19 could benefit from personalised models that factor in participants’ age group. The differences in feature relevance could also be explained by the small number of participants in some age groups, particularly older people, who might be less likely to log their symptoms regularly, and by the less evident and less aggressive symptoms in younger participants than in older participants.24, 25 Therefore, future research should focus on developing sub-models tailored to the age subgroups that showed significantly different features. Nevertheless, predictions of COVID-19 across all age groups had consistently high certainty.
Despite the differences in prognosis and mortality between the sexes,26 we did not find any differences in the early signs of infection between them.
Our model’s performance in the BMI subgroups was similar to that in the unstratified test set, with the exception of the underweight subgroup, in which the model had a lower AUC for 3 days of self-reported symptoms. However, our model’s predictions for participants with obesity were highly uncertain, and the likelihood of these predictions decreased as the number of timepoints increased. This result could partly be explained by other underlying medical conditions in participants with obesity that could hamper the correct assessment of early signs of infection. The number of participants with obesity in our study population was lower than the number in any other BMI category, which compromised the ability of our model to correctly describe the early signs of infection in this subpopulation.
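The link between latent uncertainty and prediction confidence can be illustrated for a Gaussian process classifier with a probit likelihood. If the approximate posterior of the latent function at a test point is N(μ, σ²), the predictive probability of a positive result is Φ(μ/√(1 + σ²)), so a larger σ² pulls the probability towards 0·5 and raises the entropy of the predicted label. The probit likelihood, the Gaussian posterior approximation, and the numbers below are assumptions for illustration; they do not reproduce this study's inference procedure.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def predictive_probability(mu: float, var: float) -> float:
    """P(y = 1) under a probit likelihood with latent posterior N(mu, var)."""
    return normal_cdf(mu / math.sqrt(1.0 + var))


def binary_entropy(p: float) -> float:
    """Entropy of the predicted label in bits (1.0 = maximally uncertain)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# Same latent mean, different latent variance (hypothetical values).
for var in (0.1, 4.0):
    p = predictive_probability(1.0, var)
    print(f"var={var}: P(positive)={p:.3f}, entropy={binary_entropy(p):.3f} bits")
```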

Our study had several strengths. First, to our knowledge, this study was the first to attempt to detect early signs of COVID-19 using self-reported symptoms. Second, the models presented in this study were trained on a large population of 182 991 participants and subsequently validated on a fully independent sample of 15 049 individuals; this sample size supports the generalisability and robustness of our approach. The heterogeneity in the demographics of included individuals and the broad spectrum of reported symptoms also contributed to the generalisability of the model. Third, the prospective nature of symptom logging in this study could allow us to refine the model design and further improve the proposed approach. We could then develop personalised models for various population strata, such as age groups and occupations, and validate them in future analyses. Fourth, our proposed approach has a temporal component that does not require the concatenation of symptoms across timepoints. This design ensured that the sequential presentation of symptoms was preserved while still generating predicted labels for participants with different numbers of timepoints. Finally, the uncertainty of the predicted labels for each subpopulation can also be used as a surrogate measure of an individual’s likelihood of being positive for SARS-CoV-2 across the different timepoints, a major advantage in real-life scenarios.
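One generic way to build a temporal model that avoids concatenating symptoms across timepoints is to treat each participant-day as a separate input, append the day since onset as an extra input dimension, and use a product kernel over symptoms and time. Participants with 1, 2, or 3 days of logs then contribute 1, 2, or 3 rows to the same covariance matrix, without padding. The data layout, kernel, and names below are hypothetical and illustrate the idea only; they are not the model specification used in this study.

```python
import math
from typing import Dict, List, Sequence, Tuple

Obs = Tuple[Sequence[float], int]  # (symptom vector on one day, day since onset)


def to_observations(logs: Dict[str, List[Sequence[float]]]) -> List[Tuple[str, Obs]]:
    """One row per participant-day: no padding and no concatenation across days."""
    rows: List[Tuple[str, Obs]] = []
    for pid, days in logs.items():
        for day, symptoms in enumerate(days, start=1):
            rows.append((pid, (symptoms, day)))
    return rows


def symptom_time_kernel(a: Obs, b: Obs, ls_x: float = 1.0, ls_t: float = 1.0) -> float:
    """Product kernel: symptom similarity times proximity in days since onset."""
    (x, t), (z, u) = a, b
    k_x = math.exp(-0.5 * sum((p - q) ** 2 for p, q in zip(x, z)) / ls_x ** 2)
    k_t = math.exp(-0.5 * (t - u) ** 2 / ls_t ** 2)
    return k_x * k_t


# Hypothetical logs: participant A reported 1 day, participant B reported 3 days.
logs = {"A": [[1, 0, 1]], "B": [[0, 1, 0], [1, 1, 0], [1, 1, 1]]}
rows = to_observations(logs)
K = [[symptom_time_kernel(r[1], s[1]) for s in rows] for r in rows]
print(len(rows), len(K))  # → 4 4
```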
Our study also had limitations. First, the self-reported nature of the data, particularly the symptoms, could have negatively affected the performance of the models. Because the models rely on prospective data collection, participants needed to recall the exact trajectory of their symptoms over the first 3 days and the date of symptom onset, which might not always have been possible. Participants might also have overestimated their symptoms in both intensity and duration. Furthermore, the absence of clinical scales for symptom reporting and assessment can limit the interpretation and translation of symptom profiles into the clinical environment. These factors can compromise the models’ performance, limiting their use as clinical tools. To overcome these limitations, complementary measurements obtained by wearable sensors and devices could be included as features. Such devices have proven successful as clinical proxies of participants’ conditions and are viable solutions for assessing and validating self-reported symptoms.27, 28
Second, because data were acquired through a mobile phone app, the study population was skewed towards younger people. Therefore, translating the proposed approach to other populations will require a detailed analysis of the participants’ demographics. Nevertheless, given the flexibility and non-parametric nature of our model, we believe that model performance would not be negatively affected, even if the relevant features change.
Third, the assessment of symptom relevance could have been affected by the sample size of the different population strata. To reduce the effect of small sample sizes, we reduced the granularity of the BMI subgroups. We further addressed these limitations through extensive validation on an independent sample, including a bootstrapping scheme to reduce sampling bias and compensate for different symptom prevalence across individuals and population strata.
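A bootstrapping scheme of this kind can be sketched as follows: resample the independent test set with replacement, recompute sensitivity and specificity on each replicate, and summarise them as a mean and SD, the form in which sensitivity and specificity are reported in this Discussion. Resampling positives and negatives separately is our assumption about how prevalence is held fixed; the number of replicates, the strata, and all names and data below are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def sens_spec(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def stratified_bootstrap(labels: Sequence[int], preds: Sequence[int],
                         n_boot: int = 1000, seed: int = 0):
    """Mean and SD of sensitivity and specificity over bootstrap replicates.

    Positives and negatives are resampled separately (with replacement), so
    every replicate keeps the test set's prevalence and neither ratio divides
    by zero.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    sens: List[float] = []
    spec: List[float] = []
    for _ in range(n_boot):
        idx = rng.choices(pos, k=len(pos)) + rng.choices(neg, k=len(neg))
        s, c = sens_spec([labels[i] for i in idx], [preds[i] for i in idx])
        sens.append(s)
        spec.append(c)
    return (statistics.mean(sens), statistics.stdev(sens),
            statistics.mean(spec), statistics.stdev(spec))


# Hypothetical binary referrals (1 = refer for testing) on a toy test set.
labels = [1] * 10 + [0] * 10
preds = [1] * 7 + [0] * 3 + [0] * 8 + [1] * 2
m_sens, sd_sens, m_spec, sd_spec = stratified_bootstrap(labels, preds)
print(f"sensitivity {m_sens:.2f} (SD {sd_sens:.2f}); "
      f"specificity {m_spec:.2f} (SD {sd_spec:.2f})")
```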
Fourth, all the analyses presented in this study were done in the UK population, which limits the generalisability of our conclusions, as features of the study population can differ between countries. Some of the population features considered for model estimation, namely obesity rates, age, comorbidities, and infection risk for health-care workers during the pandemic, could vary strongly between countries, including several low-income and middle-income countries.29, 30 In addition, we did not specifically analyse participants’ ethnicity as a possible covariate or confounder in the model. Future work should focus on validating the proposed approach in populations with different demographic features.
Finally, testing guidelines that depend on the available resources can be considered another key limitation of this study. Because the likelihood of being offered a test in the UK depends strongly on the reference symptoms used by the NHS,11 with an individual’s occupation considered among other factors, the test outcome itself could be biased. Similarly, differences in COVID-19 mitigation measures between countries could affect both the manifestation of the disease and the test used as the reference for SARS-CoV-2 positivity.

Early detection of individuals infected with SARS-CoV-2 is crucial to contain the spread of the COVID-19 pandemic and to allocate medical resources efficiently. In this study, we proposed a tailored hierarchical Gaussian process model to predict early signs of infection using self-reported symptoms. This model allows us to refer individuals for testing and self-isolation even when only early symptoms are observed. In the future, our proposed model could integrate additional features, such as clinically relevant measures, to improve its performance and reduce the bias associated with self-reported inputs.