Clinical prediction models in the COVID-19 pandemic – helpful or misleading?


What for?

With coronavirus disease 2019 (COVID-19) spreading worldwide, hospitals overburdened, and medical equipment in short supply in several countries, diagnostic and prediction models might help to identify patients with COVID-19 and to predict the likely outcome of the disease (e.g. severity of disease, recovery, death).

  • Can we diagnose COVID-19 in patients with suspected disease (diagnosis)?

  • Among patients with confirmed disease, who is likely to progress to severe disease, who is likely to recover, and who is likely to die (prognosis)?

For medical staff, prognostic models could help allocate limited health care resources and support decisions about who needs treatment most. Although the availability and speed of nucleic acid tests by real-time reverse transcription polymerase chain reaction (RT-PCR) have improved in most countries, results become available with a delay and show relatively high false-negative rates in clinical practice, caused by unstable specimen processing or differing specimen collection procedures. Hence, diagnostic models that provide an early and accurate diagnosis might be used as an alternative to RT-PCR.

This post is based on a recent systematic review by Wynants, Van Calster et al. (2020).


Models of various types have recently been proposed for the diagnosis and prognosis of COVID-19, including rule-based scoring systems, multivariable logistic regression, and advanced machine learning models. Machine learning models are used, for example, to extract features from computed tomography (CT) images. When building a predictive model, adequate selection of predictors is crucial. It can be supported by variable selection or penalization procedures (backward selection, ridge regression, LASSO etc.) but should also take expert opinion into account. The review by Wynants, Van Calster et al. (2020) systematically screened COVID-19 studies (2696 titles) and identified 31 prediction models for the diagnosis and prognosis of COVID-19, which were critically assessed.
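To make the logistic regression case concrete, here is a minimal Python sketch of how such a model turns predictor values into a predicted probability. The predictor names, the intercept, and the coefficients are hypothetical and chosen purely for illustration; they are not taken from any of the reviewed models.

```python
import math

# Hypothetical intercept and coefficients (log odds ratios), for illustration
# only. They are NOT taken from any published COVID-19 model.
INTERCEPT = -6.0
COEFFICIENTS = {
    "age_years": 0.05,      # per year of age
    "male": 0.4,            # 1 if male, 0 otherwise
    "crp_mg_per_l": 0.01,   # per mg/L of C-reactive protein
}


def predicted_risk(patient):
    """Return the predicted probability from a logistic regression model."""
    # Linear predictor: intercept plus the weighted sum of predictor values.
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    # The logistic function maps the linear predictor to a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Two made-up patients.
younger = {"age_years": 40, "male": 0, "crp_mg_per_l": 10}
older = {"age_years": 80, "male": 1, "crp_mg_per_l": 150}

print(f"younger patient: {predicted_risk(younger):.3f}")
print(f"older patient:   {predicted_risk(older):.3f}")
```

In a real model the coefficients would be estimated from patient data, and a penalized fit such as LASSO would shrink some of them to exactly zero, which drops those predictors from the model.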

Relevant predictors that were identified in more than one model are shown below:

Diagnostic Model Prognostic Model
Age X X
Sex X
Disease symptoms
Body temperature/fever X
(Respiratory) signs/symptoms (such as shortness of breath, headache, shivering, sore throat, and fatigue)
Laboratory parameters
C-reactive protein X
Lactic dehydrogenase X
Lymphocyte count X
Albumin or albumin/globin X X
Direct bilirubin X X
Red blood cell distribution width X X
Features derived from CT scans X


Most models proposed for the diagnosis and prognosis of COVID-19 show excellent discriminative performance. However, all models reviewed by Wynants, Van Calster et al. (2020) were judged to be at high risk of bias. This is mainly because the models were fitted on data that were not representative of the target population. For instance, people without COVID-19 (controls) were underrepresented in the data used for diagnostic models. In prognostic models, most studies excluded patients who had neither recovered nor died by the end of the study period. Excluding these patients distorts the sample, because only patients whose outcome happened to be known early are analysed. To avoid this bias, such patients should be kept in the analysis and treated as censored, for example with time-to-event methods.
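The effect of excluding patients with an unknown outcome can be sketched with a toy example in Python. All numbers are invented. The sketch compares a naive mortality estimate, which drops patients still in hospital at the data cut-off, with a Kaplan-Meier estimate that keeps them as censored observations. As a simplifying assumption, recovered patients are treated as alive up to a 28-day horizon; a full analysis would treat recovery as a competing event.

```python
def kaplan_meier_survival(observations, horizon):
    """Kaplan-Meier estimate of the probability of surviving up to `horizon`.

    observations: list of (time, died) pairs, where `died` is True if death
    was observed at `time` and False if the patient was censored at `time`.
    """
    survival = 1.0
    death_times = sorted({t for t, died in observations if died and t <= horizon})
    for t in death_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for time, _ in observations if time >= t)
        deaths = sum(1 for time, died in observations if died and time == t)
        survival *= 1.0 - deaths / at_risk
    return survival


HORIZON = 28  # days; hypothetical follow-up horizon

# Invented toy cohort: (follow-up time in days, death observed?)
died = [(5, True), (8, True), (12, True)]
recovered = [(HORIZON, False)] * 5  # assumption: alive up to the horizon
still_in_hospital = [(3, False), (6, False), (10, False), (15, False)]

# Naive estimate: drop everyone whose outcome is not yet known.
naive_mortality = len(died) / (len(died) + len(recovered))

# Kaplan-Meier estimate: keep them, censored at their last follow-up time.
cohort = died + recovered + still_in_hospital
km_mortality = 1.0 - kaplan_meier_survival(cohort, HORIZON)

print(f"naive mortality:        {naive_mortality:.3f}")
print(f"Kaplan-Meier mortality: {km_mortality:.3f}")
```

In this toy cohort the naive estimate is 37.5%, while the Kaplan-Meier estimate is about 31%. The size and direction of the gap depend entirely on the invented data; the point is only that the two approaches can disagree noticeably.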

Currently, most studies use data from China. Hence, these findings might be difficult to generalize to other countries with differing ethnic groups, living conditions, and health care systems. Moreover, many of the manuscripts had not been peer reviewed at the time of the systematic review.

Now what?

Facing the COVID-19 pandemic, authors worldwide have developed an astonishing number of diagnostic and prognostic models in a short time. Models should be validated and updated in larger, international datasets that are representative of the target population, and thoroughly peer reviewed, before they are used in clinical applications. Eventually, those models might help to detect COVID-19 infections in patients with symptoms and to predict the course of a diagnosed COVID-19 infection with high discriminative performance.

However, if not carefully validated for a representative population, models could do more harm than good.
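When a model is validated on new data, discriminative performance is commonly summarised by the C-index (for a binary outcome, the area under the ROC curve). The following Python sketch computes it for a handful of invented outcomes and predicted risks; the function name and the data are illustrative assumptions, not part of the review.

```python
def c_index(outcomes, predicted_risks):
    """C-index for a binary outcome.

    Proportion of (event, non-event) pairs in which the patient with the
    event received the higher predicted risk; ties count as one half.
    """
    events = [p for y, p in zip(outcomes, predicted_risks) if y == 1]
    non_events = [p for y, p in zip(outcomes, predicted_risks) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Invented validation data: 1 = outcome occurred, 0 = it did not.
outcomes = [1, 1, 1, 0, 0, 0, 0]
predicted_risks = [0.9, 0.6, 0.3, 0.7, 0.2, 0.2, 0.1]

print(f"C-index: {c_index(outcomes, predicted_risks):.3f}")
```

A value of 0.5 corresponds to chance and 1.0 to perfect separation. A high C-index alone does not show that the predicted probabilities are accurate in a new population, so calibration should be checked as well.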


Wynants, L., B. Van Calster, M. M. J. Bonten, G. S. Collins, T. P. A. Debray, M. De Vos, M. C. Haller, G. Heinze, K. G. M. Moons, R. D. Riley, E. Schuit, L. J. M. Smits, K. I. E. Snell, E. W. Steyerberg, C. Wallisch and M. van Smeden (2020). "Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal." BMJ 369: m1328.


