Preliminary Coronavirus Outbreak Projections

This page collects the various projections for the coronavirus outbreak that I've been able to find, along with my own calculations and projections.

Please note that all of these projections are speculative and preliminary. I am not an epidemiologist, and even the expert-produced models collected on this page are based on preliminary data.

If you have any thoughts or comments on my calculations, or know of any other projections that I've missed, please contact me! Follow me on Twitter for updates: @joshuafkon

My Own Calculations


A more sophisticated model is the Susceptible-Infectious-Recovered-Dead (SIRD) model below.

Currently, the high estimated R0 of the virus means that the virus is projected to spread widely - peaking on June 7th.


[Updated As of: (2/26 7:00am EST)   CFR 12.11% R0 3.73]

Where Did This CFR and R0 Value Come From?


Obviously, this is not a truly realistic model as it does not take into account any measures that would slow the spread of the virus.


Eventually, containment measures would be put in place, which would decrease the R0 of the virus. Sanche et al. calculate that the measures China has taken have reduced the R0 of the virus in China by up to 59%. Applying the same percentage reduction to our estimated R0, and assuming that these measures are put in place when 1% of the population is infected, gives the much more realistic scenario shown in the graph below.

In this scenario, the outbreak would peak on July 4th, and ultimately 62.3% of the country would be infected at some point.
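The containment adjustment can be sketched in a few lines. This is only an illustration of the rule described above, not the model's actual code: the function name is made up, and treating "1% of the population is infected" as a simple threshold on the infected fraction is my assumption.

```python
BASELINE_R0 = 3.73        # weighted-average R0 used on this page
REDUCTION = 0.59          # "up to 59%" reduction attributed to Sanche et al.
TRIGGER_FRACTION = 0.01   # measures assumed to start at 1% of the population infected


def effective_r0(infected_fraction: float) -> float:
    """Return the R0 in force for a given infected fraction (hypothetical helper)."""
    if infected_fraction >= TRIGGER_FRACTION:
        return BASELINE_R0 * (1.0 - REDUCTION)
    return BASELINE_R0


print(round(effective_r0(0.001), 4))  # before containment: baseline R0
print(round(effective_r0(0.02), 4))   # after containment: reduced R0
```

With these inputs the reduced R0 is 3.73 x 0.41, or roughly 1.53, which is still above 1. That is why the outbreak keeps growing in this scenario, only more slowly.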


Model Setup, Calculations, Sources, Etc.


Number Infected In The United States and When

First we need to estimate how many people have been infected with the coronavirus in the United States - and on what date.

So far there is no evidence of community spread within the United States. However, there is a strong likelihood that some cases were imported from China and have not been detected. Why do we think this?

The study Quantifying bias of COVID-19 prevalence and severity estimates in Wuhan, China that depend on reported cases in international travelers makes the case that there is "variation among countries in detection capacity for imported cases." Using Singapore's historically strong epidemiological surveillance and contact-tracing capacity as the gold standard, and adjusting for the number of travelers from Hubei province, the authors calculate that even high-surveillance countries detect only 40% (95% HPDI 22% - 67%) of coronavirus cases.

Excluding cases that were returned to the United States while presumed positive from the Diamond Princess cruise ship, there have been 15 cases detected in the United States. Using the average share of cases that go undetected from the earlier study, we can infer that there are ~22 people with the virus in the United States as of 2/19/2020. This is the number we will use as the starting point for our model, although it is probably a low-range estimate, as it assumes no spread has yet taken place in the United States. Moreover, the model will not take into account the very strong possibility that additional cases will be imported in the future.
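The arithmetic behind this estimate is not spelled out above, so here is one plausible reading as a rough sketch. It assumes the 40% point estimate applies directly to the 15 detected cases; the variable names are mine.

```python
detected_cases = 15      # excludes Diamond Princess repatriations, per the text
detection_rate = 0.40    # point estimate for high-surveillance countries

# If only 40% of cases are detected, the detected cases are 40% of the total.
estimated_total = detected_cases / detection_rate
estimated_undetected = estimated_total - detected_cases

print(round(estimated_total, 1))
print(round(estimated_undetected, 1))
```

Under this reading the undetected cases come to about 22.5, which lines up with the ~22 starting figure, while the implied total (detected plus undetected) would be about 37.5.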

R0 Calculation

For my model I am using the weighted average of the R0 values from the studies below, with each study losing 2% of its weight for each day its data predates the latest published study. E.g., a study based on data from 1/22/20 would be weighted 30% less than a study with data up to February 5th (15 days).

Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020

Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions

Transmission dynamics of 2019 novel coronavirus (2019-nCoV)

Reporting, Epidemic Growth, and Reproduction Numbers for the 2019 Novel Coronavirus (2019-nCoV) Epidemic

Report 3: Transmissibility of 2019-nCoV

Early Transmissibility Assessment of a Novel Coronavirus in Wuhan, China

The Novel Coronavirus, 2019-nCoV, is Highly Contagious and More Infectious Than Initially Estimated

Epidemiological and clinical features of the 2019 novel coronavirus outbreak in China

This results in an R0 of 3.73.
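The weighting scheme can be sketched as below. The R0 values and day offsets in the example are HYPOTHETICAL placeholders, not the figures from the studies listed above, and flooring the weight at zero for very old studies is my own assumption.

```python
from typing import List, Tuple

DECAY_PER_DAY = 0.02  # each day older than the newest study costs 2% of weight


def study_weight(days_older: int) -> float:
    """Linear decay weight, floored at zero (the floor is an assumption)."""
    return max(0.0, 1.0 - DECAY_PER_DAY * days_older)


def weighted_r0(studies: List[Tuple[float, int]]) -> float:
    """studies: (r0_estimate, days_older_than_newest_study) pairs."""
    weights = [study_weight(days) for _, days in studies]
    total = sum(weights)
    if total == 0:
        raise ValueError("all study weights are zero")
    return sum(r0 * w for (r0, _), w in zip(studies, weights)) / total


# HYPOTHETICAL inputs for illustration only.
example = [(2.0, 15), (3.0, 5), (4.0, 0)]
print(round(weighted_r0(example), 3))
```

A study 15 days older than the newest one gets a weight of 0.70 here, matching the "30% less" example in the text.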

Average latent period (days) and Average duration of infectiousness (days)

These values, 7.5 days and 3.6 days respectively, were adopted from the SEIR model described in A spatial model of CoVID-19 transmission in England and Wales: early spread and peak timing, which in turn estimates these values from the various studies that have been released to date.


The CFR is calculated as a simple ratio over resolved cases, meaning cases that have ended in either recovery or death. Only cases from countries that scored at least 50 out of 100 on the 2019 Global Health Security Index's measure of the ability to detect and report emerging epidemics are included. Source for recoveries and deaths.

The CFR calculation is e2(s) = D(s) / (D(s) + R(s)), where D(s) and R(s) denote the cumulative numbers of deaths and recoveries, respectively.
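As a quick illustration of this ratio, here is a minimal sketch. The function name is mine, and the counts in the example are HYPOTHETICAL round numbers, not the data behind the 12.11% figure.

```python
def resolved_case_cfr(deaths: int, recoveries: int) -> float:
    """Deaths divided by resolved cases (deaths + recoveries)."""
    resolved = deaths + recoveries
    if resolved == 0:
        raise ValueError("no resolved cases yet")
    return deaths / resolved


# HYPOTHETICAL counts: 10 deaths and 90 recoveries.
example_cfr = resolved_case_cfr(deaths=10, recoveries=90)
print(f"{example_cfr:.2%}")
```

Because unresolved cases are excluded from the denominator, this estimate can shift a lot while the number of resolved cases is still small.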


Why use this methodology? Firstly, the unreliability of the data from mainland China and other countries with low detection capacity adds uncertainty to other statistical methods. Moreover, studies on calculating the CFR in the SARS epidemic showed that this approach was "reasonable at most points in the epidemic", although it briefly underestimated the true CFR at one point.

Population Size

This model restricts itself to the United States and does not account for any movement between countries. Currently the US population is estimated at 329,300,000.


The model is an SIRD model - which is a derivative of the classic SIR model.

Within the model there are several variables derived from our initial inputs.

Average number of individuals effectively contacted per time step

This is equal to R0 / the average duration of infectiousness.

Probability of effective contact between 2 individuals per time step

This is equal to the average number of individuals effectively contacted per time step (just calculated) / the total population.

Average rate of infectious disease onset per time step

This is equal to our time step (one day) / the average latent period.

Average recovery rate per time step

This is equal to our time step (one day) / the average duration of infectiousness (days).
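The four derived variables above can be computed directly from the inputs given on this page. This is a minimal sketch; the variable names are mine, not the model's.

```python
R0 = 3.73                    # weighted-average basic reproduction number
LATENT_DAYS = 7.5            # average latent period
INFECTIOUS_DAYS = 3.6        # average duration of infectiousness
POPULATION = 329_300_000     # US population estimate
TIME_STEP = 1.0              # one day

# Average number of individuals effectively contacted per time step
contacts_per_step = R0 / INFECTIOUS_DAYS

# Probability of effective contact between 2 individuals per time step
contact_probability = contacts_per_step / POPULATION

# Average rate of infectious disease onset per time step
onset_rate = TIME_STEP / LATENT_DAYS

# Average recovery rate per time step
recovery_rate = TIME_STEP / INFECTIOUS_DAYS

print(contacts_per_step, contact_probability, onset_rate, recovery_rate)
```

With these inputs, each infectious person effectively contacts a little over one person per day, and roughly 13% of people in the latent stage become infectious each day.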

With these variables the population can be sorted into susceptible, infectious, recovered, and deceased categories. 
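Until the actual files are posted, here is a rough sketch of how such a daily-step model could be run. It is not the model behind the graphs on this page. The update equations, the explicit "exposed" compartment for the latent period, splitting removals between recovered and deceased by the CFR, and seeding all 22 initial cases as infectious are all my assumptions.

```python
def simulate(days, r0=3.73, latent_days=7.5, infectious_days=3.6,
             cfr=0.1211, population=329_300_000, initial_infected=22):
    """Return a list of (S, E, I, R, D) tuples, one per day (assumed update rules)."""
    beta = r0 / infectious_days      # effective contacts per infectious person per day
    sigma = 1.0 / latent_days        # daily rate of becoming infectious
    gamma = 1.0 / infectious_days    # daily rate of leaving the infectious stage

    s = float(population - initial_infected)
    e, i, r, d = 0.0, float(initial_infected), 0.0, 0.0
    history = [(s, e, i, r, d)]

    for _ in range(days):
        new_exposed = min(s, beta * s * i / population)  # cannot exceed susceptibles
        new_infectious = sigma * e
        removed = gamma * i

        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - removed
        r += (1.0 - cfr) * removed   # share of removals that recover
        d += cfr * removed           # share of removals that die

        history.append((s, e, i, r, d))
    return history


history = simulate(days=365)
peak_day = max(range(len(history)), key=lambda t: history[t][2])
print("peak infectious on day", peak_day, "after the start date")
```

The peak day printed here depends heavily on the assumptions listed above, so it should not be expected to reproduce the dates quoted earlier on this page.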

Working on posting the files and making the model itself available.