*Dr Jaijus Pallippadan-Johny^{1}, Dr John McDermott^{2}, Rodney Jones^{1} and Michael Duddin^{1} (^{1}Wigram Capital Advisors, Auckland; ^{2}Motu Economic and Policy Research, Wellington)*

**In this blog, we introduce our modelling approach to estimating the transmissibility of the SARS-CoV-2 virus, the cause of the COVID-19 pandemic. We demonstrate the usefulness of the Wallinga model for the calculation of the effective reproduction number and show the major impact of the lockdown on containing the pandemic in New Zealand.**

The global COVID-19 pandemic represents an unprecedented public health and economic crisis. As the SARS-CoV-2 virus spread across the globe in early 2020, there was a clear need to find a modelling approach that could be used by policymakers and non-specialists to monitor the spread of the virus, and allow timely risk assessments to be made at critical times.

The basic reproduction number (R_{0}) is an epidemiological metric used to describe the transmissibility of an infectious disease. Put simply, it is the average number of secondary cases generated by a primary case in a fully susceptible population. Actual transmissibility is usually not constant; it varies over time in response to factors such as an influx of infected travellers into a country, containment measures and voluntary social distancing. Hence the effective reproduction number (Re), which represents the average number of secondary cases per infectious case at time *t*, is a better measure.

It is extremely difficult to measure Re directly during an epidemic because information about who infected whom, and about the details of individual outbreaks, is usually lacking. For this reason, mathematical models are used to estimate Re, which in turn can provide important insights into the significance of the outbreak.

In the aftermath of the SARS outbreak in 2003, Wallinga and Teunis (2004) developed a likelihood-based method for estimating the reproduction number from the generation time distribution (w(τ)) and the daily number of cases by date of onset. An open-source package for the R statistical software allows the reproduction number to be estimated using the Wallinga method.

The approach adopted by Wallinga and Teunis allows the calculation of daily Re, which in turn can be used to construct a forecasting model of likely new cases over the next few weeks. This provided an important tool for monitoring the path of the SARS-CoV-2 virus in real time.
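The core of the Wallinga and Teunis calculation can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the results in this post: the function name `wallinga_teunis` and the array layout (`w[k]` is the probability that the generation interval is `k` days, with `w[0] = 0`) are assumptions made for the example. For each case, the relative likelihood that it was infected by a case with onset `k` days earlier is proportional to `w[k]`; Re for a given day is the expected number of later cases attributed to one case with onset on that day.

```python
import numpy as np

def wallinga_teunis(cases, w):
    """Daily Re estimates from onset counts (illustrative sketch).

    cases : daily onset counts, one entry per day
    w     : w[k] = probability of a generation interval of k days (w[0] = 0)
    """
    cases = np.asarray(cases, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(cases)

    # lag[i, j] = number of days between onset day j (infector) and day i (infectee)
    lag = np.arange(n)[:, None] - np.arange(n)[None, :]
    valid = (lag > 0) & (lag < len(w))
    weight = np.where(valid, w[np.clip(lag, 0, len(w) - 1)], 0.0)

    # denom[i] = total weight of all earlier cases that could have infected a case on day i
    denom = weight @ cases

    # p[i, j] = probability that one case on day i was infected by one given case on day j
    with np.errstate(divide="ignore", invalid="ignore"):
        p = np.where(denom[:, None] > 0, weight / denom[:, None], 0.0)

    # r[j] = expected number of secondary cases per case with onset on day j
    r = cases @ p
    r[cases == 0] = np.nan  # Re is undefined on days without cases
    return r
```

Estimates for the most recent days are biased downwards in this simple form, because secondary cases that have not yet appeared cannot be counted. A real-time application needs to correct for, or at least flag, this right-censoring.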

### Generation time

Estimating Re requires two elements: data on the number of new cases per day, and an assumed probability distribution of generation intervals. The *generation interval* describes the time from the onset of infectiousness in a primary case to the onset of infectiousness in a secondary case infected by that primary case. As infection time is generally unobservable, it is common practice to estimate the generation interval via the *serial interval*, which is the time between symptom onset in an infector and symptom onset in the person they infect. To provide a suitable generation interval distribution we need to assume its mean, standard deviation (SD) and shape. Historical outbreaks of similar diseases can be useful in calibrating these assumptions.

For example, Lipsitch et al estimated a serial interval with a mean of 8.4 days and SD of 3.8 days for the SARS 2003 outbreak. Li et al found a mean of 7.5 days and SD of 3.4 days from the outbreak of COVID-19 in Wuhan (China). Other researchers report a smaller serial interval of about 4 to 5 days (see Ferretti et al and Nishiura et al).
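As a hedged illustration of how a published mean and SD can be turned into a full distribution, the sketch below matches the moments of a gamma distribution. The gamma family is an assumption made for this example (it is not necessarily the shape fitted in the cited studies), and `gamma_from_mean_sd` is a hypothetical helper name.

```python
import math

def gamma_from_mean_sd(mean, sd):
    """Return (shape, scale) of the gamma distribution with the given mean and SD.

    Uses mean = shape * scale and variance = shape * scale**2.
    """
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale

# Serial interval summaries quoted in the text above
published = [
    ("SARS 2003 (Lipsitch et al)", 8.4, 3.8),
    ("COVID-19, Wuhan (Li et al)", 7.5, 3.4),
]

for label, mean, sd in published:
    shape, scale = gamma_from_mean_sd(mean, sd)
    print(f"{label}: shape={shape:.2f}, scale={scale:.2f} days")
```

Moment matching fixes only the first two moments. Two distributions with the same mean and SD can still differ in the weight they place on very long intervals, which is why the shape has to be assumed separately.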

However, it is now widely accepted that a considerable number of COVID-19 patients can be asymptomatic and there is a high probability for the onset of secondary cases longer than 12 days. This tells us that longer generation times from the COVID-19 studies should be considered.

In our model, for estimating the COVID-19 Re, we used a generation time distribution with a peak probability at 5 days and a long tail of 28 days, as indicated in figure 1. While the length of this tail is unusual, it does seem to better represent the behaviour of the SARS-CoV-2 virus, and the persistence of small positive case numbers in places such as South Korea and Hong Kong.
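A daily distribution with these two features can be sketched as follows. Only the 5-day peak and the 28-day tail come from the description above; the gamma-shaped density, the `shape=2.5` default and the function name `generation_time_pmf` are assumptions chosen for illustration, not the distribution actually used in our model.

```python
import numpy as np

def generation_time_pmf(mode_days=5.0, shape=2.5, max_days=28):
    """Discretised generation time distribution over days 0..max_days.

    The density is gamma-shaped with its peak at mode_days, truncated at
    max_days and renormalised so the daily probabilities sum to 1.
    """
    if shape <= 1:
        raise ValueError("shape must exceed 1 for the density to peak after day 0")
    scale = mode_days / (shape - 1.0)  # gamma mode = (shape - 1) * scale
    days = np.arange(0, max_days + 1)
    density = np.zeros(max_days + 1)
    # Unnormalised gamma density; day 0 keeps probability 0 (no same-day transmission)
    density[1:] = days[1:] ** (shape - 1.0) * np.exp(-days[1:] / scale)
    return density / density.sum()

w = generation_time_pmf()
print("peak day:", int(np.argmax(w)))
print("probability of an interval longer than 12 days:", round(float(w[13:].sum()), 3))
```

Lowering `shape` while holding the mode fixed puts more probability in the tail, which is one simple way to explore how sensitive Re estimates are to the tail assumption.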

### Comparing SARS and COVID-19

Given the parallels with the SARS outbreak, the case data from Hong Kong in 2003 provided an important baseline for the path of SARS-CoV-2. The Re of SARS in Hong Kong in figure 2 is calculated using the generation time distribution given by Lipsitch et al. The results are compared for the period starting when cumulative cases exceeded 300. It was clear from late January 2020 onwards that the Re of SARS-CoV-2 was significantly higher than that of SARS.

The use of Re allows an assessment to be made as to the seriousness of any particular outbreak and provides an insight into the number of independent infection chains. This is particularly important when an outbreak can be seeded by travellers from abroad, or by movement between cities and regions within a country.

### Forecasting onset cases

The forecast model generates multinomially distributed secondary cases, day by day, from the generation time distribution, Re and the known primary cases. It generates 10,000 epidemic curves for each value of Re from 0.1 to 6 in steps of 0.1. The epidemic curve that best fits the actual data is then used to project daily cases over a short-run horizon. We found this approach effective in forecasting case numbers over a 7 to 10-day horizon in a number of outbreaks.
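The procedure can be sketched roughly as below. Several details are assumptions made for the example rather than a description of our implementation: the number of secondary cases per day is drawn from a Poisson distribution with mean Re times that day's cases, the fit compares the mean simulated curve with the data by squared error, and far fewer than 10,000 curves are drawn by default. All function names are hypothetical.

```python
import numpy as np

def simulate_curve(known_cases, w, re, horizon, rng):
    """Simulate one epidemic curve: known cases followed by `horizon` simulated days."""
    n_known = len(known_cases)
    total = n_known + horizon
    curve = np.zeros(total, dtype=np.int64)
    curve[:n_known] = known_cases
    probs = np.asarray(w[1:], dtype=float)
    probs = probs / probs.sum()  # probabilities over lags 1..len(w)-1
    for day in range(total):
        if curve[day] == 0:
            continue
        # Assumption: Poisson number of secondary cases, spread over later days
        n_secondary = rng.poisson(re * curve[day])
        offspring = rng.multinomial(n_secondary, probs)  # offspring[k] lands on day + 1 + k
        start = day + 1
        end = min(total, start + len(probs))
        target = np.arange(start, end)
        keep = target >= n_known  # days with known data are never overwritten
        curve[target[keep]] += offspring[: end - start][keep]
    return curve

def fit_re(observed, w, n_seed, rng, re_grid=None, n_sims=200):
    """Grid search for the Re whose mean simulated curve is closest to the data."""
    observed = np.asarray(observed)
    if re_grid is None:
        re_grid = np.round(np.arange(0.1, 6.05, 0.1), 1)  # 0.1, 0.2, ..., 6.0
    horizon = len(observed) - n_seed
    best_re, best_err = None, np.inf
    for re in re_grid:
        sims = np.array([simulate_curve(observed[:n_seed], w, re, horizon, rng)
                         for _ in range(n_sims)])
        err = np.mean((sims.mean(axis=0)[n_seed:] - observed[n_seed:]) ** 2)
        if err < best_err:
            best_re, best_err = float(re), err
    return best_re

def forecast(observed, w, re, horizon, rng, n_sims=200):
    """Median projected daily cases for the next `horizon` days."""
    observed = np.asarray(observed)
    sims = np.array([simulate_curve(observed, w, re, horizon, rng)
                     for _ in range(n_sims)])
    return np.median(sims[:, len(observed):], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(2020)
    w = np.array([0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.1])  # toy generation time distribution
    observed = np.array([3, 4, 6, 8, 11, 14, 19, 25, 33, 41])  # made-up daily counts
    re_hat = fit_re(observed, w, n_seed=4, rng=rng, n_sims=50)
    print("fitted Re:", re_hat)
    print("7-day projection:", forecast(observed, w, re_hat, 7, rng, n_sims=50))
```

The daily counts and the toy distribution in the usage example are invented purely to show the call pattern. Keeping all simulated curves, rather than only their median, also gives a simple way to attach an uncertainty band to the projection.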

### Tracking the effective reproduction number: New Zealand’s experience

When it was announced that New Zealand was entering a restrictive lockdown on 24 March, the country had 174 reported cases of COVID-19 and an Re of 3.4 over the previous week, having been above 4.5 in the week before that. This signalled that New Zealand was facing multiple outbreaks and the prospect of escalating case numbers over the next few weeks.

Models run around 22 March, prior to the move to Level 4 lockdown, forecast that total case numbers in New Zealand would approach 4,000 within a few weeks. However, by early April the effects of the border closure on 19 March and the full lockdown on 26 March were starting to be observed in actual case numbers, and there was a marked downshift in the forecast for future cases. The model was very accurate day-to-day while case numbers were accelerating prior to lockdown, but it began to over-forecast case numbers as the lockdown progressed. This was evidence that the containment policy was working.

By the time lockdown restrictions were eased on 27 April, the Re had declined to 0.2, with new case numbers declining to zero on some days over subsequent weeks.

In summary, the New Zealand experience has illustrated the usefulness of the Wallinga model for the calculation of a real-time Re for the COVID-19 pandemic, particularly when coupled with a forecasting model for case numbers.

### References

- Nishiura, H., et al. Serial interval of novel coronavirus (2019-nCoV) infections. Int J Infect Dis 2020;93:283-286. doi:10.1101/2020.02.03.20019497.
- Li, Q., et al. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia. N Engl J Med 2020;382. doi:10.1056/NEJMoa2001316.
- Lipsitch, M., et al. Transmission dynamics and control of severe acute respiratory syndrome. Science 2003;300:1966–70.
- Ferretti, L., et al. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 2020;368. doi:10.1126/science.abb6936.
- Wallinga, J., Teunis, P. Different Epidemic Curves for Severe Acute Respiratory Syndrome Reveal Similar Impacts of Control Measures. Am J Epidemiol 2004;160:509-516.
- Bernard-Stoecklin, S., et al. Comparative Analysis of Eleven Healthcare-Associated Outbreaks of Middle East Respiratory Syndrome Coronavirus (MERS-CoV) from 2015 to 2017. Sci Rep 2019;9. https://doi.org/10.1038/s41598-019-43586-9.

Thank you for a very clear presentation of your forecasting method, understandable even by a non-academic. It’s impressive that the model matches the dip around March 29. I’m wondering if the actual figures exclude imported cases (people who arrived in NZ already infected), since these don’t relate to the Re in NZ.

Thanks. The chart here is of confirmed and probable cases, but we were also modelling confirmed cases alone as well as domestic cases. Broadly, the Re’s moved together, although the domestic Re was consistently higher. The argument for including imports is that unless they are in quarantine they still have the potential to seed independent outbreaks, which is what happened.

“However, it is now widely accepted that a considerable number of COVID-19 patients can be asymptomatic and there is a high probability for the onset of secondary cases longer than 12 days. This tells us that longer generation times from the COVID-19 studies should be considered.”

If I understand this statement correctly you are using a long tail in your generation time distribution to try to account for transmission by subclinical cases not observed in the dataset. Doesn’t this mean you will overestimate Re, possibly considerably, or do you try to correct for this in some way?

Our approach is fundamentally a time series approach, which also has a focus on forecasting. We have a model that accounts for the data, not some underlying “truth”. We have considered many distributions and chose the one that best fits the data, both in NZ and elsewhere. We did a large amount of sensitivity analysis over the choice of the standard error, and there was a persistent case for a larger one. This tells us something about the underlying process.

This makes some sense, as the long tail in our study would include secondary cases that are subclinical or asymptomatic within 12 days of being infected but test positive at a later date. Cases in this category have been reported in New Zealand and worldwide. These cases are not hidden cases and have represented a challenge to achieving outright elimination. This sort of tail is consistent with a large standard error.

Thanks. Can you clarify what data you used to estimate your generation time distribution? In particular, are you saying you’ve fitted the generation time distribution to daily reported case data (as is suggested by your comment about late positive tests)?