Tracking the UK SARS-CoV-2 outbreak

After a year of relatively unremarkable evolution, the emergence and worldwide spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants signals an urgent need for better genomic surveillance (1). The United Kingdom (UK) has emerged as a leader in this field. An investment of £20 million in March 2020 established the COVID-19 Genomics UK (COG-UK) consortium (2), which has produced >200,000 SARS-CoV-2 genomes, more than twice the number produced by any other country. Such a large volume of data provides an unprecedented opportunity to identify which human activities drive epidemic growth during a rapidly changing pandemic, but it also poses numerous bioinformatics challenges. On page 708 of this issue, du Plessis et al. (3) describe a new hybrid phylogenetic approach that integrates genetic data with epidemiological and travel data to uncover the roots of the severe spring epidemic in the UK. In particular, they find that the UK epidemic resulted from more than 1,000 transmission lineages seeded by travelers from Europe.

The study shows how control efforts last winter were consistently a step behind the virus, allowing SARS-CoV-2 to cross national borders. The authors' analysis of ~26,000 UK sequences from January to June 2020, the largest study of its kind, shows that the UK epidemic was brought into the country mainly by travelers from European neighbors: first Italy, then Spain and France. Viral importation into the UK peaked in March, when the virus was spreading in Western Europe, but lagging surveillance meant that restrictions still focused on travelers arriving from Asia. By capturing a large number of small transmission lineages that would go undetected at lower levels of virological surveillance, as well as >1600 singleton viruses with no observed descendants, the authors uncovered an unprecedented amount of cross-border virus traffic. Genetic patterns mirror human movement patterns: the number of viruses entering the UK rose and then fell as international travel declined in March.

Multiple introductions of SARS-CoV-2 by travelers from Italy, Spain, and France, but not Asia, seeded the epidemic in the United Kingdom between January and June 2020.

PHOTO: TOBY MELVILLE / REUTERS

The UK is not the only country whose early focus on Asia as the pandemic epicenter allowed viruses to slip in from European sources. Genetic data have also traced the origins of epidemics in Brazil (4), Boston (5), and New York City (6) back to Europe. Travel restrictions can be very effective when strictly enforced, but these studies collectively highlight how easily SARS-CoV-2 can slip through even minor lapses in border control, including the repatriation of Americans from Asia at the onset of the pandemic (7).

There is no magic bullet for balancing scalability against statistical accuracy as genomic data outgrow the capacity of existing platforms. Du Plessis et al. faced the same methodological challenges encountered in previous evolutionary analyses of SARS-CoV-2 (8), in this case magnified by a much larger data set. These challenges include low phylogenetic signal among genetically similar viruses, a data set whose size exceeds the capacity of standard phylogenetic software, and biases that arise when countries sequence different numbers of viruses relative to their national case counts. The authors take a new approach that uses genetic data to infer the timing and number of virus introductions but uses epidemiological metadata to infer the country of origin. Better integration of genomic and epidemiological data will continue to improve outbreak responses, but it can be cumbersome without accessible data repositories, for example for time-varying global air travel volumes. Epidemiologists are increasingly turning to crowdsourced digital and mobile phone data to track human movements and social contact patterns (9).

Contact tracing was effective in controlling early COVID-19 outbreaks, such as the first outbreak in Europe, in Munich, Germany (10), and provided important insights into community transmission and the role of superspreading (11). But contact tracing is difficult and is often abandoned as epidemics grow. Genetic data can add a new dimension to these efforts by efficiently determining whether two cases belong to the same transmission lineage, despite gaps in sampling among individuals in the chain. Du Plessis et al. did not investigate heterogeneities in transmission at the city level (5), but their observations show the growth and size-dependent extinction of hundreds of co-circulating lineages as the national epidemic was brought under control by nonpharmaceutical interventions (NPIs).

The study by du Plessis et al. used only a fraction of the UK sequences generated so far. The risk of emerging variants increases as SARS-CoV-2 populations grow worldwide and as the virus spills into immunocompromised, chronically infected, or even nonhuman hosts, where it experiences different selection pressures. As SARS-CoV-2 becomes more evolutionarily dynamic, the UK's openly shared data provide a resource for the global community. Denmark, Australia, and other countries also have intensive SARS-CoV-2 sequencing operations. But the UK is currently the only country with more than 1 million COVID-19 cases that has sequenced SARS-CoV-2 genomes from more than 1% of them (the UK sequences ~5%).

The most vexing evolutionary questions require broad population-level analyses based on continuous, representative national sampling, with randomized selection of viruses to sequence (12). A centrally coordinated sampling strategy is a highly beneficial feature of the UK virological surveillance program, even if it is less quantifiable than speed or volume (2). The United States has generated the second largest number of SARS-CoV-2 genomes, but the proportion of cases sequenced varies widely between cities and states because of differences in resources. Large-scale studies become methodologically challenging when data sets are compiled from smaller studies that were originally designed to address other research questions, which introduces biases. At times it has been difficult to evaluate intriguing hypotheses, such as whether SARS-CoV-2 carrying the spike protein D614G mutation spread worldwide because of a fitness advantage or by random chance (13).

Variants that originate in one country quickly become a threat to its neighbors. Countries need to complement each other's virological surveillance efforts in a rapidly changing global viral landscape. The UK ARTIC network actively shares resources and protocols for SARS-CoV-2 sequencing. Nextstrain provides a user-friendly visual platform to track SARS-CoV-2 evolution in near real time. Numerous open-access bioinformatics tools have been developed to analyze SARS-CoV-2 sequences (14). But a lesson from the UK is the importance of sustained government investment in scalable national infrastructure. Scrappy academic researchers can build popular tools but struggle to scale them as the volume of genomic data explodes. Global coordination will also be helpful, including the universal adoption of a single nomenclature for SARS-CoV-2 lineages.

The COVID-19 pandemic has galvanized investment in promising research areas at the frontiers of technology and big data. Over the past two decades, faster, cheaper, and more portable sequencing technologies and flexible bioinformatics platforms have laid the foundation for real-time genomic epidemiology. Leaps forward are usually fueled by public health crises, including outbreaks of influenza, Ebola, and Zika (15). The COVID-19 leap has begun.

Acknowledgments: The content does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does it imply endorsement by the U.S. Government.