Newswise – DALLAS – February 8, 2021 – World health experts have long suspected that the prevalence of COVID-19 is higher than reported. A machine learning algorithm developed at UT Southwestern estimates that the number of COVID-19 cases in the U.S. since the pandemic began is nearly three times the number of confirmed cases.
The algorithm, described in a study published today in PLOS ONE, provides daily updated estimates of total infections to date, as well as of how many people are currently infected, in the U.S. and in the 50 countries hardest hit by the pandemic.
According to the model's calculations, more than 71 million people in the U.S. (21.5 percent of Americans) had contracted COVID-19 as of February 4. That is far more than the 26.7 million confirmed cases publicly reported, says Jungsik Noh, Ph.D., an assistant professor in UT Southwestern's Lyda Hill Department of Bioinformatics and first author of the study.
Of the estimated 71 million Americans who had contracted COVID-19, 7 million (2.1 percent of the U.S. population) were currently infected and contagious on February 4, according to the algorithm.
Noh's published study is based on calculations completed in September. Those calculations showed that, at the time, actual cumulative cases in 25 of the 50 hardest-hit countries were five to 20 times higher than confirmed case counts suggested.
The current estimates from the online version of the algorithm are closer to the reported numbers, but still much higher. On February 4, the algorithm estimated that Brazil had more than 36 million cumulative cases, almost four times the 9.4 million confirmed cases. France had 14 million, compared with the 3.2 million reported, and the U.K. had almost 25 million instead of about 4 million, more than six times as many. Mexico was an outlier, with nearly 15 times its reported number of cases: 27.6 million rather than 1.9 million confirmed.
“Estimates of actual infections show for the first time the true severity of COVID-19 in the US and in countries worldwide,” says Noh.
The algorithm uses the number of reported deaths, which is more accurate and complete than the number of laboratory-confirmed cases, as the basis for its calculations. It assumes an infection fatality rate of 0.66 percent, based on an earlier study of the pandemic in China, and takes into account other factors such as the average number of days from symptom onset to death or recovery. It also compares its estimate with the number of confirmed cases to calculate a ratio of confirmed to estimated infections.
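To make the deaths-based reasoning concrete, here is a minimal sketch of a back-calculation of this kind. It is not Noh's published model. Only the 0.66 percent infection fatality rate comes from the article; the fixed lag between infection and reported death, the fixed infectious window, the function names, and the toy data are all hypothetical simplifications for illustration.

```python
from typing import List

IFR = 0.0066  # 0.66 percent infection fatality rate, as stated in the article


def estimate_cumulative_infections(
    cumulative_deaths: List[float], lag_days: int, ifr: float = IFR
) -> List[float]:
    """Back-calculate cumulative infections from cumulative reported deaths.

    Hypothetical simplification: every death reported on day t + lag_days is
    attributed to an infection that had occurred by day t, and a fixed share
    `ifr` of infections is fatal. Entry t of the result estimates day t; the
    final `lag_days` days have no matching death data yet and are omitted.
    """
    if lag_days < 0 or not 0 < ifr <= 1:
        raise ValueError("lag_days must be >= 0 and ifr must be in (0, 1]")
    n = max(len(cumulative_deaths) - lag_days, 0)
    return [cumulative_deaths[t + lag_days] / ifr for t in range(n)]


def estimate_current_infections(
    cumulative_infections: List[float], infectious_days: int
) -> List[float]:
    """Approximate the number still infected on each day as everyone infected
    within the last `infectious_days` days (a hypothetical fixed window)."""
    current = []
    for t, total in enumerate(cumulative_infections):
        earlier = cumulative_infections[t - infectious_days] if t >= infectious_days else 0.0
        current.append(total - earlier)
    return current


def confirmed_to_estimated_ratio(confirmed: float, estimated: float) -> float:
    """Share of estimated infections that were confirmed by testing."""
    return confirmed / estimated


if __name__ == "__main__":
    deaths = [0, 66, 132, 198, 264]  # toy cumulative deaths, not real data
    infections = estimate_cumulative_infections(deaths, lag_days=1)
    print([round(x) for x in infections])  # [10000, 20000, 30000, 40000]
    current = estimate_current_infections(infections, infectious_days=2)
    print([round(x) for x in current])  # [10000, 20000, 20000, 20000]
    # U.S. figures for February 4 from the article: 26.7M confirmed vs. 71M estimated
    print(round(confirmed_to_estimated_ratio(26.7e6, 71e6), 2))  # 0.38
```

Dividing deaths by the fatality rate is the core idea; shifting the death series back by a lag reflects that deaths trail infections by weeks. The article notes the real model also accounts for the time from symptom onset to death or recovery, which a single fixed lag only approximates.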
Much about COVID-19 remains uncertain, especially the infection fatality rate, so the estimates are rough, says Noh. But he believes the model's estimates are more accurate, and miss fewer cases, than the confirmed case counts currently used to guide public health policy. A more comprehensive estimate of how widespread the disease is matters, Noh adds.
“These are critical statistics on the severity of COVID-19 in each region. Knowing how serious the situation is in different regions can help us fight the spread of the virus effectively,” he explains. “The currently infected population is the source of future infections and deaths. Its actual size in a region is an important variable for determining the severity of COVID-19 and for building strategies against regional outbreaks.”
In the U.S., infection rates vary widely by state. According to the algorithm's February 4 estimates, California has had nearly 7 million infections since the start of the pandemic, compared with New York's 5.7 million. The model estimates that California had 1.3 million active cases on that date, or 3.4 percent of the state's population.
Other model estimates for February 4: in Pennsylvania, 11.2 percent of the population had current infections, the highest percentage of any state, compared with a low of 0.15 percent of Minnesota residents. In New York, an early hotspot, 528,000 people had active infections, about 2.7 percent of the population. Meanwhile, 2.3 percent of Texans had current infections.
Noh says he developed the algorithm last summer while deciding whether to send his sixth-grade daughter back to school in person. Nowhere could he find the data he needed to judge whether that was safe, he says.
After building the machine learning algorithm, he found that the area where he lived had an infection rate of about 1 percent, so his daughter went back to school.
Noh checked his findings against existing prevalence figures from several studies that used blood tests to detect antibodies against SARS-CoV-2, the virus that causes COVID-19. In most of the areas tested, the algorithm's infection estimates matched the percentage of people who tested positive for the antibodies, according to the PLOS ONE study.
To perform its daily updates, the online model uses COVID-19 mortality data from Johns Hopkins University and The COVID Tracking Project, a volunteer organization set up to track COVID-19. The estimates reported in the PLOS ONE study date from September 3, when about 10 percent of the U.S. population had been infected with COVID-19, according to Noh's algorithm.
Gaudenz Danuser, Ph.D., chair of the Lyda Hill Department of Bioinformatics and professor of cell biology, was senior author of the study. He also holds the Patrick E. Haggerty Chair in Basic Biomedical Science.
Funding comes from Lyda Hill Philanthropies.