Nvidia and Harvard develop AI tools that accelerate genome analysis



Researchers at Nvidia and Harvard have detailed AtacWorks, a machine learning toolkit designed to reduce the cost and time required for rare and single-cell experiments. In a study published in the journal Nature Communications, the coauthors showed that AtacWorks can run analyses on an entire genome in just half an hour, compared with the multiple hours that traditional methods take.

Most cells in the body carry a complete copy of a person’s DNA, with billions of base pairs packed into the nucleus. But an individual cell uses only the subset of genetic components it needs to function, with cell types such as liver, blood, or skin cells using different genes. The regions of DNA that determine a cell’s function are relatively accessible, while the rest are shielded, wrapped tightly around proteins.

AtacWorks, available on Nvidia’s NGC hub of GPU-optimized software, works with ATAC-seq, a method for finding open areas of the genome in cells that was pioneered by Harvard professor Jason Buenrostro, one of the paper’s coauthors. ATAC-seq measures the intensity of a signal at every position in the genome. Peaks in the signal correspond to regions with accessible DNA, so the fewer cells available, the noisier the data, making it difficult to identify which parts of the DNA are accessible.
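To make "peaks in the signal" concrete, here is a minimal sketch of threshold-based peak calling on a per-position coverage track. The function name `call_peaks`, the fixed threshold, and the toy signal are hypothetical illustrations, not part of AtacWorks; dedicated peak callers use statistical models of background read counts rather than a single cutoff.

```python
def call_peaks(signal, threshold, min_width=1):
    """Return half-open (start, end) intervals where signal >= threshold
    for at least min_width consecutive positions (hypothetical helper)."""
    peaks = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold:
            if start is None:
                start = i  # entering a region above the threshold
        elif start is not None:
            if i - start >= min_width:
                peaks.append((start, i))
            start = None  # leaving the region
    # Close a region that runs to the end of the track.
    if start is not None and len(signal) - start >= min_width:
        peaks.append((start, len(signal)))
    return peaks


# Toy coverage track: one broad accessible region and one single-position spike.
signal = [1, 2, 1, 9, 12, 10, 2, 1, 1, 8, 1, 2]
print(call_peaks(signal, threshold=5))               # [(3, 6), (9, 10)]
print(call_peaks(signal, threshold=5, min_width=2))  # [(3, 6)]
```

The `min_width` filter hints at why noise is a problem: in a sparse, noisy track, isolated spikes can cross the threshold and look like accessible regions, while genuine peaks can fall below it.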

ATAC-seq typically requires thousands of cells to get a clean signal. According to the coauthors, AtacWorks produces results of the same quality with only dozens of cells.
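One simplified way to see why fewer cells means noisier data: assume, purely for illustration (this is not a model taken from the study), that the read count $X$ at a position is Poisson-distributed with mean $\lambda = cN$, where $N$ is the number of cells and $c$ is a hypothetical per-cell read yield. The relative noise, measured by the coefficient of variation, then grows as the cell count shrinks:

```latex
\begin{aligned}
X &\sim \mathrm{Poisson}(\lambda), \qquad \lambda = cN \\
\mathrm{E}[X] &= \lambda, \qquad \mathrm{Var}[X] = \lambda \\
\mathrm{CV}(N) &= \frac{\sqrt{\mathrm{Var}[X]}}{\mathrm{E}[X]} = \frac{1}{\sqrt{\lambda}} = \frac{1}{\sqrt{cN}} \\
\frac{\mathrm{CV}(N/100)}{\mathrm{CV}(N)} &= \sqrt{\frac{cN}{cN/100}} = \sqrt{100} = 10
\end{aligned}
```

Under this assumption, cutting the number of cells by a factor of 100, for example, makes the per-position signal about 10 times noisier relative to its mean, which is the gap a denoising model has to close.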

AtacWorks is trained on labeled pairs of matching ATAC-seq datasets, one high-quality and one noisy. Given a downsampled copy of the data, the model learned to predict an accurate, high-quality version and to identify peaks in the signal. Using AtacWorks, the researchers found that they could detect accessible chromatin (a complex of DNA and proteins whose primary function is to package long molecules into more compact structures) in a noisy sequence of 1 million reads nearly as well as traditional methods could with a clean dataset of 50 million reads.
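The training setup described above, pairing a clean track with a downsampled noisy copy and learning to map one onto the other, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the noisy copy is made by binomial thinning (keeping each read with a fixed probability), and the "model" is a deliberately simple stand-in (a moving-average window chosen by search plus a least-squares linear rescaling), not the machine learning model AtacWorks actually uses. All function names and the synthetic coverage values are hypothetical.

```python
import random


def downsample(counts, fraction, seed=0):
    """Binomial thinning: keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]


def moving_average(xs, k):
    """Centered moving average over an odd window k; windows are truncated at the edges."""
    half = k // 2
    out = []
    for i in range(len(xs)):
        lo, hi = max(0, i - half), min(len(xs), i + half + 1)
        out.append(sum(xs[lo:hi]) / (hi - lo))
    return out


def fit_linear(x, y):
    """Ordinary least squares for y ~ a*x + b; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def mse(pred, target):
    """Mean squared error between two equal-length tracks."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def train_denoiser(noisy, clean, windows=(1, 3, 5, 9, 15)):
    """For each window, smooth the noisy track and fit a linear rescaling onto the
    clean track; return the (error, window, a, b) with the lowest MSE."""
    best = None
    for k in windows:
        smoothed = moving_average(noisy, k)
        a, b = fit_linear(smoothed, clean)
        err = mse([a * s + b for s in smoothed], clean)
        if best is None or err < best[0]:
            best = (err, k, a, b)
    return best


def denoise(noisy, k, a, b):
    """Apply a trained (window, scale, offset) denoiser to a noisy track."""
    return [a * s + b for s in moving_average(noisy, k)]


# Synthetic "clean" coverage: low background plus two accessible regions (hypothetical).
clean = [5] * 100 + [60] * 20 + [5] * 100 + [60] * 20 + [5] * 60
# Keep 2% of reads, the same ratio as 1 million out of 50 million reads.
noisy = downsample(clean, fraction=0.02, seed=1)
err, k, a, b = train_denoiser(noisy, clean)
restored = denoise(noisy, k, a, b)
print(f"chosen window={k}, training mse={err:.1f}")
```

The 0.02 keep-fraction mirrors the article's 1 million versus 50 million reads. Because the unsmoothed window `k=1` is among the candidates, the selected denoiser can never fit the training pair worse than a plain linear rescaling of the noisy track; a learned model such as the one in AtacWorks replaces this fixed smoother with filters learned from many such pairs.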

With AtacWorks, scientists can conduct research with a smaller number of cells, which reduces the cost of collecting and sequencing samples. Analysis can also become faster and cheaper. Running on Nvidia Tensor Core GPUs, AtacWorks took less than 30 minutes to run inference on a whole genome, a process that would take 15 hours on a system with 32 CPU cores.
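Working out the arithmetic behind that comparison: converting the CPU time to minutes and dividing by the GPU time gives the speedup, which is a lower bound because the GPU run took less than 30 minutes. This applies only to the specific hardware comparison reported (Tensor Core GPUs versus a 32-core CPU system).

```latex
\begin{aligned}
T_{\text{CPU}} &= 15\,\text{h} \times 60\,\tfrac{\text{min}}{\text{h}} = 900\,\text{min} \\
T_{\text{GPU}} &< 30\,\text{min} \\
S = \frac{T_{\text{CPU}}}{T_{\text{GPU}}} &> \frac{900\,\text{min}}{30\,\text{min}} = 30
\end{aligned}
```

In other words, the GPU run was more than 30 times faster than the CPU baseline.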

In the Nature Communications paper, the Harvard researchers applied AtacWorks to a dataset of stem cells that produce red and white blood cells, rare subtypes that could not be studied with traditional methods. With a sample set of only 50 cells, the team was able to use AtacWorks to identify distinct regions of DNA associated with cells that develop into white blood cells, and separate sequences that correlate with red blood cells.

“With very rare cell types, it is not possible to study the differences in their DNA using existing methods,” said Avantika Lal, a researcher at Nvidia and first author of the paper. “AtacWorks can help not only reduce the cost of collecting chromatin accessibility data, but also open up new possibilities in drug discovery and diagnostics.”

VentureBeat

