Can a New Computational Tool Discover and Quantify RNA Molecules More Accurately?

Published on March 23, 2023 in Cornerstone Blog

Researchers in the Center for Computational and Genomic Medicine developed a tool to read RNA more accurately.

The findings:

Children's Hospital of Philadelphia researchers developed a new, open-source computational tool, Error Statistics Promoted Evaluator of Splice Site Options (ESPRESSO), that allows for more accurate discovery and quantification of ribonucleic acid (RNA) molecules from long-read RNA sequencing data.

Why it matters:

The transition from short-read to long-read RNA sequencing represents an exciting technological transformation, and computational tools that reliably interpret long-read RNA sequencing data are urgently needed.

RNA acts as a messenger carrying instructions for controlling the synthesis of proteins. RNA isoforms (the different versions of RNA that a single gene can produce) and their underlying RNA processing events can be dysregulated when disease is present.

RNA molecules have historically been difficult to read in their entirety because they usually comprise thousands of bases. "Short-read" RNA sequencing breaks RNA molecules into pieces of 200 to 600 bases, and computer programs then reconstruct the full RNA sequences. Although short-read RNA sequencing provides a highly accurate readout of RNA bases, the information is limited by the short length of the sequences.
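To make the fragmentation step concrete, here is a toy sketch in Python. It assumes a made-up 3,000-base transcript and a hypothetical helper, `fragment_transcript`, that cuts it end to end into pieces of 200 to 600 bases. Real short-read experiments fragment many copies of each molecule at random, overlapping positions, so this illustrates only the size constraint, not an actual protocol.

```python
import random

def fragment_transcript(sequence, min_len=200, max_len=600, seed=0):
    """Cut a sequence into consecutive pieces of min_len..max_len bases.

    Hypothetical helper for illustration only. The final piece may be
    shorter than min_len if too few bases remain.
    """
    rng = random.Random(seed)
    fragments = []
    start = 0
    while start < len(sequence):
        length = rng.randint(min_len, max_len)
        fragments.append(sequence[start:start + length])
        start += length
    return fragments

# A made-up 3,000-base RNA molecule (random bases, not real data).
base_rng = random.Random(1)
transcript = "".join(base_rng.choice("ACGU") for _ in range(3000))

fragments = fragment_transcript(transcript)
print(len(fragments), [len(f) for f in fragments])
```

No single fragment spans the whole molecule, which is why software has to infer the full-length sequence afterward.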

"Long-read" platforms that can sequence an RNA molecule more than 10,000 bases in length have become available, but they are known to have a higher per-base error rate, which has limited the widespread use of long-read RNA sequencing. Researchers developed ESPRESSO to overcome this long-standing challenge.

Who conducted the study:

Yi Xing, PhD, director of the Center for Computational and Genomic Medicine (CCGM) and executive director of the Department of Biomedical and Health Informatics at Children's Hospital of Philadelphia, was the senior author of the study, and Yuan Gao, a former postdoctoral fellow in Dr. Xing's lab, was the first author. Additional authors included colleagues from the CCGM, the Genomics and Computational Biology Graduate Group at the University of Pennsylvania, and the Raymond G. Perelman Center for Cellular and Molecular Therapeutics at CHOP.

How they did it:

ESPRESSO compares long RNA sequencing reads of a gene to its corresponding genomic DNA. The tool uses the error patterns of the reads to identify splice junctions (places where the RNA molecule has been cut and joined) and their corresponding full-length RNA isoforms. Dr. Xing and colleagues evaluated the performance of ESPRESSO using simulated data and data from real biological samples. They found that ESPRESSO outperformed multiple available tools at both discovering and quantifying RNA isoforms. They also generated and analyzed more than 1 billion long RNA sequencing reads covering 30 tissue types and three human cell lines.
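For readers curious how a splice junction shows up when a read is compared with genomic DNA, here is a minimal Python sketch. It is not ESPRESSO's algorithm, which also weighs the error patterns of the reads. It only reads junctions off an alignment that already exists. It relies on the SAM alignment format's CIGAR notation, in which the `N` operation marks a stretch of the genome that the read skips, such as an intron. The function name `splice_junctions`, the start coordinate, and the CIGAR string are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference genome.
REF_CONSUMING = set("MDN=X")

def splice_junctions(ref_start, cigar):
    """Return (skip_start, skip_end) pairs, 0-based half-open, for each N operation.

    Hypothetical helper for illustration; assumes a well-formed CIGAR string.
    """
    junctions = []
    pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # The read skips this stretch of the genome: a candidate junction.
            junctions.append((pos, pos + length))
        if op in REF_CONSUMING:
            pos += length
    return junctions

# Made-up alignment: three aligned blocks separated by two skipped regions,
# with a 2-base insertion (2I) of the kind a noisy long read might contain.
example = splice_junctions(1000, "50M200N30M2I20M500N40M")
print(example)  # [(1050, 1250), (1300, 1800)]
```

In noisy long reads, sequencing errors near the block boundaries can shift where the aligner places these skipped regions, which is the kind of ambiguity the article says ESPRESSO was built to resolve.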

Quick thoughts:

"Long-read RNA sequencing is a powerful technology that will allow us to uncover RNA variation in rare genetic diseases and other conditions like cancer," said Dr. Xing, who is also a professor in the Department of Pathology and Laboratory Medicine at Penn. "We envision that ESPRESSO will be a useful tool for researchers to explore the RNA repertoire of cells in various biomedical and clinical settings."

Where the study was published:

The study appeared in Science Advances. Learn more in the CHOP press release.

Disclosure statement:

National Institutes of Health grants (R01GM088342, U01CA233074, R01GM121827, and R56HG012310) and a training grant in computational genomics (T32HG000046) supported this research. Dr. Xing is a scientific cofounder of Panorama Medicine.