
First Arcus Omics Data Launch: Q&A With Ingo Helbig, MD

Published on February 15, 2023 in Cornerstone Blog



Children’s Hospital of Philadelphia Research Institute and Arcus, CHOP’s centralized research data repository, have launched their first genomic dataset, which includes more than 5,000 exomes/genomes and 12,000 chromosomal SNP arrays. The availability of this data will allow CHOP investigators to perform large-scale research while protecting patients’ privacy. In this Q&A, Ingo Helbig, MD, scientific director of the Arcus Omics Team, explains more about the importance of this Arcus Omics data release. CHOP researchers can reach out to the Arcus Omics team and drop into Arcus Omics office hours.

Ingo Helbig, MD, scientific director of the Arcus Omics Team, discusses the first omics dataset launch.

Arcus Omics’ first data launch released 12,000 SNP arrays, 4,000 exomes, and 1,300 genomes to the CHOP research community in January 2023. What makes this different from other datasets?

Easy, hassle-free access to data is important for researchers to complete their work. This is particularly relevant in the genomics sphere where there is strength in numbers; however, genomic data is so sensitive that sharing it is not straightforward. To overcome this challenge, Arcus built a system within CHOP to make our institutional data available for researchers while providing a state-of-the-art privacy framework. Think of this as open-access data within CHOP that is firewalled to the outside and fully managed by the Research Institute, enabling researchers to access a larger, combined omics dataset to jumpstart their research.

Where did this collection of omics data originate, and why is it important that Arcus has harmonized this data to be consistent across patients?

Thanks to the collaboration of many groups across CHOP — including the Division of Genomic Diagnostics, Center for Data-Driven Discovery in Biomedicine, Birth Defects Biorepository, Neuroscience Center, and the Roberts Individualized Medical Genetics Center — we have data available under one umbrella. When genomic data comes from various resources, it is critical to make the data mutually compatible — this is the role of the Arcus Omics Team.

In what other ways have CHOP bioinformatics experts optimized this data?

A unique aspect of Arcus is that it can act as the honest broker on behalf of our researchers to collect and provide access to de-identified data, particularly the continuous update of rich clinical data that can be linked to genomic data. Working on datasets in this way has been Arcus’ strength in the more than 100 data science projects that have run through the Arcus platform. We can now use this expertise in combination with genomic data.

How are privacy concerns addressed as it pertains to omics data collection? Do omics projects have Institutional Review Board oversight?

There is a frequent misunderstanding that Arcus only works with de-identified data where research can be done under the Arcus Master IRB; however, this is not completely accurate. Arcus Omics is for all institutional omics data, some of which has been generated in IRB-approved studies. Other datasets, especially our datasets from clinical diagnostic testing, can only be accessed without identifiers. For most omics projects, investigators will not need an IRB, but this is a topic that we jointly explore with the investigators when we start a project. Regardless of the situation, the Arcus Omics team is able to help. Arcus also has its own privacy team that watches our data like a hawk. Data privacy is always at the top of our minds, and we have the honor of working with one of the best privacy teams in the region.

How can researchers access this data, and will there be training and education resources available?

Each CHOP investigator and project is assigned a separate virtual computing environment called an Arcus lab, which is loaded with all the data and software needed for the project. Training in genomics as well as other educational resources will be available within the Arcus lab dashboard. Each project will also be assigned a bioinformatics scientist from the Arcus Omics team to provide technical support and scientific collaboration.

Researchers come to us with various levels of expertise and questions. This may range from questions from investigators without a genomics background (e.g., do you see variants in my gene of interest) to heavy data compilation and analysis for complex bioinformatics projects. In the first case, we would typically perform this analysis for the investigator, and in the second case, we would help them set up their virtual lab and assist with building and maintaining pipelines as well as streamlining their workflows.

What is on the horizon for Arcus Omics, and how will it continue to grow?

Data, data, and more data is on the horizon. We have only just started, as Arcus Omics is only six months old. We started with five pilot projects, which increased to 10 in recent weeks. These projects span across the cohort and were carefully curated to represent a wide range of both investigator areas of expertise and research aims. The major advantage of Arcus Omics is that it sets the stage for investigator equity in terms of access to clinical and genomic data.

The Arcus Omics platform will be the database for all future omics data generated through institutional resources. It will link up with our Institutional Biobank and eventually with the Penn Medicine Biobank, and we will heavily invest in supporting and training investigators. We will also expand into the other omics spheres, including RNA sequencing, proteomics, and bisulfite sequencing.

What excites you the most about the first data launch?

That we actually made it! If you had asked us two years ago whether this was possible, we would have wholeheartedly said, “No.” Combining data from across the enterprise sounds simple in theory, but it is a herculean task. Having an easily accessible institutional omics dataset is new for CHOP, and we are extremely excited to have laid the foundation for this resource. This dataset will expand rapidly, and we expect that it will broadly enrich CHOP’s omics landscape. This is omics data that every clinician and researcher at CHOP can access. We have created a level playing field and are excited to see how these resources will integrate with other omics projects at our institution.