Novel Tool Helps Track the Evolution of SARS-CoV-2: Q&A with the Planet Lab

Published on September 14, 2020 in Cornerstone Blog · Last updated 8 months 3 weeks ago



Researchers in the Planet Lab are naming and tracking new mutants of SARS-CoV-2, the virus that causes COVID-19.

Though we often think of SARS-CoV-2, the virus that causes COVID-19, as a single and uniform entity, scientists are continuously learning about the virus’s many mutations and how they emerge and evolve over time. At Children’s Hospital of Philadelphia, researchers in the lab of Paul Planet, MD, PhD, are taking a magnified look at these strains with the help of a unique tool that categorizes and tracks the divergent, and often competitive, journeys of SARS-CoV-2 types.


Paul Planet, MD, PhD

Dr. Planet, an attending physician in the Division of Infectious Diseases, and Ahmed Moustafa, PhD, a postdoctoral fellow in the Division of Infectious Diseases, created the method, called GNU-based Virus Identification (GNUVID). In this Q&A, we sat down with Dr. Planet and Dr. Moustafa to learn more about how GNUVID is yielding fascinating insights into SARS-CoV-2 — the likes of which may soon help in the fight against COVID-19.


Ahmed Moustafa, PhD

What is your lab’s area of expertise at CHOP?

The Planet Lab studies microbial evolution, mostly bacteria, using whole genomes. We use a combination of bioinformatic techniques and basic research to understand how bacteria evolve to adapt to their host but also change to become more resistant to antibiotics, occupy new infectious sites, and things like that.

Tell us a little bit about GNUVID and how you’re using it to study COVID-19.

When SARS-CoV-2 went viral and quickly spread around the world, it mutated and gave rise to novel lineages that can be tracked back to their source. Each of these lineages is a slightly different mutated form of the virus. At the same time, the sequencing community reacted by starting to sequence tons and tons of viruses. By now, there are around 94,000 different virus genomes that have been sequenced and deposited on a website called GISAID. So, there’s a lot of information that’s very difficult to put together. We really needed tools to try to understand which viruses were coming from where, and how to deal with all of the diversity as the virus mutates and spreads around the world.

We had already been working on a whole genome approach that helps researchers understand what’s novel about a particular genome when they sequence it. It struck us, as COVID-19 came around, that we could modify this tool, WhatsGNU, to try to understand the SARS-CoV-2 virus as it changed and mutated.

How does GNUVID work?

Essentially, the cornerstone of evolutionary biology is naming things so that you can understand where they're coming from. That's why taxonomy is so important in evolution. So, what we designed was a way to taxonomically name new mutants of this virus as they arise, and then to track those pathogens back to their closest relatives and to others circulating around the world.

GNUVID is an automatic tool that can take all of the data that is currently available and systematically name different viruses in the data bank. That’s really powerful because it allows us to say where each virus is coming from and where it was last seen, and perhaps in the future we'll be able to tell which person it came from as the virus mutates. It's kind of like the ultimate contact tracing because you can actually distinguish different viruses really rapidly to see where they were transmitted from.
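To make the naming idea concrete, here is a minimal Python sketch of one way a tool could name genomes systematically. It is an illustration only and is not taken from GNUVID's code: it assumes each genome has already been split into the same ordered list of gene sequences, gives every distinct gene sequence an allele number, and gives every distinct combination of allele numbers a sequence type number. The function name `assign_sequence_types` and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_sequence_types(genomes: Dict[str, List[str]]) -> Dict[str, int]:
    """Name each genome with a sequence type number.

    `genomes` maps an isolate name to its gene sequences, listed in the
    same gene order for every isolate (an assumption of this sketch).
    """
    # One lookup table per gene position: gene sequence -> allele number.
    allele_numbers: List[Dict[str, int]] = []
    # Allele profile (tuple of allele numbers) -> sequence type number.
    type_numbers: Dict[Tuple[int, ...], int] = {}
    names: Dict[str, int] = {}

    for isolate, genes in genomes.items():
        while len(allele_numbers) < len(genes):
            allele_numbers.append({})
        # A sequence seen before reuses its number; a new one gets the next.
        profile = tuple(
            allele_numbers[i].setdefault(seq, len(allele_numbers[i]) + 1)
            for i, seq in enumerate(genes)
        )
        names[isolate] = type_numbers.setdefault(profile, len(type_numbers) + 1)
    return names


if __name__ == "__main__":
    # Made-up toy sequences, far shorter than real viral genes.
    toy = {
        "isolate_1": ["ATGAAA", "CCCGGG"],
        "isolate_2": ["ATGAAA", "CCCGGG"],
        "isolate_3": ["ATGAAA", "CCCGGA"],
    }
    print(assign_sequence_types(toy))
```

In this toy scheme, two isolates share a name only when every gene matches exactly, so a single new mutation produces a new sequence type that can then be followed through the data.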

What kind of insights has GNUVID yielded so far?

When we used GNUVID, we saw some really interesting patterns, especially when we clustered sequences into groups based on how similar they were to each other, known as clonal groups. For instance, earlier in the pandemic, we saw that there seemed to be two different lineages that came into the United States, which has been noticed by some other groups as well: One lineage came into the East Coast from Europe and one came into the West Coast, probably from Asia.

We also noticed that there seemed to have been a lot of diversification of the virus in Europe. Different clonal groups descend from clusters that appear to have originated in Europe. Some of them even seem to have made their way back to China.

Having named all these different types, we could also look at a specific place, say, Washington State, and track when a specific sequence type emerges and when it goes away. And that's when we started to notice some really novel patterns; I think people haven't been looking at the viruses over time in this very granular way.

We've seen from the standard epidemiological curves where the virus emerges and then goes away. But what we saw in Washington State was that one sequence type emerged and then actually appears to have been outcompeted by another sequence type. That second sequence type appears to have come from New York and prior to that likely from Europe. So, instead of a pattern that superficially looked like one wave, it is actually two. This kind of pattern suggests that one virus may have outcompeted the other.
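The kind of tracking described for Washington State can be sketched in a few lines of Python. This is an illustration only, under the assumption that each sample from one location is recorded as a collection date plus an already assigned sequence type label; the function name `dominant_type_by_week` and the sample data are made up and do not come from the study.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def dominant_type_by_week(
    samples: Iterable[Tuple[date, str]]
) -> Dict[Tuple[int, int], str]:
    """Return the most frequent sequence type for each ISO (year, week)."""
    counts: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for collected, sequence_type in samples:
        iso = collected.isocalendar()  # (ISO year, ISO week, weekday)
        counts[(iso[0], iso[1])][sequence_type] += 1
    # Sort by week so the result reads as a timeline.
    return {
        week: tally.most_common(1)[0][0]
        for week, tally in sorted(counts.items())
    }


if __name__ == "__main__":
    # Hypothetical samples from a single location.
    toy_samples = [
        (date(2020, 3, 2), "ST_A"),
        (date(2020, 3, 3), "ST_A"),
        (date(2020, 3, 4), "ST_B"),
        (date(2020, 4, 6), "ST_B"),
        (date(2020, 4, 7), "ST_B"),
        (date(2020, 4, 8), "ST_A"),
    ]
    for week, sequence_type in dominant_type_by_week(toy_samples).items():
        print(week, sequence_type)
```

A change in the dominant label from one week to the next is the kind of signal that a single apparent wave may hide two sequence types, one replacing the other.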

You mentioned that GNUVID could perhaps be used for contact tracing. How might this work?

At first, there wasn’t a lot of hope that we could actually tell where different strains of the virus were coming from, perhaps because there wasn’t enough change in its genome to actually be a different virus. Now, with lots of different circulating sequence types, we could — and this is going to have to be borne out in a study — potentially differentiate locally different sequence types and either confirm or deny a particular hypothesized transmission event.

So for instance, we're currently working on a grant that will look at COVID-19-positive mothers and their babies. If the baby is positive, you would be able to sequence the whole genome of the baby's virus and then sequence the whole genome of the mother’s virus to see if they're the same sequence type or very close to the same. If they are, you could say, this is probably a transmission event from mother to baby. But if they're very different, then you would have to suggest that the baby's virus came from somewhere other than the mother. That's the kind of way that you can imagine this tool might help.
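The mother and baby comparison can be illustrated with a minimal Python sketch. It assumes the two genomes are already aligned to the same length, counts positions where both have an unambiguous base and the bases differ, and treats a small count as consistent with transmission. The function names and the cutoff of two differences are hypothetical choices for illustration, not values from the study.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatches between two aligned genomes.

    Positions where either genome has something other than A, C, G or T
    (for example an N for an uncalled base or a gap) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT" and a != b
    )


def consistent_with_transmission(
    mother: str, baby: str, max_differences: int = 2
) -> bool:
    """True if the genomes are close enough to support a direct link.

    `max_differences` is an illustrative cutoff, not a validated one.
    """
    return count_differences(mother, baby) <= max_differences


if __name__ == "__main__":
    # Made-up toy sequences; real genomes are about 30,000 bases long.
    mother_virus = "ATGCCGTANGCT"
    baby_virus = "ATGCCGTATGCT"
    print(count_differences(mother_virus, baby_virus))
    print(consistent_with_transmission(mother_virus, baby_virus))
```

A close match supports the hypothesized transmission event but does not prove it, since other people nearby may carry the same sequence type.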

Awesome! This research is obviously evolving and changing as new data comes out. Can you give us an update about the most recent data?

We keep updating the GNUVID database with new genomes as they are submitted to GISAID. As of Aug. 17, 32,719 genomes had been categorized using GNUVID, and we know the different clusters that they group in. Interestingly, GNUVID’s latest estimate of the number of introductions of distinct viruses to the US is at least 36 between Dec. 30, 2019, and May 30, 2020. This is much greater than the two that were previously hypothesized, and it is also probably a conservative estimate. Even through the lockdowns, the virus has continued to circulate globally.