There’s some foolishness going around about this recent preprint from Los Alamos. They’ve spotted several mutations in collected SARS2 sequences from around the world that look as though they may be under positive selection. We’re going to look into one in particular, a D614G mutation in the spike protein. This mutation looks as though it’s increasing in relative prevalence in every region with decent sampling through March and early April.
As for the foolishness, there are people responding to this by arguing that the pattern is more likely to be caused by bottlenecking or drift than by selection. Here’s an epidemiology prof at Harvard saying this. Here’s a group lead at EMBL in the UK. Here’s a Cambridge virologist. Their argument is that the spread is down to chance: Italy happened to get this particular strain and then spread it across Europe.
Let’s dive into some specific regions in more detail. The Harvard guy claims that the Washington State data is consistent with a new seeding of the new strain from Europe or New York, followed by roughly equivalent suppression of the old and new outbreaks by various measures. I don’t have access to the raw GISAID database at the moment, so we’ll use the visualization provided by nextstrain.org. Pull it up, set the dataset to north-america, and color by the genotype at S614 to look at these two strains.
Then scroll down to the bottom and click on “Washington” in the “Filter by Admin Division” section. That’ll show you only the samples collected from Washington State (496 as of this writing).
Then scroll up and look at the tree. The color tells you which strain you’re looking at. Blue is the old strain, yellow is the new. The very first dot was from January 1st, with the original aspartic acid. Then no sequenced cases for a month and a half. From Feb 18 to Mar 9, there are 148 new cases in Washington, all old strain, and all but two descended from the original January case. So through early March we’re mostly seeing an epidemic growing from only a few initial introductions.
The first cases with a G at this locus show up on March 10. There are two of them, and they’re fairly diverged at other loci, suggesting multiple introductions of this new strain into Washington State, probably from the East Coast or Europe. Let’s look at weekly cases in Washington after those first introductions. You can get these by downloading the data at the bottom of the page. The new strain is clade A2a; the old strain is everything else.
| Washington | Total cases | Old strain | New strain | Fraction new |
|---|---|---|---|---|
| March 31-April 6 | 180 | 89 | 91 | 51% |
So this data agrees with the thesis presented in the article: from early March to early April, the fraction of Washington cases with the new strain rises from 0% to 100%. Now, the 100% is probably an overestimate. In particular, they’re all gathered by the UW Virology lab, the importance of which I’ll get to in a minute. However, they’re divergent, not all from a single cluster. But it certainly looks as though the new strain is becoming dominant in Washington State.
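For anyone who wants to rebuild this kind of weekly table from the downloaded metadata, here is a minimal sketch. It assumes a tab-separated file with a collection-date column and a clade column; the column names `date` and `clade`, the function name `weekly_strain_counts`, and the sample rows are all placeholders I made up for illustration, so check them against the file you actually download. The only rule taken from the text above is that clade A2a counts as the new strain and everything else as old.

```python
import csv
import io
from collections import defaultdict
from datetime import date, timedelta

NEW_CLADE = "A2a"  # per the text: A2a is the new strain, everything else is old


def weekly_strain_counts(rows, start, weeks, date_col="date", clade_col="clade"):
    """Count old- and new-strain samples in consecutive 7-day bins from `start`.

    `rows` is any iterable of dicts; the column names are assumptions.
    Returns a list of (week_start, total, old, new, fraction_new) tuples.
    """
    bins = defaultdict(lambda: {"old": 0, "new": 0})
    for row in rows:
        try:
            collected = date.fromisoformat(row[date_col])
        except ValueError:
            continue  # skip incomplete dates such as "2020-03-XX"
        offset = (collected - start).days
        if 0 <= offset < 7 * weeks:
            week_start = start + timedelta(days=7 * (offset // 7))
            strain = "new" if row[clade_col] == NEW_CLADE else "old"
            bins[week_start][strain] += 1
    table = []
    for week_start in sorted(bins):
        old, new = bins[week_start]["old"], bins[week_start]["new"]
        table.append((week_start, old + new, old, new, new / (old + new)))
    return table


# Made-up rows, only to show the shape of the input; not real samples.
SAMPLE = "\n".join([
    "strain\tdate\tclade",
    "s1\t2020-03-10\tA2a",
    "s2\t2020-03-11\tB1",
    "s3\t2020-03-12\tB1",
    "s4\t2020-03-18\tA2a",
    "s5\t2020-03-XX\tA2a",
])
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
table = weekly_strain_counts(rows, start=date(2020, 3, 10), weeks=2)
for week_start, total, old, new, frac in table:
    print(week_start, total, old, new, f"{frac:.0%}")
```

To run it on a real download, replace the `io.StringIO(SAMPLE)` part with an opened file handle and filter the rows to Washington first.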
Now, most of these samples come from either the UW Virology lab or the Washington State Department of Health. The state-gathered samples also have county information. Let’s dig into that a little bit. The two counties I’m going to focus on are King County and Yakima County. King County contains Seattle, while Yakima County is much smaller and on the other side of the mountains. The hope is that we can pick up separate trends due to the geographical separation.
First, King County. Remember that this doesn’t include the UW samples, so we get a much smaller count.
| King County | Total cases | Old strain | New strain | Fraction new |
|---|---|---|---|---|
| March 31-April 6 | 15 | 3 | 12 | 80% |
Then Yakima County. Here the state didn’t start testing until late March.
| Yakima County | Total cases | Old strain | New strain | Fraction new |
|---|---|---|---|---|
| March 31-April 6 | 63 | 50 | 13 | 21% |
So it looks like we’re seeing what I expected to see: later introduction of the new strain into the Yakima County data, followed by growth there.
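The King County count is small, so it’s fair to ask how much weight an 80% from 15 samples can carry. One rough way to check is a Wilson score interval on each county’s new-strain fraction. This is a sketch of mine, not something from the preprint, and it assumes the sequenced samples behave like independent random draws from each county’s cases, which they almost certainly don’t. The counts are the March 31-April 6 rows from the two county tables above.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# (new-strain samples, total samples) for March 31-April 6, from the tables above.
counties = {"King": (12, 15), "Yakima": (13, 63)}
intervals = {
    name: wilson_interval(new, total) for name, (new, total) in counties.items()
}
for name, (lo, hi) in intervals.items():
    print(f"{name}: {lo:.0%} to {hi:.0%}")
```

Under that independence assumption the two intervals don’t overlap, so the gap between the counties looks like more than small-sample noise, even with only 15 King County samples.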
We can take a similar look at the California samples, though there aren’t nearly as many to work with (mostly San Francisco, from UCSF, with some other scattered samples). There the new strain first shows up on March 7th in Sacramento County. From then:
| California | Total cases | Old strain | New strain | Fraction new |
|---|---|---|---|---|
| March 28-April 4 | 42 | 24 | 18 | 43% |
Here it’s more ambiguous and we don’t have as much data, but a month after the new strain is first seen, it’s quite firmly established in California as well.
As far as the foolishness goes: it’s entirely possible to have a particular clade or strain take over in a single location due to chance: person X spreads it there, person X happens to have that strain, no other strain gets introduced for a while, so that strain becomes predominant. But in both Washington and California we see new introductions of D614G in early March, when there are already spreading epidemics, followed by rapid growth such that a month later about half of all cases originate from the later introductions. I don’t yet have access to the full GISAID dataset, but the authors state that the same thing is happening in England, Germany, Japan, and Australia. The same thing happening in multiple places is not random chance; it’s something else. As for the functional speculations about higher viral load, they’re suggestive but not dispositive. We ought to look at them. And as for the fools at various institutions, they should, but won’t, shut up when they don’t know what they’re talking about.