Likely selection for D614G S

There’s some foolishness going around about this recent preprint from Los Alamos. They’ve spotted several mutations in collected SARS2 sequences from around the world that look as though they may be under positive selection. We’re going to look into one in particular, a D614G mutation in the spike protein. This mutation looks as though it’s increasing in relative prevalence in every region with decent sampling through March and early April.

As for the foolishness, there are people responding to this by arguing that this pattern is more likely to be caused by bottlenecking or drift than by selection. Here’s an epidemiology prof at Harvard saying this. Here’s a group lead at EMBL in the UK. Here’s a Cambridge virologist. The argument here is that the pattern is down to chance: Italy happened to get this particular strain and then spread it across Europe.

Let’s dive into some specific regions in more detail. The Harvard guy claims that Washington State data is consistent with a new seeding of the new strain from Europe or New York followed by roughly equivalent suppression of the old and new outbreaks by various measures. I don’t have access to the raw GISAID database at the moment, so we’ll use the visualization provided by nextstrain.org. Pull it up and set to north-america and color by the genotype at S614 to look at these two strains.

Then scroll down to the bottom and click on “Washington” in the “Filter by Admin Division” section. That’ll show you only the samples collected from Washington State (496 as of this writing).

Then scroll up and look at the tree. The color tells you which strain you’re looking at. Blue is the old strain, yellow is the new. The very first dot was from January 1st, with the original aspartic acid. Then no sequenced cases for a month and a half. From Feb 18 to Mar 9, there are 148 new cases in Washington, all old strain, and all but two descended from the original January case. So through early March we’re mostly seeing an epidemic growing from only a few initial introductions.

The first cases with a G at this locus are from March 10. There are two of them, and they’re fairly diverged at other loci, suggesting multiple introductions of this new strain into Washington State, probably from the East Coast or Europe. Let’s look at weekly cases in Washington after those first introductions. You can get these by downloading the data at the bottom of the page. The new strain is clade A2a, old strain is everything else.
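For anyone who wants to redo the tally, here is a minimal sketch of the weekly binning in Python. The column names (`date`, `division`, `clade`) and the sample rows are assumptions for illustration, not the actual layout of the nextstrain download, so adjust them to whatever the file really contains.

```python
import csv
import io
from collections import Counter
from datetime import date, timedelta

# Hypothetical stand-in for the downloaded metadata. The real file's
# column names and clade labels may differ.
SAMPLE_TSV = """strain\tdate\tdivision\tclade
s1\t2020-03-10\tWashington\tA2a
s2\t2020-03-11\tWashington\tB1
s3\t2020-03-18\tWashington\tA2a
s4\t2020-03-19\tWashington\tA2a
s5\t2020-03-20\tWashington\tB1
s6\t2020-03-12\tCalifornia\tA2a
"""

def weekly_tally(tsv_text, division, start, new_clade="A2a"):
    """Count old- and new-strain samples per 7-day bin starting at `start`."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["division"] != division:
            continue
        try:
            sampled = date.fromisoformat(row["date"])
        except ValueError:
            continue  # skip rows with incomplete dates
        if sampled < start:
            continue
        week = (sampled - start).days // 7
        kind = "new" if row["clade"] == new_clade else "old"
        counts[(week, kind)] += 1
    return counts

def print_table(counts, start):
    """Print one tab-separated row per week: range, total, old, new, fraction new."""
    for week in sorted({w for w, _ in counts}):
        old, new = counts[(week, "old")], counts[(week, "new")]
        first = start + timedelta(days=7 * week)
        last = first + timedelta(days=6)
        print(f"{first:%B %d}-{last:%B %d}\t{old + new}\t{old}\t{new}\t{new / (old + new):.0%}")

start = date(2020, 3, 10)
print_table(weekly_tally(SAMPLE_TSV, "Washington", start), start)
```

Everything that is not clade A2a is lumped together as the old strain, matching the definition used here.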

Washington	Total cases	Old Strain	New Strain	Fraction New
March 10-16	77	55	22	29%
March 17-23	54	26	28	52%
March 24-30	79	33	46	58%
March 31-April 6	180	89	91	51%
April 7-13	16	0	16	100%

So this data agrees with the thesis presented in the article: from early March to early April, the fraction of Washington cases with the new strain rises from 0% to 100%. Now, the 100% is probably an overestimate. In particular, the 16 samples from that last week were all gathered by the UW Virology lab, the importance of which I’ll get to in a minute. However, they’re divergent, not all from a single cluster. But it certainly looks as though the new strain is becoming dominant in Washington State.
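One way to put a number on that trend is to fit a straight line to the log-odds of the new strain by week. The sketch below is my own illustration, not anything from the preprint: it fits logit(p) = a + b·week to the Washington counts by Newton's method on the binomial likelihood. A positive b means the new strain's share is rising, and exp(b) is the weekly multiplier on the odds. A rising share by itself does not separate selection from repeated seeding, so treat this as a description of the data rather than a test.

```python
import math

# Weekly Washington counts from the table above: (week index, new strain, total).
WASHINGTON = [(0, 22, 77), (1, 28, 54), (2, 46, 79), (3, 91, 180), (4, 16, 16)]

def fit_logistic_trend(rows, iterations=25):
    """Fit logit(p_week) = a + b * week by Newton's method (binomial likelihood)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for week, new, total in rows:
            p = 1.0 / (1.0 + math.exp(-(a + b * week)))
            resid = new - total * p          # observed minus expected new-strain count
            w = total * p * (1.0 - p)        # binomial variance weight
            g_a += resid
            g_b += resid * week
            h_aa += w
            h_ab += w * week
            h_bb += w * week * week
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system H * delta = g
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

a, b = fit_logistic_trend(WASHINGTON)
print(f"intercept a = {a:.2f}, slope b = {b:.2f} per week, odds multiplier = {math.exp(b):.2f}")
```

The last week has few samples from a single lab, so it is worth rerunning with that row dropped to see how much it moves the slope.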

Now, most of these samples come from either the UW Virology lab or the Washington State Department of Health. The state-gathered samples also have county information. Let’s dig into that a little bit. The two counties I’m going to focus on are King County and Yakima County. King County contains Seattle, while Yakima County is much smaller and on the other side of the mountains. The hope is that we can pick up separate trends due to the geographical separation.

First King County. Remember that this doesn’t include the UW samples, so we get a much smaller count.

King County	Total cases	Old Strain	New Strain	Fraction New
March 10-16	8	8	0	0%
March 17-23	3	2	1	33%
March 24-30	7	3	4	57%
March 31-April 6	15	3	12	80%

Then Yakima County. Here the state didn’t start testing until late March.

Yakima County	Total cases	Old Strain	New Strain	Fraction New
March 10-16	0	0	0	n/a
March 17-23	0	0	0	n/a
March 24-30	20	17	3	15%
March 31-April 6	63	50	13	21%

So it looks like we’re seeing what I expected to see: a later introduction of the new strain into the Yakima County data, followed by growth there.
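The county counts are small, so the percentages carry a lot of sampling noise. As a rough guide to how much, here is a sketch that puts a Wilson score interval around a few of the fractions from the two tables above. The choice of the Wilson interval and the 95% level are mine, not the preprint's.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total == 0:
        raise ValueError("no samples")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# (label, new strain, total) taken from the King and Yakima tables above.
ROWS = [
    ("King March 17-23", 1, 3),
    ("King March 31-April 6", 12, 15),
    ("Yakima March 24-30", 3, 20),
    ("Yakima March 31-April 6", 13, 63),
]

for label, new, total in ROWS:
    lo, hi = wilson_interval(new, total)
    print(f"{label}: {new}/{total} = {new / total:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

With only three samples in a week, the interval covers most of the range from 0 to 1, so the single-week King County numbers mean little on their own. It is the direction across weeks that matters.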

We can do a similar kind of look at the California samples, though there aren’t nearly as many to work with (mostly San Francisco, from UCSF, with some other scattered samples). There we see the first of the new strain show up on March 7th in Sacramento County. From then:

California	Total cases	Old Strain	New Strain	Fraction New
March 7-13	7	3	4	57%
March 14-20	10	3	7	70%
March 21-27	27	15	12	44%
March 28-April 4	42	24	18	43%
April 5-	17	7	10	59%

Here it’s more ambiguous and we don’t have as much data, but a month after the new strain is first seen, it’s quite firmly established in California as well.

As far as the foolishness goes: it’s entirely possible to have a particular clade or strain take over in a single location due to chance: person X spreads it there, person X happens to have that strain, no other strain gets introduced for a while so that strain becomes predominant. But in both Washington and California we see new introductions of D614G in early March when there are already spreading epidemics, followed by rapid growth such that a month later ~half of all cases originate from the later introduction. I don’t yet have access to the full GISAID dataset, but the authors state that the same thing is happening in England, Germany, Japan, and Australia. The same thing happening in multiple places is not random chance, it’s something else. As for the functional speculations about higher viral load, they’re suggestive but not dispositive. We ought to look at them. And as for the fools at various institutions, they should – but won’t – shut up when they don’t know what they’re talking about.
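The "multiple places" argument is a sign test. As a back-of-the-envelope sketch, suppose that under pure drift the new strain's share in a region is as likely to fall as to rise after introduction, and that regions behave independently. Both are simplifying assumptions of mine, and the independence one is generous, since several regions may have been seeded from the same source. The chance of a rise in every region is then 0.5 raised to the number of regions.

```python
from math import comb

def prob_at_least(k_rising, n_regions, p_rise=0.5):
    """P(at least k of n independent regions show a rise) if each rise is a coin flip."""
    return sum(
        comb(n_regions, k) * p_rise**k * (1 - p_rise) ** (n_regions - k)
        for k in range(k_rising, n_regions + 1)
    )

# The two regions tallied here plus the four the authors mention.
REGIONS = ["Washington", "California", "England", "Germany", "Japan", "Australia"]

p_all = prob_at_least(len(REGIONS), len(REGIONS))
print(f"{len(REGIONS)} of {len(REGIONS)} regions rising by chance: {p_all:.4f}")
```

Each added region with decent sampling and the same direction of change halves that probability, which is why the regional breakdown carries more weight than any single location.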

Join the Conversation

8 Comments

gothamette says:

May 4, 2020 at 2:50 pm

It’s now official that 20% of NYC is poz:
https://www.governor.ny.gov/news/amid-ongoing-covid-19-pandemic-governor-cuomo-announces-results-completed-antibody-testing

But that can’t be evenly distributed among the boros. Bronx and Queens were especially hard hit – and only in certain zips. Manhattan – not so bad. I cannot believe that Manhattan is 20% seropositive.

NYC is very financially and socially segregated.

1. arguably wrong says:
  
  May 4, 2020 at 5:09 pm
  
  I just got access to the raw GISAID database of SARS2 genome sequences. Took a quick look and there’s quite a lot of high quality data from NYU. No zip code data, but borough and county. I can’t share the data, but let me know if there’s something you think might be worth looking at.
  
  1. gothamette says:
    
    May 4, 2020 at 6:51 pm
    
    I’m not capable of dealing with the finer points of genetic variation – or even the rougher points. Although I will screw up the courage to ask a question later when I’ve looked at the sequences again – I had an interesting time comparing the various boros earlier today.
    
    I do have a general question, which I posed on West Hunt, but no one saw it. It’s this. Actually two questions.
    
    It’s official: NYC is 20% sero-positive. Yet deaths have been very unequally distributed – take a look at the zip code map in the link I provided. So is it reasonable to conclude that Manhattan has less than 20% infected, and Bronx and Queens have 20%+?
    
    Second question. Many wealthy Manhattanites have left town for the duration. They’ll eventually return, probably in September. See above: they left a place that wasn’t terribly affected. When they come back, it will be flu season. We will still have the virus. Social distancing will be spotty (although the crap weather will help).
    
    This is when Manhattan gets hit.
    
    Makes sense?
    
2. gothamette says:
  
  May 4, 2020 at 9:17 pm
  
  NYC:
  https://www1.nyc.gov/site/doh/covid/covid-19-data.page
  
  Manhattan demographics: 50% non-H White and 10% Asian. That surprised even me. The demographics are dissimilar to the rest of NYC. And the whites are a wealthy lot, on the whole. (Explains a lot.)
  
  https://worldpopulationreview.com/boroughs/manhattan-population/
  
James Bowery (@jabowery) says:

May 4, 2020 at 6:52 pm

Cases Per Million Population, Density/mi^2
Yakima: 156, 58
King: 7, 1034

Yakima has a large population of Yakima Indians.
There is evidence the Navajo also suffered disproportionately.

I’ll try generating a by-county animation limited to Washington, like this nation-wide one.

1. James Bowery (@jabowery) says:
  
  May 4, 2020 at 8:34 pm
  
  Here’s the Washington State-only animation, by county, of confirmed cases per million vs. population density over time.
  
  1. gothamette says:
    
    May 4, 2020 at 9:15 pm
    
    I heard about the Navajo. What about the Hopi?
    
  2. arguably wrong says:
    
    May 4, 2020 at 9:35 pm
    
    Very interesting, thanks. Confirms my suspicion that east of the mountains picked up the infection later than the west.
    