Possibilities of interest in gene-editing for increased intelligence

There’s been some discussion around Charles Murray’s recent claim that a lot of people would probably try to make their kids smarter with gene editing if it were reasonably cheap. There have been some foolish points, some reasonable ones, but not many with much imagination. So I thought I’d point out some possibilities. Hat tip to this old discussion between Jerry Pournelle and Greg Cochran for inspiration and some ideas.

Suppose, as a basis for discussion, someone somewhere tries to edit genes of gametes or embryos to improve the child’s eventual intelligence. What might happen?

  • It’s hard to change lots of locations, so edits get done on the highest-effect loci. One or more of them turn out to have heterozygote advantage but nasty recessive effects. People would probably be smart enough not to give the kids torsion dystonia or Tay-Sachs, but we might see brand new exciting Mendelian disorders, and kids with six different ones simultaneously.
  • Off-target effects from the gene editing are reasonably common. Embryo screening catches most of the problematic ones, but not large chromosomal rearrangements that leave everything intact: balanced translocations where all the genes are still there, but a chromosome arm has hopped to another chromosome. The edited kids are fine. Their children all have things like inherited Down syndrome or Klinefelter syndrome.
    • Rapid adoption of the technology means most children in a generation have been gene edited. 25 years in, we find out almost all of the Changed are infertile or have extremely disabled children. Lots of possibilities: population crashes, mass adoption of Unchanged children, etc. Maybe one company was smart enough to screen for structural variants and that subpopulation takes over the world.
  • We weren’t quite looking for what we thought we were. We sort of get increased intelligence, but with something else riding along.
    • We use polygenic risk scores for educational attainment for editing, end up getting kids with higher conscientiousness and conformism, losing our population of irascible old assholes.
    • We use childhood IQ, get much more rapid childhood development.
    • We use adult IQ, get much slower childhood development, with 25 year olds still in middle school.
  • There’s side effects we weren’t expecting.
    • Bigger heads: more troubles in birth, higher C-section rate. 5 year olds with neck braces to help their neck muscles hold up their heads.
    • Shifts in common personality types: conscientiousness, extraversion. The funny kind of strangeness you see in mathematicians.
    • Ideological changes. The smart kids are wild radicals, pushing society into a new form every five years, or staid conservatives, enforcing complete rigidity on everyone.
    • Funny shifts in components of intelligence. Maybe we see big increases in mathematical intelligence but not verbal intelligence, or vice versa.
  • The smart kids get alienated from everyone else, including their families. Parents stop doing it, and we get a long term schism in society between the Changed and the Unchanged.
  • The polygenic risk score we use works better in some populations than others. One group gets much much smarter, the other doesn’t move much. Social unrest and division ensue.
  • It’s expensive, so expensive that it’s inaccessible to most people. The rich and powerful make their kids smart. Dynastic politics ensue. They ban the technology for the peasants to avoid competition. Dynasties stay much more stable because there’s no longer any regression to the mean.
  • First-rate powers are risk-averse and ban it. A third-rate power decides to roll the dice, and Indonesia or Uganda or Ecuador becomes the new power center of the world.
  • There’s incompatible sets of edits that can’t be made on the same genome.
    • Different sets of gene edits push you in different directions: you can make your kid wildly creative or a brilliant mathematician but not both.
    • Companies deliberately or accidentally engineer gene edits to be incompatible. Maybe Google merges two chromosomes so that their edits can only breed with other Google-compatible edits. We speciate into multiple similar but genetically incompatible species based on which company did your grandpa’s gene editing.

Just a morning’s thoughts. Any other ideas?

Overdispersion decreases variant fixation rates

My first post on this blog was about fixation of a new beneficial mutation. I presented Haldane’s original 1927 result showing that, with some reasonable assumptions, the likelihood that a new mutation with a small selective advantage s fixes in the population is approximately 2s. That derivation assumed a Poisson-distributed number of descendants. Here we’ll redo the modeling with overdispersion and see how that affects the result.

Haldane assumed that the number of “children” (or, for a virus, new infected hosts) of a particular organism follows a Poisson distribution. If you recall, we were evaluating the fixed point of the generating function of the distribution. The generating function of the Poisson distribution is

f(x) = e^{\lambda (x-1)}.

We then solved the equation f(x) = x for \lambda = 1+s and x = 1 - y, where x is the probability of extinction and y the probability of fixation. We can naturally extend the Poisson distribution to a negative binomial distribution with mean \mu and variance \mu + \mu^2 / k, where k is the overdispersion parameter; as k approaches infinity we recover the Poisson distribution. The generating function of the negative binomial is instead

f(x) = \frac{k^{k}}{(k + \mu - \mu x)^{k}}

The goal is then to find a solution to the fixed point f(x) = x. Bartlett (1955) provides a useful approximation using the moments of the distribution:

\ln(x) = \ln f(1) + \mu \ln(x) + \frac{\sigma^2}{2} \ln(x)^2 + \ldots

This can be simplified to give an approximate solution valid for an arbitrary distribution:

x \approx exp \left [ \frac{2(1 - \mu)}{\sigma^2} \right ]

Applied to the Poisson distribution, where the mean and variance are both \lambda = 1 + s, we get:

y = 1 - x \approx 1 - exp \left [\frac{-2s}{1 + s} \right ] = 2s - 4s^2 + \frac{22s^3}{3} \ldots

And we see that we recapitulate Haldane’s result with the first term of the Taylor series approximation.

Using the negative binomial distribution, we get

y = 1 - x \approx 1 - exp \left [\frac{-2s}{1 + 1/k + s + 2s/k + s^2/k}\right ]

Assume s is sufficiently small to neglect higher terms and we get:

y \approx 1 - exp \left [\frac{-2s}{1 + 1/k + s + 2s/k}\right] = \frac{2s}{1 + 1/k} - \frac{4s^2}{1 + 1/k} \ldots

And we see that the introduction of overdispersion by the negative binomial decreases the likelihood of fixation of the variant by approximately a factor of 1 + 1/k. k has been measured to be somewhere between 0.1 and 0.5 for the SARS2 virus, suggesting a 3-11-fold lower rate of fixation due to overdispersion of downstream infections than with a Poisson.
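
As a quick numerical check of the algebra above, here’s a minimal sketch that solves the fixed point f(x) = x directly by iteration and compares the resulting fixation probabilities with Haldane’s 2s and with the 2s/(1 + 1/k) approximation. The particular s and k values are just illustrative.

import math

def poisson_pgf(x, lam):
    # Generating function of the Poisson distribution with mean lam
    return math.exp(lam * (x - 1))

def negbin_pgf(x, mu, k):
    # Generating function of the negative binomial with mean mu and variance mu + mu^2/k
    return (k / (k + mu * (1 - x))) ** k

def extinction_prob(pgf, iters=5000):
    # Iterating x -> f(x) from 0 converges to the smallest fixed point,
    # which is the extinction probability of the branching process.
    x = 0.0
    for _ in range(iters):
        x = pgf(x)
    return x

s, k = 0.05, 0.2
mu = 1 + s

y_poisson = 1 - extinction_prob(lambda x: poisson_pgf(x, mu))
y_negbin = 1 - extinction_prob(lambda x: negbin_pgf(x, mu, k))

print(f"Poisson fixation probability: exact {y_poisson:.4f} vs Haldane 2s = {2 * s:.4f}")
print(f"NegBin  fixation probability: exact {y_negbin:.4f} vs 2s/(1 + 1/k) = {2 * s / (1 + 1 / k):.4f}")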

Watching for SARS-CoV2 hypermutability

One of my perennial concerns in this pandemic has been that the SARS-CoV2 virus would evolve a hypermutability phenotype, mutate rapidly, and become much more difficult to control. I’m going to briefly cover 1) why this would be bad, 2) why this would be possible, and, in a rare bit of optimism, 3) why I no longer consider this very likely.

Hypermutability

Right now, mutation in SARS-CoV2 is the rate-limiting step in adaptation. Most new variants are lost (~98%) and almost all of the remaining ones are not beneficial. So SARS-CoV2 adaptation happens in steps: a long period of waiting, followed by the emergence of an adaptive mutation which sweeps to fixation. Appearance of new adaptive mutations depends on the intrinsic mutation rate of the virus, the population size of new infections available to mutate, and the likelihood that a new adaptive mutation is lost before its lineage can grow large.

We have seen in the past year that adaptive changes become more rapid as the population grows (i.e., Fisherian acceleration). We had ~6 months before the first adaptation, D614G, but multiple variants under positive selection have emerged in the last few months. The intrinsic mutation rate of SARS-CoV2 has stayed relatively steady at 1-2 new mutations every month, with a few exceptions.

The exceptions are the new 501Y variants of concern (the red and orange dots in the plot), which seem to have had a large jump in mutations right before emerging, then settled back down to the normal rate.

As long as most time is spent waiting for the emergence of a new adaptive mutation, increasing the mutation rate would increase the rate of adaptation. In the context of an emerging pandemic, rapid viral adaptation is bad, and we want to keep it as low as possible. Richard Lenski’s experiments in adaptive evolution are worth looking into. He has, for over three decades, been growing flasks of bacteria and watching them evolve over time. In three of those lines, he saw the emergence of a hypermutable phenotype: the strain’s mutation rate suddenly increased enormously, and that strain then took over the environment.

It looks as though hypermutability isn’t intrinsically more adaptive for the bacteria; its emergence involved breaking error-correcting enzymes that are there for a reason. Instead, a hypermutable strain will occasionally, by chance, develop a beneficial mutation soon enough afterward that the beneficial mutation carries the hypermutator to fixation along with it. And precisely because they are hypermutable, such strains are much more likely to catch a beneficial mutation by chance than a random lineage is. Once fixed, a hypermutator then evolves significantly faster. At a local optimum, hypermutability would be bad for the bacterium: new beneficial mutations would be absent, and you would see an increase in genetic load with no offsetting benefits. Far from genetic optima, in a new environment, hypermutability allows much more rapid adaptation.

Breaking things

SARS-CoV2, like MERS and SARS-CoV, is an RNA virus with a naturally high mutation rate. RNA is much less stable than DNA, and RNA viruses normally have fairly high mutation rates. Because these coronaviruses have such large genomes, they have developed a mechanism to reduce the mutation rate: one of the proteins they produce through cleavage of the Orf1ab polyprotein is an exonuclease (nsp14) that provides proofreading for the RNA polymerase that copies the virus.

Previous work with the original SARS-CoV virus showed that knocking out the nsp14 exonuclease with a pair of mutations in the enzymatic active site increased mutation rates 10-20 fold, causing a hypermutable phenotype. The obvious concern is that a single random mutation in the SARS-CoV2 exonuclease would do the same thing, causing much more rapid adaptive selection.

SARS-CoV2 exonuclease is essential

It turns out this experiment has already been done, by a Dutch group that was researching the very similar MERS virus and applied the same techniques to SARS-CoV2. They found that simple knockouts of the exonuclease in both MERS and SARS-CoV2 were lethal, rather than causing a hypermutability phenotype as in SARS-CoV1. They suspect that the exonuclease has been so thoroughly integrated into the replication complex that replication is no longer possible with the enzyme broken.

In short:

  • Hypermutability would be bad, causing much faster adaptive evolution of COVID
  • There’s a proofreading enzyme that, when broken in the SARS1 virus, causes hypermutability
  • That enzyme is present in the SARS2 virus, but is so tightly integrated that breaking it kills the virus
  • So hypermutability phenotypes are probably fairly hard to produce in SARS2
  • We can expect COVID to continue to evolve at about the same rate

Revisiting viral selection

This preproof from Nathan Grubaugh (Yale, PhD in microbiology, 2016), Bill Hanage (Harvard, PhD in epidemiology), and Angela Rasmussen (Columbia, PhD in microbiology, 2009) came across my desk. It’s about the spike protein variant that, as Bette Korber at Los Alamos demonstrated, has become the dominant strain of the coronavirus in the last couple of months.

These guys present no data, just opinion. Here’s the core bit of their paper:

[Korber et al.] present compelling data that an amino acid change in the virus’ spike protein, D614G, emerged early during the pandemic, and viruses containing G614 are now dominant in many places around the world. The crucial questions are whether this is the result of natural selection, and what it means for the COVID-19 pandemic. Korber et al. hypothesized that the rapid spread of G614 was because it is more infectious than D614…As an alternative hypothesis… the increase in the frequency of G614 may be explained by chance, and the epidemiology of the pandemic.

In February, the area with the most COVID-19 cases shifted from China to Europe, and then in March on to the US. As this and other work shows, the great majority of SARS-CoV-2 lineages in the US arrived from Europe, which is unsurprising considering the amounts of travel between the continents. Whether lineages become established in a region is a function not only of transmission, but also the number of times they are introduced. There is good evidence that for SARS-CoV-2, a minority of infections are responsible for the majority of transmission (Endo et al., 2020). Therefore, while most introductions go extinct, those that make it, make it big (Lloyd-Smith et al., 2005). Over the period that G614 became the global majority variant, the number of introductions from China where D614 was still dominant were declining, while those from Europe climbed. This alone might explain the apparent success of G614.

Got that? They think that random chance is a likely hypothesis. Europe happened to be the next big outbreak after China, Europe happened to get this G614 variant, and then Europe spread that variant to the rest of the world.

There are some historical problems with this account. But before I get to those, I want to go through some math. How likely is it that a new introduction of the virus “succeeds” – that is, seeds an infection chain that doesn’t die out? We can approximate this based on three parameters:

  1. The relative advantage s of the introduced strain as compared with the old strain.
  2. The growth rate in the new location R_t, which gives the mean new infections per infection.
  3. The overdispersion \alpha, which describes the variance in new infections per infection (the variance is roughly R_t + \alpha R_t^2, so larger \alpha means more superspreading).

The assumptions I’ll need for the approximation are that s is small and R_t is close to 1. While the second wasn’t true early on in the pandemic (it was much bigger), social distancing and other measures have made that true for most areas since about early April. The authors of the preproof helpfully linked a paper that estimated the overdispersion parameter based on Chinese data to be about 10.

Novel Introduction

First, let’s estimate how likely it is that a single infection can seed a completely new outbreak in a new location – one where there isn’t another outbreak already. I won’t go through the details here, but we get

p_{new} = \frac{2(R_t - 1)}{1 + \alpha}

We can then use Haldane’s approximation to estimate how many introductions are needed to get a 50% chance of seeding a new outbreak:

n_{new} = \frac{\ln(2)\,(1 + \alpha)}{2(R_t - 1)}.

A couple of points. First, if the mean growth rate in the new location is less than 1, there are no sustained new outbreaks: you could have an unlucky first “superspreader” event, but those secondary infections would on average die out. The higher the growth rate, the more likely an introduction is to seed an outbreak. Second, overdispersion makes an outbreak harder to seed relative to a Poisson distribution: the more variable the number of new infections per infection, the less likely any single introduction is to take off.

To feed in some actual numbers, if R_t=1.15 and \alpha=10, we get p_{new} \approx 2.7% and n_{new} \approx 25. Any single newly infected traveler has less than a 3% chance of seeding a new outbreak, and you’d need ~25 of them to have even odds of getting one.

Introduction Under Competition

What happens if we try to seed an outbreak where there are already cases? It depends on the difference in growth rate between the existing strain and the new one. Here (1 + s) acts as a multiplier on the growth rate. We have

p_{compete} = \frac{2(s + R_t - 1)}{1 + \alpha} and

n_{compete} = \frac{\ln(2)\,(1 + \alpha)}{2(s + R_t - 1)}.

Increased fitness makes seeding more likely. Note that even if the local epidemic is shrinking (R_t < 1), the new strain can still seed an outbreak if its fitness advantage pushes its effective growth rate above 1.
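
To make the arithmetic concrete, here’s a minimal sketch of both formulas. It uses the R_t = 1.15, \alpha = 10 example from above; the s = 0.10 in the competition case is an illustrative guess, not an estimate.

import math

def p_seed(R_t, alpha, s=0.0):
    # Haldane-style approximation for the probability that a single introduction
    # seeds a surviving outbreak; (1 + s) acts as a fitness multiplier on growth.
    return 2 * (s + R_t - 1) / (1 + alpha)

def n_for_even_odds(R_t, alpha, s=0.0):
    # Introductions needed for a 50% chance that at least one of them succeeds
    return math.log(2) / p_seed(R_t, alpha, s)

R_t, alpha = 1.15, 10
print(f"Novel introduction: p = {p_seed(R_t, alpha):.1%}, "
      f"introductions for even odds = {n_for_even_odds(R_t, alpha):.0f}")
print(f"Under competition with s = 0.10: p = {p_seed(R_t, alpha, s=0.10):.1%}")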

History

Alright, let’s talk West Coast numbers. I’ll be using nextstrain.org, since that data’s publicly available. The West Coast (California, Oregon, and Washington) had very early outbreaks, seeded by travelers from China with the original D614 strain. By May 14, there were no more sequenced genomes of the original D614 in the database: the last one in Washington was May 6, in California May 12, and in Oregon May 13. Back in February, the original strain was the only one sequenced in Washington, and the dominant one in both Oregon and California. Note that Washington was then the center of the US epidemic, with several deaths by the end of February. The new strain first shows up in Washington on March 11, in Oregon on March 11, and in California on February 4. If we focus on Washington, that means that in two months (about 10 generations, with a 6-day generation time) the new strain completely took over.

Now, the test volume in Washington was about 4,000-5,000 tests per weekday over those two months, with about 200-450 positive tests per day, peaking in late March and then dropping slowly until early May. That suggests the growth rate was slightly negative during those two months, making it very hard indeed to seed a new outbreak of the new strain from only a couple of introductions.

Strain Replacement

Introduction of the new strain by New Yorkers or Europeans won’t, by itself, cause complete replacement of the old strain. If there’s no fitness differential, all it will do is seed new infection chains of the new strain. Suppose you have 500 cases of the old strain and then 4,500 infected New Yorkers come to Seattle. Most new cases will then be of the new strain, but not all of them; there’s no reason for the old infection chains to stop generating new cases.

One way for this to happen is if there is a differential fitness, so that the old strain has a negative growth rate while the new strain has a positive or less negative growth rate. That lets the new strain outcompete the old strain, until the population of the old strain is small enough that it can disappear by chance.

The other way is for all the existing cases to get swamped by new introductions under conditions of very low growth. Suppose the intrinsic growth rate in Washington was extremely low, say R_t = 0.5, so the number of cases drops by half every generation. But every generation, new cases get introduced from another location dominated by the new strain (e.g., New York). Say we start off with 400 cases in Washington. In the next generation, those cases only generate 200 new cases, but we also get another 150 cases imported from New York, for 350 total cases, 43% of them the new strain. Repeat for 10 generations and you’ll end up with essentially no cases of the old strain, but a fair number of the new. Note that you need a really low growth rate for this to happen in only 10 generations: R_t=0.8 only drops the old-strain population 10-fold.
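
Here’s that swamping scenario written out as a minimal sketch, using the same numbers as above (R_t = 0.5 for both strains, 150 imported new-strain cases per generation), just to show how quickly the import term comes to dominate.

# Swamping by imports under strongly negative growth: both strains decline at
# R_t = 0.5, but the new strain gets topped up by 150 imported cases per generation.
R_t = 0.5
imports_per_generation = 150.0

old, new = 400.0, 0.0
for generation in range(1, 11):
    old *= R_t
    new = new * R_t + imports_per_generation
    print(f"generation {generation:2d}: old = {old:6.1f}, new = {new:6.1f}, "
          f"fraction new = {new / (old + new):.0%}")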

Effective Local Response to SARS-CoV-2

I think it’s become fairly apparent that effective federal and even state responses to this pandemic are not going to happen in the short term. I had a discussion over at Greg’s blog where I pointed out that effective local responses are still perfectly possible in most places. So let’s take a brief look at what that might look like.

We’re going to look at this from the perspective of a small or medium size city that is:

  • Politically independent (has its own government and police force)
  • Reasonably geographically separated (not a suburb of a much larger metropolitan area)
  • Reasonably wealthy (such that the measures I’ll suggest are affordable without external support)
  • Not yet heavily infected

Let’s suppose it’s about 80,000 people, 20% children and 15% seniors. First, let’s figure out what the impact on the city would be without an effective response. We’ll conservatively guess that about 60% of the population would eventually be infected, with hospitalization rates for kids, adults, and seniors of 2%, 20%, and 35%, and case fatality rates of 0%, 0.5%, and 6%, respectively. By the end of the epidemic that would give us 192 hospitalized kids, 6,240 hospitalized adults, and 2,520 hospitalized seniors, plus 156 dead adults and 432 dead senior citizens. That’s not great. And depending on how big your local hospital is and how bad the cases are, the death rate could be higher if those infections happen too fast.
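
Here’s that back-of-envelope arithmetic spelled out, using exactly the guesses above (a sketch of the assumptions, not a forecast):

population = 80_000
group_fraction = {"kids": 0.20, "adults": 0.65, "seniors": 0.15}
attack_rate = 0.60  # fraction of the population eventually infected
hospitalization_rate = {"kids": 0.02, "adults": 0.20, "seniors": 0.35}
case_fatality_rate = {"kids": 0.00, "adults": 0.005, "seniors": 0.06}

for group, fraction in group_fraction.items():
    infected = population * fraction * attack_rate
    print(f"{group}: {infected * hospitalization_rate[group]:,.0f} hospitalized, "
          f"{infected * case_fatality_rate[group]:,.0f} dead")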

So. Suppose you run this city. What can you do about it without federal or state help?

Random Testing

Let’s assume you have access to at least some testing capacity. If you’re like most places, it’s under-used right now. Let’s say you’ve got 100 tests per day that aren’t needed for testing of healthcare workers or new patients. Use that capacity: Randomly choose households in your town, call them up and read this script:

Hello. My name is X, with city Y. We’re running a testing program to help our citizens know whether they’ve been infected with the new coronavirus. We’ve chosen your household to be a part of this program. If you’re interested, bring at most two members of your household to the local hospital tomorrow and give them this ID number: #. Thanks.

At 100 tests a day, you’re randomly testing 0.125% of the population per day. Just this level of random testing is enough to spot a new outbreak by the time it gets to a size of ~500 people or so. That’s small enough that it’s still containable.
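
A rough sanity check on that claim, as a sketch: assume infections are spread randomly through the population, assume the outbreak doubles about once a week (my assumption, purely for illustration), and ask how likely the testing program is to have caught at least one case by the time the outbreak reaches ~500 people.

# Probability that 100 random tests per day catch at least one case before an
# outbreak doubling weekly grows from 8 to ~500 infections (assumes a positive
# test whenever an infected person happens to be sampled).
population = 80_000
tests_per_day = 100

outbreak = 8.0
p_all_missed = 1.0
while outbreak <= 512:
    prevalence = outbreak / population
    p_all_missed *= (1 - prevalence) ** tests_per_day  # all of today's tests miss
    outbreak *= 2 ** (1 / 7)                           # weekly doubling
print(f"P(outbreak detected by ~500 cases) = {1 - p_all_missed:.0%}")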

Contact Tracing

Set up your own local contact tracing team. Once you’ve spotted an outbreak from random testing, a sick patient presenting, or a lead from another jurisdiction’s tracing team, you’ll want to figure out how much it’s spread and contain it as soon as possible.

Take your index patient and interview them: figure out how long they’ve been infectious. Where do they work, who do they live with, what shops have they been to. You’re trying to figure out who else could have been exposed to them while they were infectious. Once you’ve got potential links you call them up too and do the same thing: are they sick or showing any symptoms, were they at location X at time Y, and who have they been in contact with since then.

Recommend self-quarantine for all these folks and refer them for testing.

Basically all of this can be done with volunteers. You’ll need a script for each stage, volunteers with phones, and interpreters. In some cases you’ll want some boots on the ground as well. You could find underemployed public employees who want to help, or just ask around in the community; there are lots of people out of work who would be interested in helping. I’d look especially at librarians and public school administrators, but lots of people can be useful here.

Regulations to Help Tracing

You can do tracing with just phone work, but some additional regulation will make it easier. In particular, we’re looking for things that make it easier to see when two people were close to each other at the same time. Workplaces generally have timesheets that will let you see overlapping employees, but shops don’t keep track of customers. So ask them to: require that they keep a paper ledger that customers and employees sign, with the date and time they entered the shop. A book like this costs ~$10; just buy one for every shop in your town and pick them up/drop them off weekly.

Centralized Quarantine

Intra-household infections are hard to stop with self-quarantine. So offer your citizens an alternative to getting grandma infected. Find some hotels in the area. They’re likely to be nearly empty. See if they’ll donate rooms for quarantine, or maybe offer a tax rebate for them. Set them up for quarantine of infected or contacted citizens. Make sure there’s food delivery and internet service available.

Then when you do your contact tracing and you tell someone they were in long contact with a known infection, you say “If you’re worried about infecting your family members, we can put you up in the hotel for free.”

Travel Restrictions

All of the above measures are meant to keep the infection from spreading widely in your town, but you’ll also have to keep the number of new infections coming in from outside as low as possible. So impose some simple travel restrictions: restrict entry to your town to residents and essential workers. Put some police or volunteers at the main roads and screen entrants. You could make this easier by, for example, mailing every household a couple of windshield stickers; cars with the sticker can then get waved past.

How strict these restrictions need to be probably depends on how common the infection is in your local area. If your town has an airport or is near one, you’ll want to be a lot more careful, especially with long-distance travelers.

Other Public Health Measures

Standard stuff that everyone knows already should be kept up: have people wear masks outside, encourage social distancing, close schools, ban large gatherings. Try not to be a jerk about it; voluntary compliance is a lot better than forced compliance. That means, for example, having your cops hand out masks to people who don’t have them instead of roughing up those who aren’t wearing them.

Results

There’s no reason why sensible and mostly voluntary restrictions like these can’t let your town open its stores and factories and keep its economy going while also keeping the coronavirus from running rampant.

Likely selection for D614G S

There’s some foolishness going around about this recent preprint from Los Alamos.   They’ve spotted several mutations in collected SARS2 sequences from around the world that look as though they may be under positive selection.  We’re going to look into one in particular, a D614G mutation in the spike protein.  This mutation looks as though it’s increasing in relative prevalence in every region with decent sampling through March and early April.  


As for the foolishness, there are people responding to this by arguing that the pattern is more likely to be caused by bottlenecking or drift than by selection.  Here’s an epidemiology prof at Harvard saying this.  Here’s a group lead at EMBL in the UK.  Here’s a Cambridge virologist.  The argument is that the pattern is random:  Italy happened to get this particular strain and then spread it across Europe.


Let’s dive into some specific regions in more detail.  The Harvard guy claims that the Washington State data is consistent with a new seeding of the new strain from Europe or New York, followed by roughly equivalent suppression of the old and new outbreaks by various measures.  I don’t have access to the raw GISAID database at the moment, so we’ll use the visualization provided by nextstrain.org.  Pull it up, set the dataset to north-america, and color by the genotype at S614 to look at these two strains.

Then scroll down to the bottom and click on “Washington” in the “Filter by Admin Division” section.  That’ll show you only the (496 as of this writing) samples collected from Washington state.

 Then scroll up and look at the tree.  The color tells you which strain you’re looking at.  Blue is the old strain, yellow is the new.  The very first dot was from January 1st, with the original aspartic acid.  Then no sequenced cases for a month and a half.  From Feb 18- Mar 9, there are 148 new cases in Washington, all old strain, and all but two descended from the original January case.  So through early March we’re mostly seeing an epidemic growing from only a few initial introductions.

The first case with a G at this locus is March 10; there are two of them, and they’re fairly diverged at other loci, suggesting multiple introductions of this new strain into Washington State, probably from the East Coast or Europe.  Let’s look at weekly cases in Washington after those first introductions.  You can get these by downloading the data at the bottom of the page.  The new strain is clade A2a; the old strain is everything else.

Washington         Total cases   Old Strain   New Strain   Fraction New
March 10-16             77           55           22           29%
March 17-23             54           26           28           52%
March 24-30             79           33           46           58%
March 31-April 6       180           89           91           51%
April 7-13              16            0           16          100%
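
For anyone who wants to reproduce these counts, here’s roughly how I’d tally them from the metadata TSV that nextstrain lets you download. The file name and column names below ("date", "division", "clade_membership") are assumptions about the export layout; adjust them to whatever your download actually contains.

import pandas as pd

# Weekly old/new-strain counts for Washington from a nextstrain metadata export.
# Column names are assumed; the new strain is taken to be clade A2a.
meta = pd.read_csv("nextstrain_ncov_metadata.tsv", sep="\t", parse_dates=["date"])
wa = meta[meta["division"] == "Washington"].copy()
wa["new_strain"] = wa["clade_membership"] == "A2a"

weekly = wa.set_index("date").resample("W")["new_strain"]
counts = pd.DataFrame({"total": weekly.count(), "new": weekly.sum()})
counts["old"] = counts["total"] - counts["new"]
counts["fraction_new"] = counts["new"] / counts["total"]
print(counts)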

So this data agrees with the thesis presented in the article: from early March to early April, the fraction of Washington cases with the new strain rises from 0% to 100%.  Now, the 100% is probably an overestimate; in particular, those last samples were all gathered by the UW Virology lab, the importance of which I’ll get to in a minute.  They are, however, divergent, not all from a single cluster.  So it certainly looks as though the new strain is becoming dominant in Washington State.


Now, most of these samples come from either the UW Virology lab or the Washington State Department of Health.  The state-gathered samples also have county information, so let’s dig into that a little bit.  The two counties I’m going to focus on are King County and Yakima County.  King County contains Seattle, while Yakima County is much smaller and on the other side of the mountains.  The hope is that we can pick up separate trends due to the geographical separation.

First King County.  Remember that this doesn’t include the UW samples, so we get a much smaller count.

King County        Total cases   Old Strain   New Strain   Fraction New
March 10-16              8            8            0            0%
March 17-23              3            2            1           33%
March 24-30              7            3            4           57%
March 31-April 6        15            3           12           80%

Then Yakima County.  Here the state didn’t start testing until late March.

Yakima County      Total cases   Old Strain   New Strain   Fraction New
March 10-16              0            -            -            -
March 17-23              0            -            -            -
March 24-30             20           17            3           15%
March 31-April 6        63           50           13           21%

So it looks like we’re seeing what I expected to see: later introduction of the new strain into the Yakima County data, followed by growth there.


We can do a similar kind of look at the California samples, though there aren’t nearly as many to work with (mostly San Francisco, from UCSF, with some other scattered samples).  There we see the first of the new strain show up on March 7th in Sacramento County.  From then:

California         Total cases   Old Strain   New Strain   Fraction New
March 7-13               7            3            4           57%
March 14-20             10            3            7           70%
March 21-27             27           15           12           44%
March 28-April 4        42           24           18           43%
April 5-17              17            7           10           59%

Here it’s more ambiguous and we don’t have as much data, but a month after the new strain is first seen, it’s quite firmly established in California as well.

As far as the foolishness goes: it’s entirely possible to have a particular clade or strain take over in a single location due to chance: person X spreads it there, person X happens to have that strain, no other strain gets introduced for a while, so that strain becomes predominant. But in both Washington and California we see new introductions of D614G in early March, when there were already spreading epidemics, followed by growth rapid enough that a month later roughly half of all cases descend from the later introduction. I don’t yet have access to the full GISAID dataset, but the authors state that the same thing is happening in England, Germany, Japan, and Australia. The same thing happening in multiple places is not random chance; it’s something else. As for the functional speculations about higher viral load, they’re suggestive but not dispositive, and we ought to look at them. And as for the fools at various institutions, they should – but won’t – shut up when they don’t know what they’re talking about.

IHME projections are absurdly optimistic

The state-specific projections from IHME have been widely disseminated and used for federal, state, and hospital planning. For example, here’s an interview with one of the UW professors on the project describing how their projections are being used by the feds. They have done yeoman’s work in collecting appropriate data, but their actual model has some serious flaws.

The details of the model itself can’t be found on their projection webpage; you have to go to the medRxiv article. Their introduction gives a brief criticism of the SIR model that I’ve been using as a base for my estimates: SIR models generally don’t include the effect of changes in social behavior in response to the epidemic, they assume panmixia (random mixing among the population), and they generally fit to reported case numbers, which are distorted by variation in testing rates and testing strategies.

These criticisms are all pretty reasonable. It’s a good idea to look at the effect of controls (i.e., reducing R_0), which many simple models haven’t done. Panmixia isn’t accurate, and likely overestimates the speed of the epidemic, especially when the modeling is done on the scale of the whole country. And death rates are likely more accurate than case numbers, due to limited testing.

Let’s see what they’re suggesting as an alternative. Here’s how they’re doing the projections of total deaths in each state:

  • Gather state-specific data on imposed controls, hospital capacity and utilization, local age structure and death rates.
  • Estimate death rates for infection after correcting for local age structure.
  • Score controls on a 0-4 scale according to how many of the following are in place: school closures, non-essential business closures (incl. bars/restaurants), stay-at-home recommendations, and travel restrictions.
  • Model the impact of controls on maximum death rate and time to reach the inflection point using published data from Wuhan, scaled based on the coded controls for each state, with Wuhan = 4.
  • Fit the cumulative death rate (i.e., deaths per capita) to a Gaussian error function with three parameters: a location-specific inflection time where deaths stop accelerating, a location-specific growth rate, and a global (age-normalized) maximum death rate (see the sketch after this list). Either the inflection time or the maximum death rate was assumed to depend on the delay between initial deaths and the imposition of controls.
  • Assume that states without full controls impose full controls a week after the current date.
  • Project out the fits, incorporating variance in the parameter fits to generate a set of projections.
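
To make that functional form concrete, here’s a minimal sketch of the curve-fitting step as I read it. This is a reconstruction for illustration only, not IHME’s code: the parameter names are mine, and the data are synthetic, generated from the curve itself purely so the example runs.

import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def cumulative_death_rate(t, p_max, t_infl, rate):
    # Error-function curve: p_max is the eventual cumulative death rate per capita,
    # t_infl the inflection time, and rate the growth-rate parameter.
    return (p_max / 2.0) * (1.0 + erf(rate * (t - t_infl)))

# Synthetic "observations" from the curve plus noise, for illustration only;
# the real model fits observed cumulative deaths per capita for each location.
rng = np.random.default_rng(0)
days = np.arange(60.0)
observed = cumulative_death_rate(days, 3e-4, 35.0, 0.08) + rng.normal(scale=5e-6, size=days.size)

(p_max, t_infl, rate), _ = curve_fit(cumulative_death_rate, days, observed, p0=[1e-4, 30.0, 0.1])
print(p_max, t_infl, rate)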

There are some really serious problems with this model, and all of them push the results towards underestimating the problem in the US. Let’s go through the problems.

Unqualified acceptance of CCP data

The data for deaths in Wuhan come from the provincial Hubei government, straight from the Chinese Communist Party. There are very good reasons to be skeptical of these numbers: the CCP lies constantly, and it is very hard for outsiders to verify the figures. Moreover, there is technical data strongly suggesting that the CCP is undercounting. A late-February analysis looked at the activity of crematory facilities in Wuhan as well as numbers temporarily leaked on Tencent; it suggests a 5-10 fold higher rate of deaths than the CCP reported in late February. There have also been recent reports of very high numbers of funerary urns needed in Wuhan, in excess of what would be required by the officially reported deaths.

This is a huge problem for the model, because the model is heavily parameterized by the assumption that the Wuhan epidemic was ended quickly by the imposition of social distancing controls.

Undercounting of deaths elsewhere

There is good reason to believe that official COVID19 deaths are undercounted in most areas. In particular, total deaths in several hard-hit Italian towns went up 3-5 fold compared to the same period last year, with only a fraction of that excess officially counted as COVID19 deaths.

In this model, current data on deaths in other polities are used to fit total deaths in the future, so a significant undercounting of current deaths would undercount the future projections.

Assumption of control imposition

They assume that states that do not have infection controls impose full controls within a week. This is unlikely at best.

Bad model choices

They choose to model the cumulative death rate as a Gaussian error function based on, AFAICT, no theoretical reason, only that it fit “the data” better. Given that the only data in their set that has run to completion is Wuhan’s, and that the Wuhan dataset is compromised, this is an extremely foolish decision.

Further, this model assumes that the acceleration of deaths as the epidemic grows is the same as the projected deceleration of deaths after infection controls are imposed. There is absolutely no reason to expect this. As the acceleration is related to R_0, this is equivalent to saying that R_0 before controls is equal to 1/R_{ctrl}. Even if controls are effective at bringing the growth rate negative, there is no reason why the deceleration can’t be much slower than the initial growth rate (e.g., if R_0 before controls is ~3 and after controls is ~0.95).
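
To put rough numbers on that asymmetry, here’s a tiny sketch using the example values above (R_0 ~3 before controls, ~0.95 after); the ~5-day serial interval is an assumption, purely for illustration.

import math

SERIAL_INTERVAL_DAYS = 5.0  # assumed generation time, for illustration only

def doubling_or_halving_time(R):
    # Days for case counts to double (R > 1) or halve (R < 1), assuming growth
    # by a factor of R every serial interval.
    return SERIAL_INTERVAL_DAYS * math.log(2) / abs(math.log(R))

print(f"R = 3.00: cases double every {doubling_or_halving_time(3.00):.1f} days")
print(f"R = 0.95: cases halve every  {doubling_or_halving_time(0.95):.1f} days")

Even with controls holding R just below 1, the decline is more than an order of magnitude slower than the uncontrolled rise, so a symmetric rise-and-fall curve has no reason to hold.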

A better approach would be a modified SIR model using European, US, and non-CCP Asian data to estimate the impact of various infection control measures.

Overestimation of control effectiveness

They assume that state-wide infection controls are equivalent in effect with the controls the CCP imposed in Wuhan.

This is crazy. I was watching media reports and social media in January and February, keeping an eye on what was happening in Wuhan. I saw officials welding apartment doors shut, abducting individuals with fevers from their homes or off the street, mass-disinfecting whole cities, setting up walls outside villages to keep the infection out, and running internal checkpoints to identify people with fevers; people not wearing facemasks were harassed and assaulted.

I live in an area with fairly strong infection controls by US standards. We are nowhere close to that level of infection control. I went out for the first time in two weeks to the grocery store to stock up. There was one other person wearing a face mask, and they were wearing it wrong (not covering their nose). I repeatedly had to dodge back out of aisles to avoid people coming straight at me. This assumption is ridiculous and crazy, and obviously so to anyone paying attention. They’re in Washington state, they should know better.

Chinese reports suggest that the infection controls imposed in Wuhan dropped the R_0 from ~3.5 to ~0.35, or about 10-fold. Note that they also include a period of partial controls, which in their case means:

  • Blocking outward transportation from Wuhan
  • Closing public transit and vehicular traffic inside Wuhan
  • Compulsory mask-wearing in public places
  • Cancellation of public events
  • Self-quarantine of confirmed or suspected cases

They estimate that during this period, R_0 dropped to ~1.25. For comparison, the full controls included as well:

  • Full quarantine of confirmed or suspected cases (i.e., extraction to a separate quarantine site), including contacts of confirmed cases
  • Temperature monitoring of all residents
  • Universal and strict stay-at-home orders for all residents

My expectation is that the current set of controls imposed by most European and US State governments is closer to the partial Wuhan controls than the full controls, and I fully expect growth rates to remain positive, though slower.

Conclusion

There are at least 5 serious, obvious problems with the IHME model, most seriously uncritical acceptance of Communist government data and wild overestimation of the effectiveness of US infection control measures. Given that I know from personal communication that multiple hospitals are using this to project out expectations for medical care required – and taking as valid that peak care will be required for only a matter of weeks – these reports are hugely irresponsible.

Further, given that every problem I identified with their model biases it towards underestimating the scope of the problem, I suspect there are serious prejudices or biases in the people generating the model. Frankly, I think they’re projecting wishful thinking and looking for reasons to support optimism.

Updated Epidemiological Modeling

The previous model I built was a toy model: the simplest possible dynamics that would give you a feel for what’s going on, and parameter selection that was educated guesswork.

There was sufficient interest in it, though, that I decided it would be worth building a more sophisticated and hopefully accurate model.

State dynamics and parameterization

The previous model was a simple SIR model with only three states: individuals were either susceptible (vulnerable to infection but not yet infected), infected (and infectious to susceptible individuals), or recovered (or dead). This is very simple to calculate, but it has some obvious limitations. It doesn’t capture the progression of the disease within individuals or the delays that progression introduces; critical cases, for example, were simply calculated as a static fraction of all infected cases at any point, when in fact they should be time-delayed (it takes time for an infection to progress to a critical state). It’s also hard to use for examining the effects of specific controls; all I did was vary the effective R_0 after imposing controls. There’s no way to, e.g., specifically reduce the contact rate for symptomatic patients.

For the new model, I’ve spent some time reading case reports and aggregated studies. Key references are:

  • A clinical analysis of 1099 cases throughout China describing clinical course (Feb 28)
  • A case study of 9 clustered German cases with daily testing using both PCR and viral culturing (Mar 8)
  • A survey of hospital associations estimating total hospital bed and ICU bed capacity for the U.S. (Mar 17)
  • A survey/modeling study in two Chinese cities counting contact numbers, including between different age groups (Mar 20)
  • A CDC report of hospitalization rates, ICU admission rates, and mortality rates of confirmed American cases (Mar 26)
  • A CDC report of testing results from two cruise ships, including one where all crew and passengers were tested (Mar 26)
  • A detailed report of Chinese cases in Wuhan with estimation of transmission dynamics (Mar 26)

Based on my reading, it looks like the progression of the disease is as follows:

  • An incubation period of 4.1-7.0 days after infection, during which the person is not yet infectious to others.
  • A short pre-symptomatic period of ~2 days when the person is infectious but before onset of symptoms.
  • Some fraction of people are fully asymptomatic, and can spread the disease until it is cleared and they recover, about 12 days after the end of the incubation period.
  • The rest will develop noticeable upper respiratory symptoms, for a duration of 12 days before recovery or decline.
  • Some fraction of symptomatic cases will progress to needing support in a hospital bed (non-severe cases) or an ICU (severe cases). Hospitalization lasts a further 11-13 days, depending on severity.
  • Some fraction of hospitalized cases will die; the rest will recover and retain immunity to the disease.

Here is a diagram of the disease course:

Of particular note is that I have modeled hospitalized patients as no longer being infectious; this is due to the German study suggesting low viral shedding 10 days post symptom onset.

The basic R_0 value is estimated at 2.2 based on the Chinese Wuhan data. This can be converted into an estimated time between infection-causing contacts of ~5.9 days for infectious people.

The asymptomatic fraction is estimated at 50% based on data from the Diamond Princess cruise ship. All passengers and crew on the ship were tested and ~50% of the positive test results had shown no symptoms.

The outcomes of symptomatic cases are age-stratified based on the CDC report of American cases, including rates of hospitalization, ICU admission, and death. New infections are assumed to be randomly distributed among age brackets, proportional to each bracket’s share of the susceptible population.

The most speculative part of the model is the effects of health care saturation. While I have reasonable estimates of the available hospital and ICU beds, it’s necessarily speculative to estimate what happens to people with cases requiring hospitalization if that hospitalization is not available. I’ve made what I feel are likely conservative guesses. I’ve assumed triage of scarce beds to younger and less-severe patients.

Model with no controls

The first test is simply to run the model without attempting to impose infection controls and see what the result is. Here is a log-scale plot of total hospitalizations, ICU admissions, and deaths over the 2-year course of the epidemic.

The deaths top out at ~10 million total. Note that this is higher than my previous model, in part because of better estimation of available beds and in part because of longer estimated hospital stays. The death rate is strongly biased towards the elderly, in part due to intrinsically worse outcomes and in part due to triage of scarce resources.

Nearly 20% of 85+ Americans would be expected to die, and even 1 in 19 of 55-64 year olds, 1 in 47 of 45-54 year olds, and 1 in 164 20-44 year olds.

Without the impact of scarce hospital resources, the results look much better:

You can see that the red line no longer bends upward when the hospitalized cases hit the ICU or hospital bed capacity, reducing the total deaths almost 4-fold to 2.8 million. As before, the deaths are heavily skewed towards the elderly:

Unfortunately, avoiding running out of hospital capacity at the peak would require millions of available ICU beds simultaneously, which is rather unrealistic. We’ll look into the effects of imposed infection controls in my next post.

Appendix: code

from collections import namedtuple

import numpy as np
import matplotlib.pyplot as plt

PopState = namedtuple("PopState", [
    "susceptible",
    "incubating",
    "presymptomatic",
    "asymptomatic",
    "symptomatic",
    "nonsevere",
    "severe",
    "dead", 
    "recovered",
])

GeneralParams = namedtuple("GeneralParams", [
    "incubating_days",
    "presympt_days",
    "asympt_days",
    "sympt_days",
    "nonsev_days",
    "severe_days",
    "R0",
    "frac_asymptomatic",
])

AgeBracketParams = namedtuple("AgeBracketParams", [
    "frac_nonsevere",
    "frac_severe",
    "nonsevere_dead_home",
    "severe_dead_icu",
    "severe_dead_hosp",
    "severe_dead_home",
])

shared_params = GeneralParams(
    5.0, # Days incubating before infectious
    2.0, # Days infectious before symptomatic
    10.0, # Days infectious for asymptomatic cases
    12.5, # Days infectious for symptomatic cases
    11.0, # Days hospitalized for non-severe cases
    13.0, # Days hospitalized for severe cases
    2.2, # R0, base infectivity rate
    0.5, # Estimated fraction asymptomatic cases
)

# These are high-end estimates from the CDC publication
bracket_params = {
    "age_0_19": AgeBracketParams(
        0.025, 0.0,
        0.10, 
        0.0, 0.0, 0.0,
    ),
    "age_20_44": AgeBracketParams(
        0.166, 0.042,
        0.15, 
        0.05, 0.10, 0.15,
    ),
    "age_45_54": AgeBracketParams(
        0.179, 0.104,
        0.18, 
        0.08, 0.16, 0.24,
    ),
    "age_55_64": AgeBracketParams(
        0.189, 0.112,
        0.20, 
        0.23, 0.45, 0.85,
    ),
    "age_65_74": AgeBracketParams(
        0.247, 0.188,
        0.25, 
        0.26, 0.55, 0.95,
    ),
    "age_75_84": AgeBracketParams(
        0.277, 0.310,
        0.31, 
        0.34, 0.67, 1.0,
    ),
    "age_85": AgeBracketParams(
        0.413, 0.290,
        0.35, 
        0.94, 1.0, 1.0,
    ),
}

bracket_states = {
    "age_0_19":  PopState(81.98e6, 0, 0, 0, 0, 0, 0, 0, 0),
    "age_20_44": PopState(108.84e6, 10, 0, 0, 0, 0, 0, 0, 0),
    "age_45_54": PopState(41.63e6, 0, 0, 0, 0, 0, 0, 0, 0),
    "age_55_64": PopState(42.27e6, 0, 0, 0, 0, 0, 0, 0, 0),
    "age_65_74": PopState(30.48e6, 0, 0, 0, 0, 0, 0, 0, 0),
    "age_75_84": PopState(15.39e6, 0, 0, 0, 0, 0, 0, 0, 0),
    "age_85":    PopState(6.55e6, 0, 0, 0, 0, 0, 0, 0, 0),
}
brackets = [
    "age_0_19",
    "age_20_44",
    "age_45_54",
    "age_55_64",
    "age_65_74",
    "age_75_84",
    "age_85",
]

ICU_BEDS = 31482
HOSPITAL_BEDS = 501470

def update_state(states, general, params, day=1, start_day=70, control_rate=2.2, icu_beds=ICU_BEDS, hospital_beds=HOSPITAL_BEDS):
    def total_state(state):
        return sum(states[age].__getattribute__(state) for age in brackets)
    
    total_S = total_state("susceptible")
    total_N = sum(sum(states[age]) for age in brackets)
    
    # Generate the new states as dicts so they're mutable
    new_states = {age: states[age]._asdict() for age in brackets}
    
    # Adjust the controlled rate
    R0 = control_rate if day >= start_day else general.R0
    
    # Adjust for constant time between contacts
    mean_infectious_days = general.presympt_days
    mean_infectious_days += general.asympt_days * general.frac_asymptomatic
    mean_infectious_days += general.sympt_days * (1 - general.frac_asymptomatic)
    days_betw_contacts = mean_infectious_days / R0
    
    # susceptible --> incubating
    total_preI = total_state("presymptomatic")
    new_infections_from_presympt = (total_preI * total_S) / (total_N * days_betw_contacts)
    total_asymI = total_state("asymptomatic")
    new_infections_from_asympt = (total_asymI * total_S) / (total_N * days_betw_contacts)
    total_symI = total_state("symptomatic")
    new_infections_from_sympt = (total_symI * total_S) / (total_N * days_betw_contacts)
    
    new_infections = new_infections_from_presympt + new_infections_from_asympt + new_infections_from_sympt
    for age in brackets:
        # each age gets new infections proportionate to its fraction of the susceptible population
        # (i.e., panmixia across all age groups)
        transfer = (states[age].susceptible / total_S) * new_infections
        assert transfer <= states[age].susceptible
        assert transfer >= 0
        new_states[age]["susceptible"] -= transfer
        new_states[age]["incubating"] += transfer

    # incubating --> presymptomatic
    for age in brackets:
        transfer = (states[age].incubating / general.incubating_days)
        assert transfer <= states[age].incubating
        assert transfer >= 0
        new_states[age]["incubating"] -= transfer
        new_states[age]["presymptomatic"] += transfer
    
    # presymptomatic --> asymptomatic / symptomatic
    for age in brackets:
        transfer = (states[age].presymptomatic / general.presympt_days)
        assert transfer <= states[age].presymptomatic
        assert transfer >= 0
        new_states[age]["presymptomatic"] -= transfer
        new_states[age]["asymptomatic"] += transfer * general.frac_asymptomatic
        new_states[age]["symptomatic"] += transfer * (1 - general.frac_asymptomatic)
    
    # asymptomatic --> recovered
    for age in brackets:
        transfer = (states[age].asymptomatic / general.asympt_days)
        assert transfer <= states[age].asymptomatic
        assert transfer >= 0
        new_states[age]["asymptomatic"] -= transfer
        new_states[age]["recovered"] += transfer
    
    # symptomatic --> recovered / nonsevere / severe
    for age in brackets:
        transfer = (states[age].symptomatic / general.sympt_days)
        assert transfer <= states[age].symptomatic
        assert transfer >= 0
        new_states[age]["symptomatic"] -= transfer
        new_states[age]["severe"] += transfer * params[age].frac_severe
        new_states[age]["nonsevere"] += transfer * params[age].frac_nonsevere
        new_states[age]["recovered"] += transfer * (1 - (params[age].frac_severe + params[age].frac_nonsevere))
    
    # nonsevere --> recovered / dead
    # and 
    # severe --> recovered / dead
    free_hospital_beds = hospital_beds
    free_icu_beds = icu_beds
    for age in brackets:
        # nonsevere patients get first pick of hospital beds for triage
        transfer = (states[age].nonsevere / general.nonsev_days)
        assert transfer <= states[age].nonsevere
        assert transfer >= 0
        if transfer > 0:
            new_states[age]["nonsevere"] -= transfer
            # youngest patients get first pick of hospital beds for triage
            nonsevere = states[age].nonsevere
            in_bed = min(nonsevere, free_hospital_beds)
            at_home = nonsevere - in_bed
            free_hospital_beds -= in_bed
            dying = (at_home / nonsevere) * transfer * params[age].nonsevere_dead_home
            assert dying <= transfer
            new_states[age]["dead"] += dying
            new_states[age]["recovered"] += transfer - dying
        
        # severe patients get what's left over
        transfer = (states[age].severe / general.severe_days)
        assert transfer <= states[age].severe
        assert transfer >= 0
        if transfer > 0:
            new_states[age]["severe"] -= transfer
            # bed shortage
            severe = states[age].severe
            in_icu = min(severe, free_icu_beds)
            free_icu_beds -= in_icu
            in_bed = min(severe - in_icu, free_hospital_beds)
            free_hospital_beds -= in_bed
            at_home = severe - (in_icu + in_bed)
            dying = transfer * (
                (at_home / severe) * params[age].severe_dead_home + 
                (in_bed / severe) * params[age].severe_dead_hosp + 
                (in_icu / severe) * params[age].severe_dead_icu
            )
            assert dying <= transfer
            new_states[age]["dead"] += dying
            new_states[age]["recovered"] += transfer - dying
    
    # Now cast everything back to states
    final_states = {
        age: PopState(
            new_states[age]["susceptible"],
            new_states[age]["incubating"],
            new_states[age]["presymptomatic"],
            new_states[age]["asymptomatic"],
            new_states[age]["symptomatic"],
            new_states[age]["nonsevere"],
            new_states[age]["severe"],
            new_states[age]["dead"],
            new_states[age]["recovered"],
        )
        for age in brackets
    }
    return final_states, day + 1

def run_model(states, general, params):
    def total_state(state):
        return sum(states[age].__getattribute__(state) for age in brackets)
    
    def total_infections():
        return sum([
            total_state("incubating"), 
            total_state("presymptomatic"), 
            total_state("asymptomatic"), 
            total_state("symptomatic"),
            total_state("nonsevere"),
            total_state("severe"),
        ])

    infections = [total_infections()]
    nonsevere = [total_state("nonsevere")]
    severe = [total_state("severe")]
    deaths = [total_state("dead")]
    
    day = 1
    
    while total_infections() >= 0.25:
        states, day = update_state(states, general, params, day=day, start_day=230, control_rate=2.2,
                                  icu_beds=ICU_BEDS, hospital_beds=HOSPITAL_BEDS
                                  )
        infections.append(total_infections())
        nonsevere.append(total_state("nonsevere"))
        severe.append(total_state("severe"))
        deaths.append(total_state("dead"))
    return np.array(infections), np.array(nonsevere), np.array(severe), np.array(deaths), states

I, H, C, D, states = run_model(bracket_states, shared_params, bracket_params)

plt.figure(figsize=(12, 8))
plt.grid()

plt.plot(H, label="All Hospitalizations", color='k')
plt.plot(C, label="ICU admissions", color='orange')
plt.plot(D, label="Deaths", color='red')
plt.legend()
plt.yscale('log')
plt.ylim(ymin=1)
plt.xlim((0, 600))

plt.hlines(ICU_BEDS, 0, 600, linestyle='dashed', color='g')
plt.hlines(HOSPITAL_BEDS + ICU_BEDS, 0, 600, linestyle='dashed', color='b')
plt.ylabel("# of People")
plt.xlabel("Days After Initial Infection")

death_rates = [100 * states[age].dead / (states[age].dead + states[age].recovered + states[age].susceptible) for age in brackets]

plt.figure(figsize=(8, 6))
plt.bar(range(len(brackets)), death_rates)
plt.grid()
labels = [
    "0-19",
    "20-44",
    "45-54",
    "55-64",
    "65-74",
    "74-85",
    "85+",
]
plt.xticks(range(len(brackets)), labels, rotation=45)
plt.xlabel("Age Bracket")
plt.ylabel("% of Bracket Dying of COVID-19")

Epidemiological modeling – costs of controls

One point that’s come up a couple of times is the expense of imposing serious infection controls. China probably lost hundreds of billions of dollars controlling the outbreak in Wuhan. Let’s try to estimate how much it would cost us, and use that to figure out what the best path forward is.

For starters, let’s take some reference points. Assume that, completely uncontrolled, the R0 for the virus is 2.0. This probably already incorporates small, relatively costless personal changes like better handwashing, sneeze and cough etiquette, etc. Evidence from China suggests that their very strong controls brought the R0 down to about 0.35. We’ll assume that that level of control mostly shuts down the economy, reducing economic activity by 90%. U.S. 2019 GDP was about $21.44 trillion, so we can estimate the daily cost of these extremely stringent controls in the U.S. at roughly $50 billion.

Next, we’ll guess that controls get more expensive the further you push R0 down – the daily cost isn’t linear in R0. Instead, we’ll figure that every 1% reduction in R0 costs the same amount. Dropping R0 from 2.0 to 0.35, which we estimated costs about $50 billion a day, takes 173.42 successive 1% reductions, so each 1% reduction costs about $288 million per day. There’s one other thing to keep in mind for cost accounting: it’s much, much cheaper to impose controls when the total number of infections is low. In that regime you can do contact tracing and targeted quarantine rather than wholesale lockdowns. So if the total number of infections in the population is below 1,000, we’ll divide that day’s cost of controls by 500.
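Here’s a minimal sketch of that cost rule as code. The function and constant names are mine, not from the analysis code, so treat this as an illustration of the assumptions rather than the original implementation.

import numpy as np

UNCONTROLLED_R0 = 2.0
FULL_LOCKDOWN_R0 = 0.35
FULL_LOCKDOWN_DAILY_COST = 50e9        # ~90% of daily GDP: $21.44T / 365 * 0.9
CONTACT_TRACING_THRESHOLD = 1000       # below this many infections, tracing suffices
CONTACT_TRACING_DISCOUNT = 500

# Going from R0 = 2.0 to 0.35 takes ~173.42 compounding 1% reductions,
# so each 1% step costs roughly $288 million per day.
STEPS_TO_LOCKDOWN = np.log(FULL_LOCKDOWN_R0 / UNCONTROLLED_R0) / np.log(0.99)
COST_PER_STEP = FULL_LOCKDOWN_DAILY_COST / STEPS_TO_LOCKDOWN

def daily_control_cost(target_r0, total_infections):
    """Daily cost of holding R0 at target_r0, given the current infection count."""
    if target_r0 >= UNCONTROLLED_R0:
        return 0.0
    steps = np.log(target_r0 / UNCONTROLLED_R0) / np.log(0.99)  # number of 1% reductions
    cost = steps * COST_PER_STEP
    if total_infections < CONTACT_TRACING_THRESHOLD:
        cost /= CONTACT_TRACING_DISCOUNT
    return cost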

Finally, we need to balance the costs of infection controls against the benefits of reduced infection and mortality. All cases, including mild, non-critical ones, cost personal lost income: about $200 a day (median income of ~$70K, divided by 365). Critical cases in the ICU cost about $5,000 a day; critical cases in a general hospital bed, about $1,200 a day; critical cases at home add no further marginal cost. Each death costs $5 million.
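The disease side of the ledger is just as simple. Again, this helper is my own sketch of the accounting described above, with illustrative names:

LOST_INCOME_PER_CASE_DAY = 200      # ~$70K median income / 365
ICU_COST_PER_DAY = 5000
HOSPITAL_BED_COST_PER_DAY = 1200
COST_PER_DEATH = 5e6

def daily_disease_cost(active_cases, critical_icu, critical_hospital, new_deaths):
    """One day of disease costs: lost income for every active case, care costs for
    critical cases by setting (home care adds nothing beyond lost income), plus
    $5 million for each new death."""
    return (
        active_cases * LOST_INCOME_PER_CASE_DAY
        + critical_icu * ICU_COST_PER_DAY
        + critical_hospital * HOSPITAL_BED_COST_PER_DAY
        + new_deaths * COST_PER_DEATH
    )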

Now we can make a back-of-the-envelope estimate of the costs and benefits of different courses of infection control.

We’ll impose infection controls on day 130, when there are 30 dead and 30,500 infections out there. Then we’ll sweep through different strengths of controls and see what the total cost looks like over the whole course of the epidemic.
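The sweep itself just reruns the epidemic at different control strengths and totals the two cost streams. The sketch below is my reconstruction, not the original analysis code: it reuses update_state, init_state, params, ICU_BEDS, and HOSPITAL_BEDS from the appendix of the modeling post further down (overriding the uncontrolled R0 to this post’s value of 2.0), plus the two cost helpers sketched above.

def total_cost(controlled_r0, start_day=130):
    # This post assumes an uncontrolled R0 of 2.0 and controls starting on day 130
    p = params._replace(infection_rate=2.0, start_day=start_day,
                        controlled_infection_rate=controlled_r0)
    state = init_state
    control_cost = disease_cost = 0.0
    while state.infected >= 1:
        prev_dead = state.dead
        # Control costs only accrue once controls are switched on
        if state.day >= start_day:
            control_cost += daily_control_cost(controlled_r0, state.infected)
        state = update_state(state, p)
        # Split critical cases across ICU, general beds, and home, as the model does
        C = state.infected * p.frac_critical
        C_icu = min(C, ICU_BEDS)
        C_hosp = max(min(C - ICU_BEDS, HOSPITAL_BEDS), 0.0)
        disease_cost += daily_disease_cost(state.infected, C_icu, C_hosp,
                                           state.dead - prev_dead)
    return control_cost, disease_cost

# Sweep over control strengths, e.g.:
# for r0 in np.arange(0.35, 2.01, 0.05):
#     c, d = total_cost(r0)
#     print(f"R0={r0:.2f}: controls ${c/1e9:.0f}B, disease ${d/1e9:.0f}B, total ${(c + d)/1e9:.0f}B")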

The red line is the cost of the disease itself, largely due to deaths. The green line is the cost of maintaining infection controls throughout the epidemic. The dashed line is the total cost, all in billions of dollars. Note that with no controls, our estimate of the total cost to the U.S. is $14.8 trillion, or about 2/3 of total U.S. GDP.

There are two local minima on the total cost curve. The first is right around R0=1.50. This makes the spread slow enough that hospital capacity is never overloaded, greatly reducing the number of deaths and thus the cost of the disease. It brings the total cost down to $7.45 trillion, just about half the uncontrolled cost. Note that even with this modest reduction in R0, the majority of the cost comes from the controls rather than directly from the disease.

The second local minimum is at the far left, right around R0=0.35, or about what the Chinese managed in Wuhan. This suppresses the spread so quickly that we soon drop back into the much cheaper contact-tracing regime, where wholesale lockdowns are no longer necessary. That brings the total cost down to $3.11 trillion, cutting it roughly in half again relative to the moderate-control case.

The cost savings from imposing very serious controls as soon as possible come from two sources. First, a lower disease burden: fewer deaths, hospitalizations, and days of lost productivity. Second, very serious controls are cheaper overall because they need to stay in place for a much shorter time to bring the disease under control.

Epidemiological modeling

A basic SIR model for the epidemic.

Total population of 327 million, with a single initiating infection. A 12-day course of disease and an initial R0 of 3.5 (as per the stats from China that Steve linked). Assume 5% of cases are critical; of those critical cases, 2% die with ICU care, 5% with general hospital care, and 50% with no care. There are 90,000 available ICU beds and 900,000 available hospital beds.

Run this model through to completion and it sweeps through the population in about a year, infecting nearly everyone (all but about 9 million who escape) and killing 5.4 million.
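Those headline numbers fall straight out of the final state returned by the appendix code below; for example (my snippet, using the appendix’s names):

# With controlled_infection_rate left at 3.5 the controls are a no-op,
# so this is the uncontrolled run described above.
infections_over_time, deaths_over_time, final_state = run_model(init_state, params)
print(f"Never infected: {final_state.susceptible / 1e6:.1f} million")
print(f"Total deaths:   {final_state.dead / 1e6:.1f} million")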

Now, suppose we impose infection controls on day 80, right about when there are 1,000 deaths from this thing. Then we vary how strong those controls are: from an R0 of 0.35 (what the Chinese managed) up to no controls at all.

Here we see how the number of deaths varies with the strength of our controls. If we impose Chinese-style controls, we get away with only 5K deaths, or 1,000-fold fewer than without controls. But the inflection point isn’t just at 1.0: getting R0 below about 1.5 brings deaths down under 500K, a 10-fold reduction, and at 1.0 we get down to about 50K.

If we take the slope of this plot, we can see the marginal impact of better controls at each point, in terms of the number of deaths saved:
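A quick way to generate both curves from the appendix code below (the sweep is my reconstruction, not the post’s original code):

# Final death toll as a function of control strength
r0_values = np.arange(0.35, 3.55, 0.05)
final_deaths = np.array([
    run_model(init_state, params._replace(controlled_infection_rate=r0))[2].dead
    for r0 in r0_values
])
# Slope: deaths saved per unit of further R0 reduction
marginal_deaths_saved = np.gradient(final_deaths, r0_values)
plt.plot(r0_values, marginal_deaths_saved)
plt.xlabel("Controlled R0")
plt.ylabel("Marginal deaths saved per unit of R0 reduction")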

Why the discontinuity just below an R0 of 1.5? Well, here’s the number of critical cases over time for several different scenarios. The vertical line marks when we impose controls; the horizontal lines are ICU bed capacity and total hospital bed capacity. Just below an R0 of 1.5, the number of critical cases dips low enough that we never exceed hospital capacity. The catch is that until you get R0 below 1.0 you’re still sweeping through most of the population, just more slowly – slowly enough that the hospitals can handle it. And that means, for example, that if you get R0 to exactly 1.0 you have to keep those infection controls in place for ~5 years, or else you’ll be back up close to 5 million dead again.

Appendix: code to generate this model.

from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
from collections import namedtuple

PopState = namedtuple("PopState", [
    "day", 
    "susceptible", 
    "infected", 
    "dead", 
    "recovered",
])
EpiParams = namedtuple("EpiParams", [
    "days_infected", 
    "infection_rate", 
    "frac_critical", 
    "frac_dead_icu",
    "frac_dead_hospital",
    "frac_dead_home",
    "start_day",
    "controlled_infection_rate",
])

ICU_BEDS = 90000
HOSPITAL_BEDS = 900000

def update_state(state, params):
    day = state.day
    S = state.susceptible
    I = state.infected
    D = state.dead
    R = state.recovered
    N = S + I + D + R
    
    days_betw_contacts = params.days_infected / params.infection_rate
    if day >= params.start_day:
        days_betw_contacts = params.days_infected / params.controlled_infection_rate
    
    # Standard SIR transmission: each infected person infects another every
    # days_betw_contacts days, scaled by the susceptible fraction S/N
    new_infections = (I * S) / (N * days_betw_contacts)
    # Non-critical cases recover at a rate of 1/days_infected
    recovered_infections = (I / params.days_infected) * (1 - params.frac_critical)

    # For critical cases, we split into ICU, hospital, and home care, in descending order
    C = I * params.frac_critical
    C_icu = min(C, ICU_BEDS)
    C_hospital = 0 if C < ICU_BEDS else min(C - ICU_BEDS, HOSPITAL_BEDS)
    C_home = 0 if C < (ICU_BEDS + HOSPITAL_BEDS) else C - (ICU_BEDS + HOSPITAL_BEDS)

    recovered_critical = (
        ((C_icu / params.days_infected) * (1 - params.frac_dead_icu)) + 
        ((C_hospital / params.days_infected) * (1 - params.frac_dead_hospital)) + 
        ((C_home / params.days_infected) * (1 - params.frac_dead_home))
    )
    
    dead_critical = (
        ((C_icu / params.days_infected) * (params.frac_dead_icu)) + 
        ((C_hospital / params.days_infected) * (params.frac_dead_hospital)) + 
        ((C_home / params.days_infected) * (params.frac_dead_home))
    )
    
    newS = S - new_infections
    newI = I + new_infections - recovered_infections - recovered_critical - dead_critical
    newR = R + recovered_infections + recovered_critical
    newD = D + dead_critical
    
    return PopState(day + 1, newS, newI, newD, newR)

def run_model(state, params):
    infections = [state.infected]
    deaths = [state.dead]
    while state.infected >= 1:
        state = update_state(state, params)
        infections.append(state.infected)
        deaths.append(state.dead)
    return infections, deaths, state
	
total_population = 327 * 1e6
init_state = PopState(
    day=1,
    susceptible=total_population - 1, 
    infected=1, 
    dead=0, 
    recovered=0)
params = EpiParams(
    days_infected=12, 
    infection_rate=3.5, 
    frac_critical=0.05, 
    frac_dead_icu=0.02,
    frac_dead_hospital=0.05,
    frac_dead_home=0.5,
    start_day=79,
    # Modify this parameter to vary the strength of infection controls
    controlled_infection_rate=3.5,
)

infections_over_time, deaths_over_time, final_state = run_model(init_state, params)