Selection pressure on prevalence of a dominant mutation

Suppose a dominant mutation is present in the population with some prevalence p: a fraction p of the alleles at that locus are the mutation, and the rest are wild-type. We’d like to know how that prevalence changes over time. Some of the change will be random, but we can average over that randomness and ask what the expected change is at any particular point. First we need some assumptions:

  • Generations will be discrete, non-overlapping events. We’ll start with some generation, apply a selection model, and use that to produce the next generation, ad infinitum.
  • Individuals will mate randomly to produce the next generation. This means there won’t be any population substructure, and we can calculate the fractions of homozygotes and heterozygotes (the Hardy-Weinberg proportions) using only the mutation prevalence p.
  • The mutation confers a selective advantage s that is equal in both homozygote and heterozygote carriers.

Given this, we can work out a table comparing different parts of the population, where q = 1 - p is the probability of the wild-type allele:

| | Wild-type | Heterozygote | Homozygote |
|---|---|---|---|
| Fraction of current generation | q^2 | 2pq | p^2 |
| Relative fitness | 1 | 1 + s | 1 + s |

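Relative fitness here means that, on average, carriers of the allele have 1 + s children for every 1 child of a wild-type parent. That lets us calculate what fraction of the parents of the next generation has each genotype, adjusting for the carriers’ increased reproduction. The normalizing denominator is the total of the relative proportions, q^2 + (2pq + p^2)(1 + s) = 1 + s(2pq + p^2) = 1 + sp + spq, where the last step uses 2q + p = 1 + q.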

| | Wild-type | Heterozygote | Homozygote |
|---|---|---|---|
| Relative proportion of parents | q^2 | 2pq(1 + s) | p^2(1 + s) |
| Normalized fraction of parents | \frac{q^2}{1 + sp + spq} | \frac{2pq(1+s)}{1 + sp + spq} | \frac{p^2(1+s)}{1 + sp + spq} |
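As a quick numerical check of the two tables, here is a minimal Python sketch. The function name `parental_fractions` and the example values p = 0.1, s = 0.05 are illustrative choices, not part of the derivation; the code only weights each genotype's Hardy-Weinberg fraction by its relative fitness and renormalizes.

```python
def parental_fractions(p: float, s: float) -> dict:
    """Normalized genotype fractions among the parents of the next generation.

    Assumes random mating (Hardy-Weinberg proportions in the current
    generation), fitness 1 for wild-type and 1 + s for both carrier genotypes.
    """
    q = 1.0 - p
    # Fraction of each genotype in the current generation.
    current = {"wild-type": q * q, "heterozygote": 2 * p * q, "homozygote": p * p}
    # Relative fitness: the mutation is dominant, so both carriers get 1 + s.
    fitness = {"wild-type": 1.0, "heterozygote": 1.0 + s, "homozygote": 1.0 + s}
    weighted = {g: current[g] * fitness[g] for g in current}
    # The normalizing total works out to 1 + s*p + s*p*q.
    total = sum(weighted.values())
    return {g: w / total for g, w in weighted.items()}


for genotype, share in parental_fractions(p=0.1, s=0.05).items():
    print(f"{genotype:>12}: {share:.4f}")
```

The shares always sum to 1, and with s = 0 they fall back to the Hardy-Weinberg fractions of the current generation.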

Now that we know what fraction of the parental population has each genotype, we can calculate the expected prevalence of the allele after one generation. Since each child inherits a randomly chosen allele from each parent, the expected prevalence in the next generation equals the allele prevalence in this parental population:

p' = \frac{pq(1+s) + p^2(1+s)}{1 + sp + spq}.

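That is, we add half the fraction of heterozygous parents (who carry one copy of the allele) to the full fraction of homozygous parents (who carry two). Factoring out p(1 + s) and using p + q = 1, this simplifies to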

p' = \frac{p (1 + s)}{1 + sp + spq}.
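The simplification can be checked numerically. This sketch (the function names are hypothetical) computes p' both from the parental fractions, as in the unsimplified sum, and from the closed form; the two should agree up to floating-point rounding.

```python
def next_prevalence(p: float, s: float) -> float:
    """Closed form: p' = p(1 + s) / (1 + s p + s p q)."""
    q = 1.0 - p
    return p * (1.0 + s) / (1.0 + s * p + s * p * q)


def next_prevalence_from_parents(p: float, s: float) -> float:
    """Unsimplified form: half the heterozygous parents plus all homozygous parents."""
    q = 1.0 - p
    mean_fitness = 1.0 + s * p + s * p * q
    heterozygous = 2 * p * q * (1.0 + s) / mean_fitness
    homozygous = p * p * (1.0 + s) / mean_fitness
    return 0.5 * heterozygous + homozygous


print(next_prevalence(0.2, 0.1), next_prevalence_from_parents(0.2, 0.1))
```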

Note that when s = 0, the expression for p' reduces to p' = p, so there is no expected change in allele prevalence without selective pressure, as we would expect. We can then calculate the change in prevalence as \Delta p = p' - p, which simplifies to

\Delta p = \frac{spq^2}{1 + sp + spq}.

This change is zero if and only if spq^2 is zero, that is, if any of s, p, or q is zero. So the allele frequency is expected to change unless there is no selective pressure or the allele is already lost (p = 0) or fixed (p = 1); for s > 0 and 0 < p < 1 the change is always an increase.
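A minimal sketch of \Delta p, assuming the same model (the helper names are hypothetical). It checks the closed form against p' - p and shows where the change vanishes.

```python
def next_prevalence(p: float, s: float) -> float:
    """Expected prevalence after one generation, p' = p(1 + s) / (1 + s p + s p q)."""
    q = 1.0 - p
    return p * (1.0 + s) / (1.0 + s * p + s * p * q)


def delta_p(p: float, s: float) -> float:
    """Expected change per generation, s p q^2 / (1 + s p + s p q)."""
    q = 1.0 - p
    return s * p * q * q / (1.0 + s * p + s * p * q)


# The closed form should match p' - p, vanish at p = 0, p = 1 and s = 0,
# and be positive in between when s > 0.
for p in (0.0, 0.1, 1 / 3, 0.5, 0.9, 1.0):
    print(f"p = {p:.3f}: delta_p = {delta_p(p, 0.1):.6f}, "
          f"p' - p = {next_prevalence(p, 0.1) - p:.6f}")
```

For small s the denominator is close to 1, so \Delta p is approximately s p q^2, which peaks at p = 1/3: selection moves a dominant allele fastest at intermediate prevalence and slows sharply as the wild-type allele becomes rare.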

We can get a better handle on how p evolves by treating \Delta p as a derivative dp/dt, which approximates the discrete generations by a continuous time t, and solving the resulting differential equation. We can rewrite it as

\frac{dp}{dt} \frac{-p^2 + 2p + \frac{1}{s}}{p^3 - 2p^2 + p} = 1.
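The rearrangement follows from setting dp/dt = \Delta p, dividing both sides by \Delta p, and substituting q = 1 - p, which turns 1/\Delta p into a rational function of p alone:

```latex
\begin{aligned}
\frac{1}{\Delta p}
  &= \frac{1 + sp + spq}{spq^2}
   = \frac{1 + 2sp - sp^2}{s\,p\,(1-p)^2} \\
  &= \frac{-p^2 + 2p + \frac{1}{s}}{p^3 - 2p^2 + p},
\qquad \text{so} \qquad
\frac{dp}{dt} \cdot \frac{1}{\Delta p} = 1 .
\end{aligned}
```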

Then we integrate both sides with respect to t:

\int \frac{-p^2 + 2p + \frac{1}{s}}{p^3 - 2p^2 + s} \frac{dp}{dt} dt = \int 1 dt.

Since \frac{dp}{dt} dt = dp and the denominator factors as p^3 - 2p^2 + p = p(1-p)^2, we can simplify the left side by partial fraction decomposition and integrate the right side:

\int \left [\frac{1}{sp} + \frac{1}{1-p} + \frac{1}{s(1-p)} + \frac{1}{(1-p)^2} + \frac{1}{s(1-p)^2} \right] dp = t + c.
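The coefficients in this decomposition can be found with the usual cover-up approach: write the integrand over p(1-p)^2 with unknown numerators A, B, C, then evaluate at the roots and compare the p^2 coefficients:

```latex
\begin{aligned}
\frac{-p^2 + 2p + \frac{1}{s}}{p(1-p)^2}
  &= \frac{A}{p} + \frac{B}{1-p} + \frac{C}{(1-p)^2}, \\
-p^2 + 2p + \tfrac{1}{s}
  &= A(1-p)^2 + B\,p(1-p) + C\,p, \\
p = 0 &: \quad A = \tfrac{1}{s}, \\
p = 1 &: \quad C = 1 + \tfrac{1}{s}, \\
\text{coefficient of } p^2 &: \quad -1 = A - B \;\Rightarrow\; B = 1 + \tfrac{1}{s}.
\end{aligned}
```

Splitting B/(1-p) and C/(1-p)^2 into their 1 and 1/s parts gives the five terms of the integrand.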

Then we simply integrate each piece of the left integral separately and combine the constants of integration.

\frac{1 + s}{sq} + \frac{1}{s}\ln\frac{p}{q} - \ln q = t + c.
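One way to double-check the integration is to differentiate the result numerically and compare it with the integrand, 1/\Delta p. The sketch below uses hypothetical helper names and an arbitrary step size h = 1e-6 for a central difference; agreement to many digits suggests the antiderivative is right.

```python
import math


def lhs(p: float, s: float) -> float:
    """Integrated left-hand side, without the constant of integration."""
    q = 1.0 - p
    return (1.0 + s) / (s * q) + math.log(p / q) / s - math.log(q)


def reciprocal_rate(p: float, s: float) -> float:
    """1 / delta_p = (1 + s p + s p q) / (s p q^2), the integrand in dp."""
    q = 1.0 - p
    return (1.0 + s * p + s * p * q) / (s * p * q * q)


def central_difference(f, x: float, h: float = 1e-6) -> float:
    """Numerical derivative of f at x by a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


s = 0.05
for p in (0.05, 0.3, 0.6, 0.9):
    numeric = central_difference(lambda x: lhs(x, s), p)
    print(f"p = {p}: d(lhs)/dp = {numeric:.6f}, 1/delta_p = {reciprocal_rate(p, s):.6f}")
```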

There’s no straightforward way to solve this explicitly for p after a given number of generations. Instead, it gives us t, the number of generations it takes to reach any particular prevalence p, with the constant of integration fixed by the initial prevalence p_0 (where q_0 = 1 - p_0):

t(p | s) = \frac{1+s}{s} \left( \frac{q_0 - q}{q_0 q} - \ln \frac{q}{q_0} \right) + \frac{1}{s} \ln \frac{p}{p_0}.
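Because t(p | s) comes from treating generations as continuous, it's worth comparing it with the discrete recursion it approximates. This sketch (hypothetical function names; p_0 = 0.01, p = 0.5, s = 0.01 chosen only for illustration) evaluates the formula and counts generations of the exact per-generation recursion until the prevalence first reaches p. For small s the two should agree to within a small fraction of the total.

```python
import math


def generations_to_reach(p: float, p0: float, s: float) -> float:
    """Continuous-time estimate t(p | s) of generations to go from p0 to p."""
    q, q0 = 1.0 - p, 1.0 - p0
    k = (q0 - q) / (q0 * q) - math.log(q / q0)
    return (1.0 + s) / s * k + math.log(p / p0) / s


def simulate_generations(p: float, p0: float, s: float) -> int:
    """Iterate the exact recursion p' = p(1 + s) / (1 + s p + s p q) from p0
    and count generations until the prevalence first reaches p."""
    if not (0.0 < p0 <= p < 1.0 and s > 0.0):
        raise ValueError("need 0 < p0 <= p < 1 and s > 0")
    generations, current = 0, p0
    while current < p:
        q = 1.0 - current
        current = current * (1.0 + s) / (1.0 + s * current + s * current * q)
        generations += 1
    return generations


p0, target, s = 0.01, 0.5, 0.01
print(f"continuous estimate: {generations_to_reach(target, p0, s):.1f}")
print(f"discrete recursion:  {simulate_generations(target, p0, s)}")
```

The gap between the two grows with s, since the continuous approximation assumes the prevalence changes little within a single generation.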

This formula lets us ask how changes in the selection pressure affect the speed of selection. Suppose we keep the same starting prevalence and want to know how fast the population will reach some target prevalence. The time needed has the form

t(p | s) = \frac{1}{s}(s K(p) + Q(p))

K(p) = \frac{q_0 - q}{q_0 q} - \ln \frac{q}{q_0}

Q(p) = \frac{q_0 - q}{q_0 q} - \ln \frac{q}{q_0} + \ln\frac{p}{p_0}
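A small sketch can confirm that K(p) and Q(p) reproduce the full expression for t(p | s). The helper names are hypothetical, and p_0 = 0.01 is just an example starting prevalence.

```python
import math


def k_term(p: float, p0: float) -> float:
    """K(p) = (q0 - q) / (q0 q) - ln(q / q0)."""
    q, q0 = 1.0 - p, 1.0 - p0
    return (q0 - q) / (q0 * q) - math.log(q / q0)


def q_term(p: float, p0: float) -> float:
    """Q(p) = K(p) + ln(p / p0)."""
    return k_term(p, p0) + math.log(p / p0)


def t_from_terms(p: float, p0: float, s: float) -> float:
    """t(p | s) = (s K(p) + Q(p)) / s."""
    return (s * k_term(p, p0) + q_term(p, p0)) / s


def t_direct(p: float, p0: float, s: float) -> float:
    """The closed form for t(p | s), written out term by term."""
    q, q0 = 1.0 - p, 1.0 - p0
    return (1.0 + s) / s * ((q0 - q) / (q0 * q) - math.log(q / q0)) + math.log(p / p0) / s


print(k_term(0.5, 0.01), q_term(0.5, 0.01))
```

With p_0 = 0.01 and p = 0.5 this gives K ≈ 1.67 and Q ≈ 5.59. Q(p) exceeds K(p) whenever p > p_0, because Q(p) = K(p) + ln(p/p_0), so in this example s K(p) only overtakes Q(p) once s exceeds about 3.3.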

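Here t(p | s) = K(p) + Q(p)/s, and both K(p) and Q(p) are positive whenever p > p_0. If s K(p) \gg Q(p), the time becomes approximately K(p), independent of s. If s K(p) \ll Q(p), the time becomes approximately Q(p)/s, inversely proportional to the selection pressure. This gives us a bound on how fast increased selection can work: since t(p | 2s) = K(p) + \frac{Q(p)}{2s} \ge \frac{1}{2}\left(K(p) + \frac{Q(p)}{s}\right) = \frac{1}{2} t(p | s), doubling the selection pressure will at most halve the time needed to raise the allele prevalence to a given point.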
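The bound can be seen numerically. This sketch (the function name and the example prevalences p_0 = 0.01, p = 0.5 are illustrative) computes the ratio t(p | 2s) / t(p | s) for several values of s. The ratio should stay between 1/2 and 1: close to 1/2 when s is small and Q(p)/s dominates, and closer to 1 when s is large enough that K(p) dominates.

```python
import math


def generations_needed(p: float, p0: float, s: float) -> float:
    """t(p | s) = K(p) + Q(p) / s, the continuous-time estimate."""
    q, q0 = 1.0 - p, 1.0 - p0
    k = (q0 - q) / (q0 * q) - math.log(q / q0)
    big_q = k + math.log(p / p0)
    return k + big_q / s


# Ratio of times when the selection pressure is doubled, for a fixed
# start (p0 = 0.01) and target (p = 0.5).
for s in (0.001, 0.01, 0.1, 1.0, 10.0):
    ratio = generations_needed(0.5, 0.01, 2 * s) / generations_needed(0.5, 0.01, s)
    print(f"s = {s:>6}: t(2s) / t(s) = {ratio:.4f}")
```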
