Suppose we have a dominant mutation in the population with some prevalence $p$. That means that a fraction $p$ of the alleles at that locus in the population are our mutation, while the rest are wild-type. We’d like to know how that prevalence changes over time. Some of those changes will be random, but we can average over that randomness to ask what the expected change is at any particular point. We’ll have to make some assumptions first:
- Generations will be single events. We’ll start off with some generation, apply a selection model, and use that to generate the next generation, ad infinitum.
- Individuals will randomly mate to produce the next generation. This means there won’t be any population substructure and we can calculate the fraction of homozygotes and heterozygotes using only the mutation prevalence $p$.
- The mutation confers a selective advantage that is equal in both homozygote and heterozygote carriers.
Given this, we can work out a table comparing different parts of the population, where $q = 1 - p$ is the probability of the wild-type allele:
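| |Homozygous carriers|Heterozygous carriers|Wild type|
|---|---|---|---|
|Fraction of current generation|$p^2$|$2pq$|$q^2$|
|Relative fitness|$1+s$|$1+s$|$1$|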
Relative fitness here means that parents with the allele will have on average $1+s$ children if wild-type parents have $1$, where $s$ is the selective advantage. That lets us calculate the fraction of the parents of the next generation, adjusting for increased reproduction of the carriers.
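| |Homozygous carriers|Heterozygous carriers|Wild type|
|---|---|---|---|
|Relative proportion of parents|$(1+s)p^2$|$2(1+s)pq$|$q^2$|
|Normalized fraction of parents|$\frac{(1+s)p^2}{1+s(1-q^2)}$|$\frac{2(1+s)pq}{1+s(1-q^2)}$|$\frac{q^2}{1+s(1-q^2)}$|

The normalizing constant is the sum of the relative proportions: $(1+s)(p^2 + 2pq) + q^2 = 1 + s(1-q^2)$.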
Now that we know what fraction of the parental population has each status, we can calculate the expected prevalence of the allele after one generation. The next generation is expected to have the same allele prevalence as this fitness-weighted parental population:

$$p' = \frac{(1+s)p^2 + \tfrac{1}{2}\cdot 2(1+s)pq}{1+s(1-q^2)}$$
That is, we sum up half the fraction of heterozygotes and the fraction of homozygotes. Since $p + q = 1$, this simplifies to

$$p' = \frac{(1+s)p}{1+s(1-q^2)}$$
Note that for the case where $s = 0$, this reduces to $p' = p$, indicating no expected change in allele prevalence when there is no selective pressure, as we would expect. We can then calculate the change in prevalence as $\Delta p = p' - p$, which simplifies down to

$$\Delta p = \frac{s\,p\,q^2}{1+s(1-q^2)}$$
This change is zero if and only if the numerator $s\,p\,q^2$ is zero. That is, if any of $s$, $p$, or $q$ is zero. This means that the expected allele frequency changes unless there is no selective pressure or the frequency is fixed at 0 or 1.
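The one-generation update can be checked numerically. The following is a minimal sketch, assuming carriers have relative fitness $1+s$ and wild-type individuals have fitness $1$; the function names `next_prevalence` and `delta_p` are illustrative choices, not taken from any library. It computes the next prevalence directly from the genotype fractions and compares the result with the simplified expression for the change.

```python
def next_prevalence(p: float, s: float) -> float:
    """Expected allele prevalence after one generation of selection."""
    q = 1.0 - p
    # Genotype fractions under random mating.
    hom, het, wild = p * p, 2.0 * p * q, q * q
    # Weight carriers by relative fitness 1 + s (assumed), wild type by 1.
    w_hom, w_het, w_wild = (1.0 + s) * hom, (1.0 + s) * het, wild
    total = w_hom + w_het + w_wild
    # Homozygotes always pass on the allele, heterozygotes half the time.
    return (w_hom + 0.5 * w_het) / total


def delta_p(p: float, s: float) -> float:
    """Simplified expected change: s * p * q^2 / (1 + s * (1 - q^2))."""
    q = 1.0 - p
    return s * p * q * q / (1.0 + s * (1.0 - q * q))


if __name__ == "__main__":
    for p in (0.01, 0.5, 0.99):
        for s in (0.0, 0.1, 2.0):
            step = next_prevalence(p, s) - p
            print(f"p={p:.2f} s={s:.1f} step={step:.6f} simplified={delta_p(p, s):.6f}")
```

The two columns should agree up to floating-point rounding, and the change should vanish whenever $s = 0$.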
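We can get a better handle on the behavior of this by treating the per-generation change as a derivative $dp/dt$, with $t$ measured in generations, and solving the resulting differential equation. Substituting $q = 1 - p$, so that $1 - q^2 = p(2-p)$, we can rewrite it as

$$\frac{1 + s\,p(2-p)}{s\,p\,(1-p)^2}\,\frac{dp}{dt} = 1$$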
Then we integrate both sides with respect to $t$:

$$\int \frac{1 + s\,p(2-p)}{s\,p\,(1-p)^2}\,\frac{dp}{dt}\,dt = \int dt$$
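Simplify the left side by partial fraction decomposition and solve the right side:

$$\int \left[\frac{1}{s\,p} + \frac{1+s}{s}\left(\frac{1}{1-p} + \frac{1}{(1-p)^2}\right)\right] dp = t + C$$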
Then we simply integrate each piece of the left integral separately and combine the constants of integration:

$$\frac{1}{s}\ln p + \frac{1+s}{s}\left(\frac{1}{1-p} - \ln(1-p)\right) = t + C$$
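For readers who want the intermediate algebra, here is a sketch of the decomposition and the elementary integrals involved, assuming carrier fitness $1+s$ and $0 < p < 1$:

$$
\begin{aligned}
\frac{1 + s\,p(2-p)}{s\,p\,(1-p)^2} &= \frac{1}{s}\cdot\frac{1}{p(1-p)^2} + \frac{2-p}{(1-p)^2} \\
\frac{1}{p(1-p)^2} &= \frac{1}{p} + \frac{1}{1-p} + \frac{1}{(1-p)^2} \\
\frac{2-p}{(1-p)^2} &= \frac{1}{1-p} + \frac{1}{(1-p)^2} \\
\int \frac{dp}{p} = \ln p, \qquad \int \frac{dp}{1-p} &= -\ln(1-p), \qquad \int \frac{dp}{(1-p)^2} = \frac{1}{1-p}
\end{aligned}
$$

Collecting the $\frac{1}{1-p}$ and $\frac{1}{(1-p)^2}$ terms gives them a shared coefficient of $1 + \frac{1}{s} = \frac{1+s}{s}$.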
There’s no straightforward way to solve this explicitly for $p$ at any particular number of generations. Instead what this gives us is a solution for $t$, the number of generations it takes to reach any particular prevalence, where the constant of integration $C$ is fixed by the initial prevalence of the allele.
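As a rough check on the solution for the number of generations, the sketch below evaluates the integrated expression and compares it with simply iterating the one-generation update. It assumes carrier fitness $1+s$ with $s > 0$ and prevalences strictly between 0 and 1; the function names are hypothetical. Because the differential equation only approximates the generation-by-generation process, the two numbers should be close for small $s$ rather than identical.

```python
import math


def time_function(p: float, s: float) -> float:
    """Integrated left-hand side, before fixing the constant of integration."""
    return math.log(p) / s + (1.0 + s) / s * (1.0 / (1.0 - p) - math.log(1.0 - p))


def generations_to_reach(p_start: float, p_end: float, s: float) -> float:
    """Continuous-time estimate of generations needed to go from p_start to p_end."""
    # Subtracting the two values eliminates the constant of integration.
    return time_function(p_end, s) - time_function(p_start, s)


def simulated_generations(p_start: float, p_end: float, s: float) -> int:
    """Count discrete generations by iterating the expected one-step change."""
    p, n = p_start, 0
    while p < p_end:
        q = 1.0 - p
        p += s * p * q * q / (1.0 + s * (1.0 - q * q))
        n += 1
    return n


if __name__ == "__main__":
    s = 0.01
    print("closed form:", generations_to_reach(0.01, 0.5, s))
    print("simulated:  ", simulated_generations(0.01, 0.5, s))
```

Subtracting two values of `time_function` plays the role of fixing the constant of integration from the initial prevalence.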
This lets us answer the question of how changes in the selection pressure affect the speed of selection. Suppose we keep the same starting prevalence and we wish to know how fast the population will reach some ending prevalence. The time needed is of the form

$$T = \frac{a}{s} + b$$

where $a$ and $b$ are positive constants that depend only on the starting and ending prevalence.
If $s \gg a/b$ then the time becomes approximately independent of $s$. If $s \ll a/b$ then the time becomes approximately inversely proportional to the selection pressure. This gives us a bound on how fast increased selection can work: at most, doubling the selection pressure will halve the time needed to increase allele prevalence to a given point.
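To see the two regimes concretely, the sketch below writes the time needed as $a/s + b$ and looks at what doubling $s$ does. The constants follow from the integrated solution under the assumption that carriers have fitness $1+s$: $b = \frac{1}{1-p_1} - \frac{1}{1-p_0} + \ln\frac{1-p_0}{1-p_1}$ and $a = \ln\frac{p_1}{p_0} + b$, for starting prevalence $p_0$ and ending prevalence $p_1$. The helper name `time_coefficients` is hypothetical.

```python
import math


def time_coefficients(p_start: float, p_end: float) -> tuple[float, float]:
    """Return (a, b) such that the time needed is a / s + b."""
    b = (1.0 / (1.0 - p_end) - 1.0 / (1.0 - p_start)) + math.log(
        (1.0 - p_start) / (1.0 - p_end)
    )
    a = math.log(p_end / p_start) + b
    return a, b


def time_needed(p_start: float, p_end: float, s: float) -> float:
    """Generations needed under the continuous-time approximation."""
    a, b = time_coefficients(p_start, p_end)
    return a / s + b


if __name__ == "__main__":
    for s in (0.001, 0.1, 10.0, 1000.0):
        # Speed-up factor obtained by doubling the selection pressure.
        ratio = time_needed(0.01, 0.5, s) / time_needed(0.01, 0.5, 2.0 * s)
        print(f"s={s:g}: doubling s divides the time by {ratio:.4f}")
```

The printed ratio should sit just under 2 for weak selection and approach 1 for very strong selection.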