# Variant fixation probability

## Introduction

What happens to a new mutation?

Every person carries some new mutations in their germline genome, mostly caused by errors in DNA replication, and can pass those mutations on to their descendants. We know of a few historical mutations whose present-day copies all appear to descend from a single common ancestor, for example the common variant that’s responsible for blue eyes. How likely is a new mutation to succeed and become common?

This depends on the selective advantage $s$ of the new mutation: how many more surviving children do carriers have, relative to non-carriers? If non-carriers have on average $1$ surviving child, then carriers have on average $1 + s$. Intuitively, higher selective advantage should make it more likely for a mutation to succeed.

We’re most interested in humans right now, and humans are diploid; they have two copies of each chromosome. Any variant can either be on both chromosomes or only one, and the selective advantage can be different depending on this. There are many common variants that are helpful when only one copy is present and harmful when two copies are present. Whether a mutation succeeds and becomes common mostly depends on the selective advantage in carriers with a single copy, because carriers with two copies are very rare when the mutation is uncommon.

There have been multiple approaches to solving this problem. Today we’ll look at one of the earliest and simplest. It’s straightforward, and although it requires a couple of assumptions that are not always valid, we can get a lot of utility out of the simple result.

## Haldane’s approach

Haldane’s original 1927 paper has a derivation of the likelihood of a new mutation becoming common, but it can be somewhat difficult to understand. We’ll work out his results in slightly more detail.

Suppose we have a large population with a new mutation present in a single individual. Let $p_r$ be the chance that this individual passes the mutation down to $r$ of their children in the next generation. We’ll define a new function

$f(x) = \sum_{r=0}^{\infty} p_r x^r$

We’ll see why this function is useful in more detail later. By construction, the coefficient of $x^r$ in $f(x)$ is the probability of $r$ carriers in the next generation. Suppose instead we have multiple carriers in this generation. If the fraction of carriers is small, then we can ignore potential matings between carriers. That means that with $m$ carriers, the probability of $r$ carriers in the next generation is the coefficient of $x^r$ in $[f(x)]^m$.

Expanding on this a bit, suppose that $p_0=0.5$ and $p_1=0.5$, so that each carrier has an equal probability of zero and one children who are also carriers. Then

$f(x) = 0.5 + 0.5x$

If we have two carriers,

$[f(x)]^2 = 0.25 + 0.5x + 0.25x^2$

And we can see that the coefficients of the powers of $x$ give the probability distribution of the number of carriers in the next generation.
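To make the bookkeeping concrete, here is a minimal Python sketch that raises $f(x)$ to the $m$-th power by repeated polynomial multiplication. The function names are hypothetical and chosen only for illustration; polynomials are stored as coefficient lists, with index $r$ holding the coefficient of $x^r$.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def next_generation_distribution(p, m):
    """Coefficients of [f(x)]^m, where p[r] is the coefficient p_r of f(x).

    Assumes the m carriers reproduce independently (no carrier-carrier matings).
    """
    dist = [1.0]  # the constant polynomial 1, i.e. [f(x)]^0
    for _ in range(m):
        dist = poly_mul(dist, p)
    return dist


# Two carriers, each with p_0 = p_1 = 0.5
print(next_generation_distribution([0.5, 0.5], 2))  # [0.25, 0.5, 0.25]
```

Entry $r$ of the returned list is the probability of $r$ carriers in the next generation, so the list always sums to 1.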

Now, what would happen if instead of multiplying the function, we iteratively apply it to itself? This forms a composite function

$S_f^N(x) = f( f( f( ... f(x) ... )))$

Here the superscript $N$ denotes the number of times the function has been applied. Again, let’s look at this for our simple example. Applied twice,

$S_f^2(x) = f(f(x)) = 0.5 + 0.5 (0.5 + 0.5x) = 0.75 + 0.25x$

We can see that this function $S_f^N(x)$ represents the probability distribution of carriers after $N$ generations of reproduction. In this case, in each generation half the carriers produce no carrier children, so after two generations we have only a 0.25 chance of a single carrier remaining.

We can then represent the likelihood that the variant disappears from the population as $\lim_{N \to \infty} S_f^N(0)$. That is, as the number of generations goes up, what is the probability of no carriers in the population, represented as the coefficient of $x^0$?
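As a quick check on the simple example, the sketch below evaluates $S_f^N(0)$ for the first few generations by applying $f$ repeatedly, starting from $0$. The helper name `extinction_by_generation` is an assumption made for this illustration.

```python
def extinction_by_generation(f, generations):
    """Return [S_f^1(0), ..., S_f^N(0)]: the chance of zero carriers after each generation."""
    x = 0.0
    history = []
    for _ in range(generations):
        x = f(x)  # one more application of f
        history.append(x)
    return history


def f(x):
    # The running example: p_0 = p_1 = 0.5
    return 0.5 + 0.5 * x


print(extinction_by_generation(f, 4))  # [0.5, 0.75, 0.875, 0.9375]
```

For this example each carrier leaves on average only half a carrier child, so the extinction probability climbs towards 1.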

This kind of repeated application of a function to its own output is called fixed-point iteration. A solution $x_0$ of the equation $f(x) = x$ is called a fixed point, and the iteration converges to $x_0$ when started close enough to it if

1. $f(x)$ is continuously differentiable around the fixed point $x_0$.
2. $|f'(x_0)| < 1$.

Now, to go further, we have to make some assumptions about the form of the function $f(x)$. We want to model a new mutation with selective advantage $s$, so that the expected number of carrier children per carrier is $1 + s$. We will model the probabilities $p_r$ as a Poisson distribution with mean $1+s$, giving:

$f(x) = \sum_{r=0}^{\infty} \frac{1}{r!} x^r (1+s)^r e^{-(1+s)}$

This can be simplified with some algebraic work I won’t show here to

$f(x) = e^{(x - 1)(1 + s)}$
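As a sketch of that algebra: pull the constant factor out of the sum and recognise the power series $e^z = \sum_{r=0}^{\infty} \frac{z^r}{r!}$ with $z = (1+s)x$:

$f(x) = e^{-(1+s)} \sum_{r=0}^{\infty} \frac{[(1+s)x]^r}{r!} = e^{-(1+s)} e^{(1+s)x} = e^{(x - 1)(1 + s)}$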

Let $y$ be the probability that the mutation survives, so that the probability of extinction is $1-y$. From the limit above, we know that

$1 - y = \lim_{N \to \infty} S_f^N(0)$

So let’s plug $x=1-y$ into the fixed point equation $x = f(x)$, giving us

$1 - y = e^{-y(1 + s)}$

We take the natural log of both sides to get

$y(1 + s) = -\ln(1 - y)$.

Then we can take the Taylor series of $-\ln(1-y)$ to give us

$y(1 + s) = y + \frac{y^2}{2} + \frac{y^3}{3} + ...$

Subtracting $y$ from both sides and dividing by $y$ gives

$s = \frac{y}{2} + \frac{y^2}{3} + ...$.

Now, if the selective advantage $s$ is small, then $y$ is small too, so we can drop the higher-order terms and keep $s \approx \frac{y}{2}$. This gives Haldane’s result: the probability that the mutation survives from a single starting copy is $y \approx 2s$.
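To see how good the approximation is, we can solve the fixed-point equation $1 - y = e^{-y(1 + s)}$ numerically and compare the answer with $2s$. The sketch below iterates $x \to f(x)$ from $x = 0$, exactly as in the definition of $S_f^N(0)$. The function name and the iteration count are assumptions for illustration; convergence is slow when $s$ is small, so the count is deliberately generous.

```python
import math


def survival_probability(s, iterations=100_000):
    """Iterate x -> exp((x - 1)(1 + s)) from x = 0 and return y = 1 - x.

    Assumes Poisson-distributed offspring with mean 1 + s, as in the text.
    """
    x = 0.0
    for _ in range(iterations):
        x = math.exp((x - 1.0) * (1.0 + s))
    return 1.0 - x


for s in (0.001, 0.01, 0.1):
    print(f"s={s}: numerical y={survival_probability(s):.5f}, Haldane 2s={2 * s:.5f}")
```

The numerical value always falls a little below $2s$, because the dropped terms $\frac{y^2}{3} + ...$ are positive. At $s = 0.1$ it is roughly $0.176$ rather than $0.2$, and the gap shrinks as $s$ gets smaller.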

## Implications

First let’s remember the assumptions that went into this calculation:

• $s$ is “small”, so that we can drop higher-order terms in the Taylor series.
• The population size is “big”, so that we can ignore the likelihood of carrier-carrier matings.
• The number of surviving children per individual is Poisson-distributed.
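Under these assumptions the result can also be checked by direct simulation. The sketch below follows a single new mutation forward in time, giving each carrier a Poisson-distributed number of carrier children with mean $1 + s$. Everything here is an illustrative assumption rather than part of Haldane’s derivation: the function names, the seed, and especially the shortcut of counting a lineage as “survived” once it reaches 50 carriers. That threshold is reasonable for $s = 0.1$ but would need to be much larger than $1/(2s)$ for smaller $s$.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def survives(s, rng, established=50, max_generations=10_000):
    """Follow one lineage from a single carrier.

    Returns True once the lineage reaches `established` carriers (assumed safe
    from extinction) and False if it dies out first.
    """
    carriers = 1
    for _ in range(max_generations):
        if carriers == 0:
            return False
        if carriers >= established:
            return True
        # Each carrier reproduces independently: no carrier-carrier matings.
        carriers = sum(poisson_sample(1.0 + s, rng) for _ in range(carriers))
    return carriers > 0


def estimate_survival(s, trials=4_000, seed=0):
    """Fraction of simulated lineages that survive."""
    rng = random.Random(seed)
    return sum(survives(s, rng) for _ in range(trials)) / trials


print(estimate_survival(0.1))
```

For $s = 0.1$ the estimate should land somewhat below $2s = 0.2$, since $s = 0.1$ is not especially small and the dropped higher-order terms still matter.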

The biggest implication of this result is that the likelihood of the mutation surviving doesn’t depend on the size of the population, as long as the population is large enough for the assumptions above to hold. It’s just as easy for a mutation to survive in a moderately large population as in a very large one.

Second, for small $s$ the likelihood of survival is approximately proportional to the selective advantage of the mutation. But since even highly adaptive mutations have low absolute values of $s$, most new adaptive mutations are eventually lost. With $s = 0.01$, for example, a new mutation survives only about 2% of the time.

Finally, if you repeatedly introduce the same beneficial mutation into a population, it’s very likely to become common. This suggests that even small amounts of crossbreeding will result in beneficial mutations from one population becoming common in the other population.
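A rough way to quantify this, assuming each introduced copy founds an independent lineage that survives with probability $y \approx 2s$, is that at least one of $k$ introductions survives with probability $1 - (1 - 2s)^k$. The short sketch below tabulates this; the function name is hypothetical and the independence of introductions is an assumption of the sketch.

```python
def chance_any_introduction_survives(s, introductions):
    """Probability that at least one of `introductions` independent copies survives.

    Uses Haldane's approximation y = 2s for each copy (capped at 1).
    """
    y = min(1.0, 2.0 * s)
    return 1.0 - (1.0 - y) ** introductions


for k in (1, 10, 100, 1000):
    print(k, round(chance_any_introduction_survives(0.01, k), 4))
```

With $s = 0.01$, a single introduction survives about 2% of the time, but a hundred independent introductions give roughly an 87% chance that at least one of them does.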