Hardy-Weinberg Model for Gene Frequencies*
An Application of the Square of a Binomial
In learning algebraic manipulations, one of the early techniques practiced is multiplication of binomials
(a+b)(c+d)
by successive application of the distributive property of multiplication over addition, known fondly as FOIL (First, Outside, Inside, Last), often used as a verb. The special case of a square of a binomial
$(a+b)^2 = a^2 + 2ab + b^2$
is a pattern that is frequently stressed in early instruction, and gets special recognition later in completing the square.
Here is an application in genetics. We look at the frequency of a mutant allele and the related frequencies of the three possible allele pairings: homozygotes with the normal allele only; homozygotes with the mutant allele only, who display the trait; and heterozygotes (mixed allele pairs), who are carriers of the trait if it is recessive (as with the sickle cell gene) and who display the trait if it is dominant (as with blood antibody types). The square of the binomial appears naturally in this description.
The normal allele is denoted A; the fraction of alleles that are normal in a population is denoted p.
The mutant allele is denoted a; the fraction of alleles that are mutant in a population is denoted q.
The entirety of the population of alleles is the sum of both fractions: p+q=1.
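An individual's genotype is composed of a maternal allele and a paternal allele. If the two alleles are drawn independently from the population's pool of alleles (random mating), the fraction of individuals with a given pairing is the product of the two allele frequencies.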

The population fraction of individuals that are homozygous AA is $p^2$;
the population fraction of individuals that are heterozygous Aa is $2pq$ (the a allele may come from either parent);
the population fraction of individuals that are homozygous aa is $q^2$.
The sum of all these fractions, the entirety of the population, is
$p^2 + 2pq + q^2 = (p+q)^2 = 1^2 = 1$
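As a quick numerical check of this identity, here is a minimal Python sketch. The function name `genotype_fractions` and the sample value q = 0.1 are illustrative assumptions, not taken from the source.

```python
def genotype_fractions(q):
    """Return the (AA, Aa, aa) population fractions for mutant-allele frequency q."""
    p = 1.0 - q                       # the two allele fractions total 1
    return p * p, 2.0 * p * q, q * q  # p^2, 2pq, q^2


# Illustrative value only: suppose 10% of the alleles are mutant.
frac_AA, frac_Aa, frac_aa = genotype_fractions(0.1)
total = frac_AA + frac_Aa + frac_aa

print(frac_AA, frac_Aa, frac_aa)
print(total)  # (p + q)^2, which is 1 up to floating-point rounding
```

Whatever value of q between 0 and 1 is supplied, the three fractions should total 1.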
Let us see this applied.
A screening of school age children in

| Genotype  | AA   | Aa   | aa   |
| Frequency | .834 | .161 | .005 |
What is the frequency, q, of the sickle cell allele in this population?
Well, these are REAL numbers, so the “answers” don’t come out looking perfect. Seeing from the above that $q^2 = .005$, it would be natural to conclude that $q = \sqrt{.005} \approx .071$. This would give a value for $p = 1 - q = .929$. Unfortunately, this is inconsistent with our $p^2$ value of .834, because $(.929)^2 = .863$. And the value of $2pq$ from our calculated values would be .132, not the .161 value given. Part of what has happened here is the dearth of significant digits in our starting figure of .005.
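The mismatch is easy to see by computing all three predicted genotype fractions from the single aa figure. The sketch below is only an illustration; the variable names are my own, and it assumes the aa frequency equals $q^2$ exactly.

```python
import math

# Observed genotype frequencies from the screening table.
observed = {"AA": 0.834, "Aa": 0.161, "aa": 0.005}

q = math.sqrt(observed["aa"])  # assumes the aa fraction is exactly q^2
p = 1.0 - q

# Genotype fractions predicted from this p and q.
predicted = {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}

print(f"p = {p:.3f}, q = {q:.3f}")
for genotype in observed:
    print(f"{genotype}: observed {observed[genotype]:.3f}, "
          f"predicted {predicted[genotype]:.3f}")
```

Because the sketch carries p and q at full precision, its third decimal place can differ by one from figures worked out with the rounded values .929 and .071.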
Let us try using the other two values; the frequencies for genotypes AA and Aa have three significant digits. Using $p^2 = .834$, we could calculate $p = \sqrt{.834} \approx .913$. This would yield $q = 1 - p = .087$. But these p and q values are not consistent with our table values either: $2pq = .158$ and $q^2 = .008$. Alternatively, we can note that
$q = q(1) = q(p+q) = pq + q^2 = \tfrac{1}{2}\,\text{Frequency(Aa)} + \text{Frequency(aa)}$
so we can compute the frequency, q, as
$q = \tfrac{1}{2}(.161) + .005 = .0805 + .005 \approx .086.$
The $q^2$ here would be .007, and we would have $p = 1 - q = 1 - .086 = .914$; the associated $p^2 = .835$.
If we try using both of the three-significant-figure frequencies, then note that
$p = p(1) = p(p+q) = p^2 + pq = \text{Frequency(AA)} + \tfrac{1}{2}\,\text{Frequency(Aa)}.$
Thus we can compute the frequency, p, of the normal allele in the population:
$p = .834 + \tfrac{1}{2}(.161) = .834 + .0805 \approx .915.$
The $p^2$ here comes up .837. The frequency, q, is then $q = 1 - p = 1 - .915 = .085$. This gives $q^2 = .007$.
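The two allele-counting formulas can be run side by side. This is a minimal sketch; the function name `allele_frequencies` is a hypothetical label, not something from the source.

```python
def allele_frequencies(freq_AA, freq_Aa, freq_aa):
    """Estimate (p, q) by counting alleles: each heterozygote carries one of each."""
    p = freq_AA + 0.5 * freq_Aa  # p = p^2 + pq
    q = 0.5 * freq_Aa + freq_aa  # q = pq + q^2
    return p, q


p, q = allele_frequencies(0.834, 0.161, 0.005)

print(f"p = {p:.4f}, q = {q:.4f}, p + q = {p + q:.4f}")
print(f"p^2 = {p * p:.3f}, 2pq = {2 * p * q:.3f}, q^2 = {q * q:.3f}")
```

Since the three observed frequencies total 1, the two formulas agree before rounding (p is about .9145 and q about .0855). The gap between the .914/.086 pair and the .915/.085 pair comes only from which of the two values is rounded first.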
Let’s summarize all this in a table.
|                                  | p    | q    | AA: $p^2$ | Aa: $2pq$ | aa: $q^2$ |
| actual frequencies for genotypes |      |      | .834      | .161      | .005      |
| from $q^2 = .005$                | .929 | .071 | .863      | .132      | .005      |
| from $p^2 = .834$                | .913 | .087 | .834      | .158      | .008      |
| from $q = pq + q^2$              | .914 | .086 | .835      | .157      | .007      |
| from $p = p^2 + pq$              | .915 | .085 | .837      | .156      | .007      |
Well you pick. I’d go with p=.92 and q=.08. (The text I swiped this from uses the last.)
Now let’s compare this with
A screening of young adults in
Other genetic disorders that are the result of the presence of two recessive alleles can be analyzed in a similar fashion. Estimate the proportion of carriers (heterozygotes) in each population:
· Cystic fibrosis, which occurs in approximately one out of every 1600 births in the
· In the Jewish populations with origins in northern
· Albinism among the Indians on the San Blas Islands of
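For a recessive disorder, the incidence among births estimates $q^2$, so the carrier fraction is $2pq$ with $q$ taken as the square root of the incidence. The sketch below uses only the cystic fibrosis figure quoted above (about one in 1600 births). The function name `carrier_fraction` is hypothetical, and the calculation assumes the population is in Hardy-Weinberg proportions.

```python
import math


def carrier_fraction(incidence):
    """Estimate the heterozygote (carrier) fraction 2pq from the incidence q^2."""
    q = math.sqrt(incidence)  # frequency of the recessive allele
    p = 1.0 - q               # frequency of the normal allele
    return 2.0 * p * q


# Cystic fibrosis figure quoted above: about one in every 1600 births.
cf_carriers = carrier_fraction(1 / 1600)

print(f"q = {math.sqrt(1 / 1600):.3f}, carrier fraction = {cf_carriers:.4f}")
```

Here q works out to 1/40, so the estimated carrier fraction is 2(39/40)(1/40), a little under 5 percent, even though the disorder itself is rare.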
In the ABO blood type classifications, let p, q, and r denote the frequencies of the A, B, and O alleles respectively, so that p+q+r=1. An OO genotype will present with blood type O; an AA or AO genotype will present with blood type A; a BB or BO genotype will present with blood type B; and an AB genotype will present with blood type AB.
Use a tree diagram to show that the fractions in the population that present with various blood types are
| Blood Type | O     | A           | B           | AB    |
| Frequency  | $r^2$ | $p^2 + 2pr$ | $q^2 + 2qr$ | $2pq$ |
and admire how the sum of all these frequencies is the square of the trinomial (p+q+r)!
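A short numerical check of that sum, as a sketch: the function name `blood_type_fractions` and the sample allele frequencies are illustrative assumptions, not data from the source.

```python
def blood_type_fractions(p, q, r):
    """Return phenotype fractions for allele frequencies p (A), q (B), r (O)."""
    return {
        "O": r * r,                # genotype OO
        "A": p * p + 2.0 * p * r,  # genotypes AA and AO
        "B": q * q + 2.0 * q * r,  # genotypes BB and BO
        "AB": 2.0 * p * q,         # genotype AB
    }


# Illustrative allele frequencies only; they must total 1.
fractions = blood_type_fractions(0.3, 0.1, 0.6)
total = sum(fractions.values())

print(fractions)
print(total)  # (p + q + r)^2, which is 1 up to floating-point rounding
```

The four entries collect the six terms of the expansion $p^2 + q^2 + r^2 + 2pq + 2pr + 2qr$, grouped by the blood type each genotype presents.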
If a study of the blood types in

| Blood Type | O    | A    | B    | AB   |
| Frequency  | .441 | .435 | .090 | .034 |
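The question that goes with this table is cut off in the source. Assuming it asks for the allele frequencies, one possible approach (my assumption, not necessarily the intended one) uses the relations $O = r^2$, $O + A = (p+r)^2$, and $O + B = (q+r)^2$, which follow from the blood type formulas above. A minimal sketch:

```python
import math

# Observed blood type frequencies from the table.
observed = {"O": 0.441, "A": 0.435, "B": 0.090, "AB": 0.034}

r = math.sqrt(observed["O"])                      # O = r^2
p = math.sqrt(observed["O"] + observed["A"]) - r  # O + A = (p + r)^2
q = math.sqrt(observed["O"] + observed["B"]) - r  # O + B = (q + r)^2

predicted_AB = 2.0 * p * q  # compare against the observed AB frequency

print(f"p = {p:.3f}, q = {q:.3f}, r = {r:.3f}, p + q + r = {p + q + r:.3f}")
print(f"predicted AB = {predicted_AB:.3f} (observed {observed['AB']:.3f})")
```

As with the sickle cell figures, these are real numbers: the three estimates need not total exactly 1, and the predicted AB frequency only approximately matches the observed one.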