Chi-squared Test
This lesson covers:
- How chi-squared tests can be used to analyse genetic crosses
- The key steps in the chi-squared test
- How to calculate the chi-squared statistic
- How to compare chi-squared values to critical values
The chi-squared test for genetic crosses
The chi-squared (χ2) test is a statistical tool scientists use to measure any differences between observed experimental results and expected theoretical outcomes. Effectively, it can assess whether the outcomes of a genetic cross are significantly different from the outcomes predicted by a specific inheritance pattern.
The χ2 test can only be used when certain criteria are met:
- Large sample size
- Discrete data categories (like yes or no, heads or tails, red or blue)
- Using raw counts (not percentages or rates)
- A comparison of experimental and theoretical results
A large sample size matters because more observations reduce the relative effect of chance on the difference between expected and observed results.
Overview of the χ2 test
To carry out the χ2 test, there are a few steps that you need to follow.
Steps in the χ2 test:
- Propose an alternative hypothesis, which suggests that there is a significant difference between the observed and expected results, and that the difference is due to a factor other than chance.
- Propose a null hypothesis, which assumes that there is no significant difference between observed and expected results, and that any difference is due to chance alone.
- Predict the expected phenotypic ratios among the offspring.
- Conduct crosses and record the observed number of offspring with each phenotype.
- Calculate the χ2 statistic.
- Compare the χ2 value to the critical value at a chosen probability level, typically a 5% significance level (p = 0.05).
If the χ2 value is higher than the critical value at the chosen probability level, it suggests that the differences are not due to random chance, leading to the rejection of the null hypothesis.
We accept the null hypothesis when the χ2 value is lower than the critical value at the chosen probability level. This suggests that the differences between the observed and expected frequencies are due to chance.
Calculating the χ2 statistic
To calculate the χ2 statistic, use the formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where O is the observed number and E is the expected number for each phenotype.
To calculate the χ2 statistic:
- Calculate the expected values based on the expected phenotypic ratio.
- Record the observed values for each phenotype.
- For each phenotype, subtract the expected number from the observed number.
- Square these differences (to make the values positive).
- Divide each squared difference by the expected number.
- Repeat steps 1-5 for each phenotype and add these values together.
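The steps above can be sketched as a short Python function. This is a minimal illustration rather than a definitive implementation: the function name `chi_squared` and its parameters `observed` and `ratio` are assumptions made for this example, and the counts used are those from the fruit fly example that follows.

```python
def chi_squared(observed, ratio):
    """Return the chi-squared statistic for observed counts and an expected ratio.

    observed: raw counts for each phenotype, e.g. [111, 49]
    ratio:    expected phenotypic ratio in the same order, e.g. [3, 1]
    """
    total = sum(observed)
    ratio_total = sum(ratio)
    statistic = 0.0
    for o, r in zip(observed, ratio):
        expected = total * r / ratio_total       # expected number (E)
        difference = o - expected                # O - E
        statistic += difference ** 2 / expected  # (O - E)^2 / E, summed over phenotypes
    return statistic


# Fruit fly counts: 111 normal wings, 49 vestigial wings, expected ratio 3:1
print(round(chi_squared([111, 49], [3, 1]), 3))  # 2.7
```

Passing the expected ratio rather than the expected numbers keeps the expected values tied to the total number of offspring actually observed.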
Example of calculating a χ2 statistic
Consider a monohybrid cross examining wing length in pure-breeding fruit flies. The phenotypes are normal wings (dominant) and vestigial wings (recessive), so a 3:1 ratio of normal to vestigial wings is expected in the F2 generation.
The information you may be provided with is as follows:
- A homozygous dominant parent (NN) is crossed with a homozygous recessive parent (nn).
- The F1 offspring are then interbred, producing 160 offspring in the F2 generation.
- In the F2 generation, there were 111 offspring observed with normal wings, and 49 observed with vestigial wings.
Inputting your calculations into a table may help organise the information.
Phenotype | Expected ratio | Calculation of expected number | Expected number (E) | Observed number (O) | $O - E$ | $(O - E)^2$ | $\frac{(O - E)^2}{E}$ |
---|---|---|---|---|---|---|---|
Normal wings | 3 | $160 \times \frac{3}{4}$ | 120 | 111 | −9 | 81 | 0.675 |
Vestigial wings | 1 | $160 \times \frac{1}{4}$ | 40 | 49 | 9 | 81 | 2.025 |
We then add together the two values of $\frac{(O - E)^2}{E}$:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$\chi^2 = 0.675 + 2.025$$
$$\chi^2 = 2.7$$
Comparing the χ2 statistic to the critical value
You then need to determine whether the difference between observed and expected values is statistically significant.
This is done as follows:
- Choose a suitable probability level, typically p = 0.05.
- Calculate the degrees of freedom (df): df = number of phenotypes − 1.
- Use a χ2 table to find the critical value corresponding to the chosen probability level and degrees of freedom.
- If the χ2 statistic is greater than or equal to the critical value, the difference is significant, and the null hypothesis is rejected.
- If the χ2 statistic is less than the critical value, the difference is likely due to chance, and the null hypothesis can be accepted.
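The comparison steps above can be sketched in Python as follows. This is only an illustration: the names `CRITICAL_VALUES_P05`, `degrees_of_freedom` and `is_significant` are assumptions made for this example, and the critical values are the p = 0.05 column of the table given in this lesson (df 1 to 3 only).

```python
# Critical values at p = 0.05, copied from the lesson's chi-squared table.
CRITICAL_VALUES_P05 = {1: 3.84, 2: 5.99, 3: 7.81}


def degrees_of_freedom(number_of_phenotypes):
    """df = number of phenotypes - 1."""
    return number_of_phenotypes - 1


def is_significant(statistic, number_of_phenotypes):
    """Return True if the null hypothesis is rejected at p = 0.05."""
    df = degrees_of_freedom(number_of_phenotypes)
    critical_value = CRITICAL_VALUES_P05[df]  # raises KeyError for df outside the table
    return statistic >= critical_value


# Fruit fly example: statistic 2.7, two phenotypes
print(is_significant(2.7, 2))  # False
# Pea plant worked example: statistic 7.5, two phenotypes
print(is_significant(7.5, 2))  # True
```

A result of `False` corresponds to accepting the null hypothesis, and `True` to rejecting it.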
Example of comparing the χ2 statistic to the critical value
Let's see this in action with the fruit fly example above.
In this case:
- With two phenotypes, df = 2−1=1
- At the 0.05 probability level, the critical value for 1 df is 3.84 (see the table below; the table will be provided in an exam).
- As the χ2 statistic is 2.7, which is smaller than 3.84, we must accept the null hypothesis.
- There is no significant difference between the observed and expected values, and any difference is due to chance.
df | p = 0.50 | p = 0.10 | p = 0.05 | p = 0.01 | p = 0.001 |
---|---|---|---|---|---|
1 | 0.46 | 2.71 | 3.84 | 6.63 | 10.83 |
2 | 1.39 | 4.60 | 5.99 | 9.21 | 13.82 |
3 | 2.37 | 6.25 | 7.81 | 11.35 | 16.27 |
Worked example - Calculating the χ2 statistic for a pea plant cross
A pea plant cross where the expected ratio of tall to dwarf plants is 3:1 produced 160 offspring, where 135 were tall plants and 25 were dwarf plants.
Calculate the χ2 statistic for this cross, and determine whether any difference between observed and expected values is due to chance or some other factor.
Use the following formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where O is the observed number and E is the expected number for each phenotype.
Step 1: Calculate expected numbers
expected tall plants: $160 \times \frac{3}{4} = 120$
expected dwarf plants: $160 \times \frac{1}{4} = 40$
Step 2: Subtract expected values from observed values for each phenotype
tall plants: 135−120=15
dwarf plants: 25−40=−15
Step 3: Square the differences
tall plants: $15^2 = 225$
dwarf plants: $(-15)^2 = 225$
Step 4: Divide the squared difference by the expected number
tall plants: $\frac{225}{120} = 1.875$
dwarf plants: $\frac{225}{40} = 5.625$
Step 5: Add the values together to find the χ2 statistic
χ2=1.875+5.625=7.5
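For reference, steps 1 to 5 can also be written as a single substitution into the formula, using only the values already calculated above:

```latex
\begin{aligned}
\chi^2 &= \frac{(135 - 120)^2}{120} + \frac{(25 - 40)^2}{40} \\
       &= \frac{225}{120} + \frac{225}{40} \\
       &= 1.875 + 5.625 = 7.5
\end{aligned}
```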
Step 6: Compare to the critical value at a 5% probability level
This determines whether the observed ratio deviates significantly from the expected 3:1 ratio.
As there are 2 phenotypes, df = 2 − 1 = 1.
df | p = 0.50 | p = 0.10 | p = 0.05 | p = 0.01 | p = 0.001 |
---|---|---|---|---|---|
1 | 0.46 | 2.71 | 3.84 | 6.63 | 10.83 |
2 | 1.39 | 4.60 | 5.99 | 9.21 | 13.82 |
3 | 2.37 | 6.25 | 7.81 | 11.35 | 16.27 |
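The χ2 statistic for the pea plant cross is 7.5, which is higher than the critical value at p = 0.05 (3.84). The null hypothesis is therefore rejected: the difference between the observed and expected values is significant and is likely due to a factor other than chance.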