## The Chi-square Goodness of Fit Test

**Purpose**

To determine if the observed counts of a nominal/categorical variable significantly differ from predicted counts under a null hypothesis. If the observed counts are derived from a categorised continuous variable, these can be compared to counts predicted by a theoretical distribution, e.g., normal.

**Research Question Examples**

*Is a die fair? That is, do the observed outcomes correspond to a uniform distribution?*

*Can a sample be assumed to have been drawn from a normally distributed population? – Although you would probably use the Shapiro-Wilk test of normality rather than the Chi-square test to determine this.*

*Can a coin be assumed to be fair given a particular number of heads when tossed 30 times? – Although, you would probably use the Binomial Test rather than Chi-square.*

*Given that the prevalence of adult smoking in the UK is 14.7%, how likely would a prevalence of 20% be in a sample of 100 adults?*

**Requirements**

- Independent random sampling
- Categorical data
- Category expected counts greater than 5
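The expected-count requirement can be checked before running the test by multiplying the sample size by each hypothesised probability. The sketch below does this for the smoking question above (100 adults, assumed prevalence 14.7%); the variable names `n`, `null.probs` and `expected.counts` are illustrative only.

```r
# Sample size and null-hypothesis probabilities (smoker, non-smoker)
n <- 100
null.probs <- c(smoker = 0.147, non.smoker = 0.853)

# Expected count per category = sample size x hypothesised probability
expected.counts <- n * null.probs
print(expected.counts)

# TRUE only if every category meets the requirement
all(expected.counts > 5)
```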

**Background**

Tolman et al. (1946), investigating maze learning in rats, wanted to determine whether rats would show a preference for a particular route when presented with four alternative routes (A – D). Thirty-two rats were presented with the choice of routes, and the following route choices were observed:

| Chosen route | A | B | C | D |
|---|---|---|---|---|
| Observed count | 4 | 5 | 8 | 15 |

Whilst it is evident that there was a preference within the sample for route D, did this represent a preference in the wider population, or was it a consequence of sampling variation?

We will evaluate the null hypothesis that there is no preference for a particular route in the population; that is, the probability of selecting each route is the same (0.25), giving each route a predicted count of 8 (32 × 0.25). The alternative hypothesis is that the probabilities of selecting the routes are not all equal.
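The statistic behind this test sums the squared differences between the observed and predicted counts, each divided by the predicted count. A minimal sketch of that arithmetic for the rat data, assuming the standard Pearson chi-square formula (the variable names are illustrative):

```r
# Observed route choices (A to D) for the 32 rats
observed <- c(A = 4, B = 5, C = 8, D = 15)

# Predicted counts under the null hypothesis: 32 x 0.25 = 8 per route
expected <- sum(observed) * rep(0.25, 4)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi.sq <- sum((observed - expected)^2 / expected)   # 9.25

# Degrees of freedom: number of categories minus 1
deg.free <- length(observed) - 1                    # 3

# p-value: upper-tail area of the chi-square distribution
p.value <- pchisq(chi.sq, df = deg.free, lower.tail = FALSE)
print(c(chi.sq = chi.sq, df = deg.free, p.value = p.value))
```

Route D contributes most of the statistic: (15 - 8)² / 8 = 6.125 of the total 9.25.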

We will use the chisq.test command specifying the counts (4,5,8,15) and their associated probabilities (0.25, 0.25, 0.25, 0.25) assuming the null hypothesis is true. The complete R commands are shown below.

**R Command**

Enter the following commands (you can copy and paste them into RStudio).

    # Set up a vector of observed counts and assign it to the variable observed
    observed <- c(4, 5, 8, 15)

    # Set up a vector of expected probabilities and assign it to the variable exp.probs
    exp.probs <- c(0.25, 0.25, 0.25, 0.25)

    # Run the test
    chisq.test(observed, p = exp.probs)

When executed, this command will generate the output shown below.

**Interpreting The Output & Reporting the Analysis**

**Notes**

- The *p*-value is approximately 0.026 (the value implied by *χ*² = 9.25 with 3 degrees of freedom), which is less than 0.05, so the finding is significant
- This means we reject the null hypothesis in favour of the alternative hypothesis; that is, the population proportions are not all equal to 0.25
- When reporting most statistical tests we need to indicate:
  - The statistic (*χ*² = 9.25)
  - The sample size (*N* = 32) and/or degrees of freedom (3)
  - The *p*-value
- The reported *p*-value is two-sided, which is consistent with the non-directional hypothesis.
- <- is the assignment operator in R
- In fact, as the probabilities under the null hypothesis were equal, we didn’t need to specify them, as this is the default. Accordingly, either of the following commands would have generated the same output: chisq.test(observed) or chisq.test(c(4,5,8,15))
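The values needed for reporting can also be pulled out of the test result rather than read off the printed output. A short sketch, assuming the result is stored in a variable (here called `result`, an illustrative name); `statistic`, `parameter`, `p.value` and `expected` are components of the object that chisq.test returns:

```r
observed <- c(4, 5, 8, 15)

# Store the test result instead of only printing it
result <- chisq.test(observed)

result$statistic   # chi-square statistic
result$parameter   # degrees of freedom
result$p.value     # p-value
result$expected    # predicted counts under the null hypothesis
sum(observed)      # sample size N

# Specifying equal probabilities explicitly gives the same statistic
explicit <- chisq.test(observed, p = rep(0.25, 4))
all.equal(unname(result$statistic), unname(explicit$statistic))
```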

**References**

Tolman, E. C., Ritchie, B. F., & Kalish, D. (1946). Studies in spatial learning. I. Orientation and the short-cut. *Journal of Experimental Psychology*, *36*(1), 13.