You use the \(\chi^2\)-test when you are addressing the inferential aspect of research questions about:

- the relationship between two factor variables (test for association);
- the distribution of one factor variable (goodness-of-fit test).

When your data are in raw form, straight from a data frame, you can perform the test using “formula-data input”. For example, in the `mat111survey`

data, we might wonder whether sex and seating preference are related, in the population from which the sample was (allegedly randomly) drawn. The function call and the output are as follows:

`chisqtestGC(~sex+seat,data=m111survey)`

```
## Pearson's Chi-squared test
##
## Observed Counts:
## seat
## sex 1_front 2_middle 3_back
## female 19 16 5
## male 8 16 7
##
## Counts Expected by Null:
## seat
## sex 1_front 2_middle 3_back
## female 15.21 18.03 6.76
## male 11.79 13.97 5.24
##
## Contributions to the chi-square statistic:
## seat
## sex 1_front 2_middle 3_back
## female 0.94 0.23 0.46
## male 1.22 0.29 0.59
##
##
## Chi-Square Statistic = 3.734
## Degrees of Freedom of the table = 2
## P-Value = 0.1546
```

Sometimes you already have a two-way table on hand:

```
SexSeat <- xtabs(~sex+seat,data=m111survey)
SexSeat
```

```
## seat
## sex 1_front 2_middle 3_back
## female 19 16 5
## male 8 16 7
```

In that case you can save yourself some typing by entering the table in place of the formula and the `data`

arguments:

`chisqtestGC(SexSeat)`

Remember: if you are given summary data, only, then you can construct a nice two-way table and enter it into `chisquaretestGC()`

. Suppose that you want this two-way table:

Front | Middle | Back | |
---|---|---|---|

Female | 19 | 16 | 5 |

Male | 8 | 16 | 7 |

You can get it as follows:

```
MySexSeat <- rbind(female=c(19,16,5),male=c(8,16,7))
colnames(MySexSeat) <- c("front","middle","back")
```

Let’s just check to see that this worked:

`MySexSeat`

```
## front middle back
## female 19 16 5
## male 8 16 7
```

Then you can enter `MySexSeat`

into the function:

`chisqtestGC(MySexSeat)`

When the Null’s expected counts are low, `chisqtestGC()`

delivers a warning and suggests the use of simulation to compute the \(P\)-value. You do this by way of the argument `simulate.p.value`

, and you have three options:

`simulate.p.value = "fixed"`

`simulate.p.value = "random"`

`simulate.p.value = "TRUE"`

Suppose that the objects under study are not a random sample from some larger population, and that the way chance comes into the production of the data is through random variation in all of the other factors—besides the explanatory variable— that might be associated with the response variable. Then since the items being observed are fixed, the tally of values for the explanatory variable are fixed. The response values for these items are the product of chance, but only through random variation in those other factors.

The study from the `ledgejump`

data frame was an example of this. The 21 incidents were fixed, so there were nine cold-weather incidents and 12 warm-weather incidents, no matter what. The crowd behavior at each incident, however is still a matter of chance.

In such a case you might want to resample under the restriction that in all of your resamples, the tally for the explanatory variable stays just the same as it was in the data you observed. Then your function call looks like:

```
chisqtestGC(~weather+crowd.behavior,data=ledgejump,
simulate.p.value="fixed",B=2500)
```

```
## Pearson's chi-squared test with simulated p-value, fixed row sums
## (based on 2500 resamples)
##
## Observed Counts:
## crowd.behavior
## weather baiting polite
## cool 2 7
## warm 8 4
##
## Counts Expected by Null:
## crowd.behavior
## weather baiting polite
## cool 4.29 4.71
## warm 5.71 6.29
##
## Contributions to the chi-square statistic:
## crowd.behavior
## weather baiting polite
## cool 1.22 1.11
## warm 0.91 0.83
##
##
## Chi-Square Statistic = 4.0727
## Degrees of Freedom of the table = 1
## P-Value = 0.05
```

You can set `B`

, the number of resamples, as you wish, but it should be at least a few thousand. Of course the \(P\)-value, having been determined by random resampling, will vary from one run of the function to another.

In the `m111survey`

study on sex and seating preference, the subjects are a random sample from a larger population. In that case the tallies for both the explanatory and the response variables depend upon chance. If you simulate in such a case, then you set `simulate.p.value`

to “random”:

```
chisqtestGC(~sex+seat,data=m111survey,
simulate.p.value="random",B=2500)
```

```
## Pearson's chi-squared test with simulated p-value, marginal sums not fixed
## (based on 2500 resamples)
##
## Observed Counts:
## seat
## sex 1_front 2_middle 3_back
## female 19 16 5
## male 8 16 7
##
## Counts Expected by Null:
## seat
## sex 1_front 2_middle 3_back
## female 15.21 18.03 6.76
## male 11.79 13.97 5.24
##
## Contributions to the chi-square statistic:
## seat
## sex 1_front 2_middle 3_back
## female 0.94 0.23 0.46
## male 1.22 0.29 0.59
##
##
## Chi-Square Statistic = 3.734
## Degrees of Freedom of the table = 2
## P-Value = 0.1687
```

If you want to resample in such a way that the tallies for BOTH the explanatory and response variables stay exactly the same as they were in the actual data, then you set `simulate.p.value`

to “TRUE”. This invokes R’s standard method for resampling:

```
chisqtestGC(~sex+seat,data=m111survey,
simulate.p.value=TRUE,B=2500)
```

```
## Pearson's chi-squared test with simulated p-value
## (based on 2500 resamples)
##
## Observed Counts:
## seat
## sex 1_front 2_middle 3_back
## female 19 16 5
## male 8 16 7
##
## Counts Expected by Null:
## seat
## sex 1_front 2_middle 3_back
## female 15.21 18.03 6.76
## male 11.79 13.97 5.24
##
## Contributions to the chi-square statistic:
## seat
## sex 1_front 2_middle 3_back
## female 0.94 0.23 0.46
## male 1.22 0.29 0.59
##
##
## Chi-Square Statistic = 3.734
## Degrees of Freedom of the table = 2
## P-Value = 0.1704
```

It’s not easy to understand why R would adopt such a method, but there is some good theoretical support for it. If you are ever in doubt about how to simulate, just use this third option.

You can get a graph of the \(P\)-value in the plot window by setting the argument `graph`

to TRUE. When you did not simulate, the graph shows a density curve for the \(\chi^2\) random variable with the relevant degrees of freedom. When you simulate, the graph is a histogram of the resampled \(\chi^2\)-statistics.

Here is a case with no simulation:

`chisqtestGC(~sex+seat,data=m111survey,graph=TRUE)`

```
## Pearson's Chi-squared test
##
## Observed Counts:
## seat
## sex 1_front 2_middle 3_back
## female 19 16 5
## male 8 16 7
##
## Counts Expected by Null:
## seat
## sex 1_front 2_middle 3_back
## female 15.21 18.03 6.76
## male 11.79 13.97 5.24
##
## Contributions to the chi-square statistic:
## seat
## sex 1_front 2_middle 3_back
## female 0.94 0.23 0.46
## male 1.22 0.29 0.59
##
##
## Chi-Square Statistic = 3.734
## Degrees of Freedom of the table = 2
## P-Value = 0.1546
```

Here is a case with simulation:

```
chisqtestGC(~sex+seat,data=m111survey,
simulate.p.value="random",B=2500,graph=TRUE)
```

```
## Pearson's chi-squared test with simulated p-value, marginal sums not fixed
## (based on 2500 resamples)
##
## Observed Counts:
## seat
## sex 1_front 2_middle 3_back
## female 19 16 5
## male 8 16 7
##
## Counts Expected by Null:
## seat
## sex 1_front 2_middle 3_back
## female 15.21 18.03 6.76
## male 11.79 13.97 5.24
##
## Contributions to the chi-square statistic:
## seat
## sex 1_front 2_middle 3_back
## female 0.94 0.23 0.46
## male 1.22 0.29 0.59
##
##
## Chi-Square Statistic = 3.734
## Degrees of Freedom of the table = 2
## P-Value = 0.1575
```

The variable **seat** in the `m111survey`

data frame indicates the classroom seating preference of each person in the survey. Suppose we want to know whether or not the sample data provide strong evidence that seating preference in the Georgetown College population is not uniformly distributed among the three possible options (Front, Middle, and Back). That is, letting

\(p_f =\) proportion of all GC students who prefer the front,

\(p_m =\) proportion of all GC students who prefer the middle,

\(p_b =\) proportion of all GC students who prefer the back,

we want to test the hypotheses:

\(H_0:\) \(p_f = 1/3\) and \(p_m = 1/3\) and \(p_b = 1/3\)

\(H_a:\) at least one of the above proportions is not \(1/3\)

We can do so using the \(\chi^2\)-test for goodness of fit. The argument `p`

will give what the Null Hypothesis believes to be the distribution of the variable **seat** in the Georgetown College population:

```
chisqtestGC(~seat,data=m111survey,
p=c(1/3,1/3,1/3),
graph=TRUE)
```

```
## Chi-squared test for given probabilities
##
## Observed counts Expected by Null Contr to chisq stat
## 1_front 27 23.67 0.47
## 2_middle 32 23.67 2.93
## 3_back 12 23.67 5.75
##
##
## Chi-Square Statistic = 9.1549
## Degrees of Freedom of the table = 2
## P-Value = 0.0103
```

Suppose that the data had only come to us in summary form:

Front | Middle | Back |
---|---|---|

27 | 32 | 12 |

We could still perform the test, by making a table and storing it as an R-object:

```
Seat <- c(front=27,middle=32,back=12)
Seat
```

```
## front middle back
## 27 32 12
```

Then we could perform the test using `Seat`

:

`chisqtestGC(Seat,p=c(1/3,1/3,1/3))`

```
## Chi-squared test for given probabilities
##
## Observed counts Expected by Null Contr to chisq stat
## front 27 23.67 0.47
## middle 32 23.67 2.93
## back 12 23.67 5.75
##
##
## Chi-Square Statistic = 9.1549
## Degrees of Freedom of the table = 2
## P-Value = 0.0103
```

When expected cell counts fall below 5, `chisqtestGC()`

issues a warning and suggests the use of simulation. We can perform simulation at any time, though.

For goodness-of-fit tests, the only relevant form of simulation is the one provided by setting `simulate.p.value`

to `TRUE`

. Of course we also need to set the number `B`

of resamples.

```
set.seed(678910)
chisqtestGC(~seat,data=m111survey,
p=c(1/3,1/3,1/3),
simulate.p.value=TRUE,
B=2500,graph=TRUE)
```

```
## Pearson's chi-squared test with simulated p-value
## (based on 2500 resamples)
##
## Observed counts Expected by Null Contr to chisq stat
## 1_front 27 23.67 0.47
## 2_middle 32 23.67 2.93
## 3_back 12 23.67 5.75
##
##
## Chi-Square Statistic = 9.1549
## Degrees of Freedom of the table = 2
## P-Value = 0.0116
```

If you do not wnat to see quite so much output to the console and are only interested in the essentials for reporting a \(\chi^2\)-test, then set the argument `verbose`

to `FALSE`

:

```
chisqtestGC(~sex+seat,data=m111survey,
verbose=FALSE)
```

```
## Pearson's Chi-squared test
##
## Chi-Square Statistic = 3.734
## Degrees of Freedom of the table = 2
## P-Value = 0.1546
```