<constant <at> unb.br> writes:

> For many tables, chisq.test with simulate.p.value=TRUE gives a p value that is
> obviously incorrect and inversely proportional to the number of replicates:
>
> > data(HairEyeColor)
> > x <- margin.table(HairEyeColor, c(1, 2))
> > chisq.test(x,simulate.p.value=TRUE,B=2000)
>         Pearson's Chi-squared test with simulated p-value (based on 2000
>         replicates)
> data:  x
> X-squared = 138.2898, df = NA, p-value = 0.0004998
>
> > chisq.test(x,simulate.p.value=TRUE,B=10000)
> X-squared = 138.2898, df = NA, p-value = 1e-04
>
> > chisq.test(x,simulate.p.value=TRUE,B=100000)
> X-squared = 138.2898, df = NA, p-value = 1e-05
>
> > chisq.test(x,simulate.p.value=TRUE,B=1000000)
> X-squared = 138.2898, df = NA, p-value = 1e-06
> ...
>
I tried to answer this the other day, but the answer must have gotten lost.

The standard analytical chi-squared test here gives p < 2.2e-16 (i.e. very, very small). The values given above are, up to the limited display of significant digits, precisely 1/(B+1); that is, none of the simulated chi-squared values is ever as large as the observed chi-squared statistic. (The observed value itself is included in the ensemble, so the p-value is given as 1/(B+1) rather than <1/B; you can read about the reasons for this elsewhere [?].) The reported p-value shrinks with B only because 1/(B+1) is the smallest value a simulation with B replicates can report.

Bottom line: why do you think these results are "obviously incorrect"?

Ben Bolker

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
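
To illustrate the 1/(B+1) point, here is a minimal sketch that compares the built-in simulated p-value with a hand-rolled one. It assumes that drawing random tables with the observed margins via r2dtable() and counting the observed statistic as one member of the ensemble is a fair approximation of what chisq.test does; the helper chisq_stat and the seed are my own choices for illustration, not part of the original report.

```r
# Hair x Eye table from the original post (HairEyeColor ships with R)
x <- margin.table(HairEyeColor, c(1, 2))
B <- 2000
set.seed(1)  # arbitrary seed, for reproducibility only

# Built-in Monte Carlo p-value
p_builtin <- chisq.test(x, simulate.p.value = TRUE, B = B)$p.value

# Hand-rolled version: random tables with the observed row/column totals
expected <- outer(rowSums(x), colSums(x)) / sum(x)
chisq_stat <- function(tab) sum((tab - expected)^2 / expected)  # hypothetical helper
observed <- chisq_stat(x)
simulated <- sapply(r2dtable(B, rowSums(x), colSums(x)), chisq_stat)

# Number of simulated statistics at least as large as the observed one
n_extreme <- sum(simulated >= observed)
# The observed statistic counts as one member of the ensemble
p_manual <- (1 + n_extreme) / (B + 1)

print(c(builtin = p_builtin, manual = p_manual, floor = 1 / (B + 1)))
```

With an observed statistic near 138 on a 4 x 4 table, n_extreme should be 0, so all three numbers should agree; rerunning with a larger B just lowers that floor.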