<constant <at> unb.br> writes:

>
> Full_Name: Reginaldo Constantino
> Version: 2.8.0
> OS: Ubuntu Hardy (32 bit, kernel 2.6.24)
> Submission from: (NULL) (189.61.88.2)
>
> For many tables, chisq.test with simulate.p.value=TRUE gives a p value that is
> obviously incorrect and inversely proportional to the number of replicates:
>
> > data(HairEyeColor)
> > x <- margin.table(HairEyeColor, c(1, 2))
> > chisq.test(x,simulate.p.value=TRUE,B=2000)
>         Pearson's Chi-squared test with simulated p-value (based on 2000
> replicates)
> data:  x
> X-squared = 138.2898, df = NA, p-value = 0.0004998
>
> > chisq.test(x,simulate.p.value=TRUE,B=10000)
> X-squared = 138.2898, df = NA, p-value = 1e-04
>
> > chisq.test(x,simulate.p.value=TRUE,B=100000)
> X-squared = 138.2898, df = NA, p-value = 1e-05
>
> > chisq.test(x,simulate.p.value=TRUE,B=1000000)
> X-squared = 138.2898, df = NA, p-value = 1e-06
> ...
>
> Also tested the same R version under Windows XP and got the same results.
>
  Could you explain why this is wrong? The data are extremely unlikely
under the null hypothesis (the standard chisq.test() gives p < 2.2e-16),
so essentially no simulated table is as extreme as the observed one and
the result of the simulation protocol is always 1/(B+1); that is, as is
standard with these protocols, the observed value is added to the
ensemble of simulations. Why is the p value "obviously incorrect"?

  cheers
    Ben Bolker

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
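
For readers who want to see where 1/(B+1) comes from, here is a minimal
sketch of the protocol described above, redone by hand. It is not the
actual chisq.test() source: the helper stat() and the names expected,
obs, sims and p are made up for illustration, and r2dtable() is assumed
to be a reasonable stand-in for the generator of null tables with fixed
margins.

```r
# Hand-rolled Monte Carlo p-value for the Hair x Eye table (a sketch,
# not the chisq.test() implementation).
x <- margin.table(HairEyeColor, c(1, 2))
B <- 2000

# Expected counts under independence, and the Pearson statistic.
expected <- outer(rowSums(x), colSums(x)) / sum(x)
stat <- function(tab) sum((tab - expected)^2 / expected)  # hypothetical helper
obs <- stat(x)

# Simulate B tables with the same margins and compute their statistics.
set.seed(1)
sims <- vapply(r2dtable(B, rowSums(x), colSums(x)), stat, numeric(1))

# The observed table is counted as one member of the ensemble, hence
# the "1 +" in the numerator and the "B + 1" in the denominator.
p <- (1 + sum(sims >= obs)) / (B + 1)
p
```

When no simulated statistic reaches the observed one, the sum is zero
and p is exactly 1/(B+1), which is the smallest value the protocol can
return; raising B only lowers that floor.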