A confidence interval around the p-value makes no sense because there is no parameter being estimated, but the sampling distribution of the p-value makes a lot of sense. Before the data are observed, the P-value is a random variable: it is a function of the underlying random variable being tested. That is, P_X(t) = Pr(X > t) is itself a random variable with its own density, distribution function, and moments. Thus one can compute, for example, the central 95% of the sampling distribution of P around its expectation.
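To make this concrete, here is a small R sketch (not part of the original posts) of the sampling distribution of the p-value when an alternative is true; the one-sample t-test, n = 20, and effect size 0.5 are illustrative choices, not taken from the thread:

```r
## Treat the p-value as a random variable: draw many samples under a
## true alternative (mean = 0.5) and collect the two-sided t-test p-value.
set.seed(1)
pvals <- replicate(10000, t.test(rnorm(20, mean = 0.5))$p.value)

mean(pvals)                        # Monte Carlo estimate of E(P)
quantile(pvals, c(0.025, 0.975))   # central 95% of P's sampling distribution
hist(pvals, main = "Sampling distribution of the p-value")
```

This is the kind of behaviour studied in the Hung et al. reference: under the null hypothesis P is uniform on (0,1) for a continuous test statistic, while under an alternative its distribution piles up near 0.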
See:

Hung, H. M. J., O'Neill, R. T., Bauer, P. & Kohne, K. (1997). The behavior of the P-value when the alternative hypothesis is true. Biometrics, 53, 1-22.

Donahue, R. M. J. (1999). A note on information seldom reported via the p value. The American Statistician, 53, 303-306.


Greg Snow <greg.s...@imail.org>
Sent by: r-help-boun...@r-project.org
09/09/2010 12:29 PM
To: "ted.hard...@manchester.ac.uk" <ted.hard...@manchester.ac.uk>, "r-help@r-project.org" <r-help@r-project.org>
Cc: Fernando Marmolejo Ramos <fernando.marmolejora...@adelaide.edu.au>
Subject: Re: [R] confidence intervals around p-values

One other case where a confidence interval on a p-value may make sense is permutation (or other resampling) tests. The "population parameter" p-value would be the p-value obtained from the distribution of all possible permutations, but in practice we just sample from that population and estimate the p-value. The confidence interval would then be based on the number of sample permutations, and could give an idea of whether that number was big enough. If the full confidence interval lies below alpha, you can be confident that the "true" p-value would give significance; if it lies entirely above alpha, the result is not significant. The real problem comes when the confidence interval includes alpha: that would indicate that B (the number of resamples/permutations) was not large enough. Be careful: doing a small number of permutations and then deciding to do more based on the CI would likely introduce bias (how much is another question).

The nice thing is that in this case the p-value is a simple proportion, so the confidence interval can be computed using binom.test.

But I fully agree that in most cases the idea of a CI for a p-value is not meaningful; you need some case where your p-value is an estimate of a "population parameter p-value" that has some meaning.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Ted Harding
> Sent: Thursday, September 09, 2010 8:25 AM
> To: r-help@r-project.org
> Cc: Fernando Marmolejo Ramos
> Subject: Re: [R] confidence intervals around p-values
>
> On 09-Sep-10 13:21:07, Duncan Murdoch wrote:
> > On 09/09/2010 6:44 AM, Fernando Marmolejo Ramos wrote:
> >> Dear all
> >>
> >> I wonder if anyone has heard of confidence intervals around
> >> p-values...
> >
> > That doesn't really make sense. p-values are statistics, not
> > parameters. You would compute a confidence interval around a
> > population mean because that's a parameter, but you wouldn't
> > compute a confidence interval around the sample mean: you've
> > observed it exactly.
> >
> > Duncan Murdoch
>
> Duncan has succinctly stated the essential point in the standard
> interpretation. The P-value is calculated from the sample in hand,
> a definite null hypothesis, and the distribution of the test
> statistic given the null hypothesis, so (given all of these) there
> is no scope for any other answer.
>
> However, there are circumstances in which the notion of "confidence
> interval for a P-value" makes some sense. One such might be the
> Mann-Whitney test for identity of distribution of two samples of
> continuous variables, where (because of discretisation of the
> values when they were recorded) there are ties.
>
> Then you know in theory that the "underlying values" are all
> different, but because you don't know where these lie in the
> discretisation intervals you don't know which way a tie should
> split. So it would make sense to simulate by splitting ties at
> random (e.g. uniformly distribute each "1.5" value over the
> interval (1.5,1.6) or (1.45,1.55)).
>
> For each such simulated tie-broken sample, calculate the P-value.
> Then you get a distribution of exact P-values, calculated from
> samples without ties, which are consistent with the recorded data.
> The central 95% of this distribution could be interpreted as a 95%
> confidence interval for the true P-value.
>
> To bring this closer to on-topic, here is an example in R
> (rounding to intervals of 0.2):
>
> set.seed(51324)
> X <- sort(2*round(0.5*rnorm(12),1))
> Y <- sort(2*round(0.5*rnorm(12)+0.25,1))
> rbind(X,Y)
> #   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
> # X -1.8 -1.2 -0.8 -0.6  0.0    0  0.2  0.2  1.2  1.8     2  2.2
> # Y -1.2 -0.4 -0.2  0.4  0.4    1  1.0  1.0  1.2  1.8     2  2.6
> # So several ties (-1.2, 1.2, 1.8, 2.0), as well as 0.0, 0.4, 1.0
> # which don't matter.
> wilcox.test(X,Y,alternative="less",exact=TRUE,correct=FALSE)
> # data: X and Y  W = 54, p-value = 0.1488
>
> Ps <- numeric(1000)
> for(i in (1:1000)){
>   Xr <- (X-0.1) + 0.2*runif(12)  # one uniform draw per element
>   Yr <- (Y-0.1) + 0.2*runif(12)
>   Ps[i] <- wilcox.test(Xr,Yr,alternative="less",
>                        exact=TRUE,correct=FALSE)$p.value
> }
> hist(Ps)
> table(round(Ps,4))
> # 0.1328 0.1457 0.1593 0.1737 0.1888
> #     81    267    336    226     90
>
> So this gives you a picture of the uncertainty in the P-value
> (0.1488, calculated from the rounded data) relative to what it
> really should have been (if calculated from unrounded data).
> Since each possible "true" (tie-broken) sample can be viewed
> as a hypothesis about unobserved "truth", it does make a certain
> sense to view these results as a kind of confidence distribution
> for the P-value you should have got. However, this is more of a
> Bayesian argument, since the above calculation has assigned
> equal prior probability to the tie-breaks!
>
> One could also, I suppose, consider the question of what
> distribution of P-values might arise if the/an alternative
> hypothesis were true, and where in this distribution does the
> P-value we actually got lie? But these are murkier waters ...
>
> Ted.
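Greg Snow's binom.test suggestion earlier in the thread can also be sketched in R; the data, the difference-in-means statistic, and B below are illustrative choices, not taken from any of the posts:

```r
## Permutation test for a difference in means, with a binomial CI around
## the estimated p-value based on B random permutations.
set.seed(1)
x <- rnorm(15)
y <- rnorm(15, mean = 0.8)
obs <- mean(x) - mean(y)            # observed test statistic
pooled <- c(x, y)
B <- 2000
perm <- replicate(B, {
  idx <- sample(length(pooled), length(x))   # random relabelling
  mean(pooled[idx]) - mean(pooled[-idx])
})
hits <- sum(abs(perm) >= abs(obs))  # permutations at least as extreme
p.hat <- (hits + 1) / (B + 1)       # permutation p-value (+1 correction)
binom.test(hits, B)$conf.int        # CI for the "population" p-value
```

If the resulting interval straddles the chosen alpha, that is the signal (as Greg notes) that B was too small, with the caveat about bias from enlarging B after looking at the CI.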
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 09-Sep-10  Time: 15:24:29
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.