On 09-Sep-10 13:21:07, Duncan Murdoch wrote:
> On 09/09/2010 6:44 AM, Fernando Marmolejo Ramos wrote:
>> Dear all
>>
>> I wonder if anyone has heard of confidence intervals around
>> p-values...
>
> That doesn't really make sense. p-values are statistics, not
> parameters. You would compute a confidence interval around a
> population mean because that's a parameter, but you wouldn't
> compute a confidence interval around the sample mean: you've
> observed it exactly.
>
> Duncan Murdoch

Duncan has succinctly stated the essential point in the standard
interpretation. The P-value is calculated from the sample in hand,
a definite null hypothesis, and the distribution of the test
statistic given the null hypothesis, so (given all of these) there
is no scope for any other answer.

However, there are circumstances in which the notion of "confidence
interval for a P-value" makes some sense. One such might be the
Mann-Whitney test for identity of distribution of two samples of
continuous variables, where (because of discretisation of the values
when they were recorded) there are ties.

Then you know in theory that the "underlying values" are all
different, but because you don't know where these lie in the
discretisation intervals you don't know which way a tie may split.
So it would make sense to simulate by splitting ties at random
(e.g. uniformly distribute each "1.5" value over the interval
(1.5,1.6) if the values were truncated, or (1.45,1.55) if they
were rounded).

For each such simulated tie-broken sample, calculate the P-value.
Then you get a distribution of exact P-values calculated from
samples without ties which are consistent with the recorded data.
The central 95% of this distribution could be interpreted as a
95% confidence interval for the true P-value.

To bring this closer to on-topic, here is an example in R
(rounding to intervals of 0.2):

  set.seed(51324)
  X <- sort(2*round(0.5*rnorm(12),1))
  Y <- sort(2*round(0.5*rnorm(12)+0.25,1))
  rbind(X,Y)
  #   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
  # X -1.8 -1.2 -0.8 -0.6  0.0    0  0.2  0.2  1.2   1.8     2   2.2
  # Y -1.2 -0.4 -0.2  0.4  0.4    1  1.0  1.0  1.2   1.8     2   2.6
  #
  # So several ties between X and Y (-1.2,1.2,1.8,2.0), as well as
  # within-sample ties (0.0, 0.2, 0.4, 1.0), which don't matter.
  wilcox.test(X,Y,alternative="less",exact=TRUE,correct=FALSE)
  # data:  X and Y
  # W = 54, p-value = 0.1488
  # (with ties present, R warns that it cannot compute an exact
  #  P-value and uses the normal approximation instead)

  Ps <- numeric(1000)
  for(i in (1:1000)){
    # spread each recorded value uniformly over its rounding
    # interval (value-0.1, value+0.1); one draw per observation
    Xr <- (X-0.1) + 0.2*runif(length(X))
    Yr <- (Y-0.1) + 0.2*runif(length(Y))
    Ps[i] <- wilcox.test(Xr,Yr,alternative="less",
                         exact=TRUE,correct=FALSE)$p.value
  }
  hist(Ps)
  table(round(Ps,4))
  # e.g.
  # 0.1328 0.1457 0.1593 0.1737 0.1888
  #     81    267    336    226     90

So this gives you a picture of the uncertainty in the P-value
(0.1488, calculated from the rounded data) relative to what it
really should have been (if calculated from unrounded data).

Since each possible "true" (tie-broken) sample can be viewed as
a hypothesis about unobserved "truth", it does make a certain
sense to view these results as a kind of confidence distribution
for the P-value you should have got. However, this is more of
a Bayesian argument, since the above calculation has assigned
equal prior probability to the tie-breaks!

One could also, I suppose, consider the question of what
distribution of P-values might arise if the/an alternative
hypothesis were true, and where in this distribution the P-value
that we actually got lies. But these are murkier waters ...

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 09-Sep-10                                       Time: 15:24:29
------------------------------ XFMail ------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
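
As a rough sketch of the "central 95%" interval described in the post
above: the simulated P-values can be summarised with quantile(). The
helper name break_ties, the variable n_sim and the choice of 1000
replicates are assumptions made here for illustration; the tie-breaking
itself follows the uniform scheme in the post (each recorded value
spread over its rounding interval of width 0.2).

```r
# Self-contained sketch: central 95% interval of tie-broken P-values.
set.seed(51324)
X <- sort(2 * round(0.5 * rnorm(12), 1))
Y <- sort(2 * round(0.5 * rnorm(12) + 0.25, 1))

# Hypothetical helper (not from the post): spread each recorded value
# uniformly over its rounding interval (v - width/2, v + width/2).
break_ties <- function(v, width = 0.2) {
  (v - width / 2) + width * runif(length(v))
}

n_sim <- 1000
Ps <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  # Tie-broken samples have no ties, so the exact test applies.
  Ps[i] <- wilcox.test(break_ties(X), break_ties(Y),
                       alternative = "less", exact = TRUE,
                       correct = FALSE)$p.value
}

# Central 95% of the simulated P-value distribution.
ci <- quantile(Ps, c(0.025, 0.975))
print(ci)
```

Since only a handful of distinct P-values can occur here (one for each
way the cross-sample ties can split), the interval endpoints should
land on or between those few values rather than vary smoothly.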