A confidence interval around the p-value makes no sense because there is no parameter being estimated, but the sampling distribution of the p-value makes a lot of sense. Before the data are observed, the P-value is a random variable: it is a function of the underlying random variable being tested. That is, P_X(t) = Pr(X > t) is itself a random variable with its own density, distribution function, and moments. Thus one can compute, for example, the central 95% of the sampling distribution of P around its expectation.
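To make this concrete, here is a small R sketch (not part of the original posts) of the sampling distribution of the p-value when an alternative is true; the one-sample t-test, n = 20, and effect size 0.5 are illustrative choices, not taken from the thread:

```r
## Treat the p-value as a random variable: draw many samples under a
## true alternative (mean = 0.5) and collect the two-sided t-test p-value.
set.seed(1)
pvals <- replicate(10000, t.test(rnorm(20, mean = 0.5))$p.value)

mean(pvals)                        # Monte Carlo estimate of E(P)
quantile(pvals, c(0.025, 0.975))   # central 95% of P's sampling distribution
hist(pvals, main = "Sampling distribution of the p-value")
```

This is the kind of behaviour studied in the Hung et al. reference: under the null hypothesis P is uniform on (0,1) for a continuous test statistic, while under an alternative its distribution piles up near 0.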
See:

Hung, H. M. J., O'Neill, R. T., Bauer, P. & Kohne, K. (1997). The behavior of the P-value when the alternative hypothesis is true. Biometrics, 53, 1-22.

Donahue, R. M. J. (1999). A note on information seldom reported via the p value. The American Statistician, 53, 303-306.


Greg Snow <greg.s...@imail.org>
Sent by: r-help-boun...@r-project.org
09/09/2010 12:29 PM
To: "ted.hard...@manchester.ac.uk" <ted.hard...@manchester.ac.uk>, "r-help@r-project.org" <r-help@r-project.org>
Cc: Fernando Marmolejo Ramos <fernando.marmolejora...@adelaide.edu.au>
Subject: Re: [R] confidence intervals around p-values

One other case where a confidence interval on a p-value may make sense is permutation (or other resampling) tests. The "population parameter" p-value would be the p-value obtained from the distribution of all possible permutations, but in practice we just sample from that population and estimate the p-value. The confidence interval would then be based on the number of sample permutations, and could give an idea of whether that number was big enough. If the full confidence interval lies below alpha, you can be confident that the "true" p-value would give significance; if it lies entirely above alpha, the result is not significant. The real problem comes when the confidence interval includes alpha: that would indicate that B (the number of resamples/permutations) was not large enough. Be careful: doing a small number of permutations and then deciding to do more based on the CI would likely introduce bias (how much is another question).

The nice thing is that in this case the p-value is a simple proportion, so the confidence interval can be computed using binom.test.

But I fully agree that in most cases the idea of a CI for a p-value is not meaningful; you need some case where your p-value is an estimate of a "population parameter p-value" that has some meaning.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Ted Harding
> Sent: Thursday, September 09, 2010 8:25 AM
> To: r-help@r-project.org
> Cc: Fernando Marmolejo Ramos
> Subject: Re: [R] confidence intervals around p-values
>
> On 09-Sep-10 13:21:07, Duncan Murdoch wrote:
> > On 09/09/2010 6:44 AM, Fernando Marmolejo Ramos wrote:
> >> Dear all
> >>
> >> I wonder if anyone has heard of confidence intervals around
> >> p-values...
> >
> > That doesn't really make sense. p-values are statistics, not
> > parameters. You would compute a confidence interval around a
> > population mean because that's a parameter, but you wouldn't
> > compute a confidence interval around the sample mean: you've
> > observed it exactly.
> >
> > Duncan Murdoch
>
> Duncan has succinctly stated the essential point in the standard
> interpretation. The P-value is calculated from the sample in hand,
> a definite null hypothesis, and the distribution of the test
> statistic given the null hypothesis, so (given all of these) there
> is no scope for any other answer.
>
> However, there are circumstances in which the notion of "confidence
> interval for a P-value" makes some sense. One such might be the
> Mann-Whitney test for identity of distribution of two samples of
> continuous variables, where (because of discretisation of the
> values when they were recorded) there are ties.
>
> Then you know in theory that the "underlying values" are all
> different, but because you don't know where these lie in the
> discretisation intervals you don't know which way a tie should
> split. So it would make sense to simulate by splitting ties at
> random (e.g. uniformly distribute each "1.5" value over the
> interval (1.5,1.6) or (1.45,1.55)).
>
> For each such simulated tie-broken sample, calculate the P-value.
> Then you get a distribution of exact P-values, calculated from
> samples without ties, which are consistent with the recorded data.
> The central 95% of this distribution could be interpreted as a 95%
> confidence interval for the true P-value.
>
> To bring this closer to on-topic, here is an example in R
> (rounding to intervals of 0.2):
>
> set.seed(51324)
> X <- sort(2*round(0.5*rnorm(12),1))
> Y <- sort(2*round(0.5*rnorm(12)+0.25,1))
> rbind(X,Y)
> #   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
> # X -1.8 -1.2 -0.8 -0.6  0.0    0  0.2  0.2  1.2  1.8     2  2.2
> # Y -1.2 -0.4 -0.2  0.4  0.4    1  1.0  1.0  1.2  1.8     2  2.6
> # So several ties (-1.2, 1.2, 1.8, 2.0), as well as 0.0, 0.4, 1.0
> # which don't matter.
> wilcox.test(X,Y,alternative="less",exact=TRUE,correct=FALSE)
> # data: X and Y  W = 54, p-value = 0.1488
>
> Ps <- numeric(1000)
> for(i in (1:1000)){
>   Xr <- (X-0.1) + 0.2*runif(12)  # one uniform draw per element
>   Yr <- (Y-0.1) + 0.2*runif(12)
>   Ps[i] <- wilcox.test(Xr,Yr,alternative="less",
>                        exact=TRUE,correct=FALSE)$p.value
> }
> hist(Ps)
> table(round(Ps,4))
> # 0.1328 0.1457 0.1593 0.1737 0.1888
> #     81    267    336    226     90
>
> So this gives you a picture of the uncertainty in the P-value
> (0.1488, calculated from the rounded data) relative to what it
> really should have been (if calculated from unrounded data).
> Since each possible "true" (tie-broken) sample can be viewed
> as a hypothesis about unobserved "truth", it does make a certain
> sense to view these results as a kind of confidence distribution
> for the P-value you should have got. However, this is more of a
> Bayesian argument, since the above calculation has assigned
> equal prior probability to the tie-breaks!
>
> One could also, I suppose, consider the question of what
> distribution of P-values might arise if the/an alternative
> hypothesis were true, and where in this distribution does the
> P-value we actually got lie? But these are murkier waters ...
>
> Ted.
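Greg Snow's binom.test suggestion earlier in the thread can also be sketched in R; the data, the difference-in-means statistic, and B below are illustrative choices, not taken from any of the posts:

```r
## Permutation test for a difference in means, with a binomial CI around
## the estimated p-value based on B random permutations.
set.seed(1)
x <- rnorm(15)
y <- rnorm(15, mean = 0.8)
obs <- mean(x) - mean(y)            # observed test statistic
pooled <- c(x, y)
B <- 2000
perm <- replicate(B, {
  idx <- sample(length(pooled), length(x))   # random relabelling
  mean(pooled[idx]) - mean(pooled[-idx])
})
hits <- sum(abs(perm) >= abs(obs))  # permutations at least as extreme
p.hat <- (hits + 1) / (B + 1)       # permutation p-value (+1 correction)
binom.test(hits, B)$conf.int        # CI for the "population" p-value
```

If the resulting interval straddles the chosen alpha, that is the signal (as Greg notes) that B was too small, with the caveat about bias from enlarging B after looking at the CI.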
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 09-Sep-10  Time: 15:24:29
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.