Re: [R] p-values from bootstrap - what am I not understanding?

Peter Dalgaard Sun, 12 Apr 2009 15:41:52 -0700

Johan Jackson wrote:

Dear stats experts:
Me and my little brain must be missing something regarding bootstrapping. I
understand how to get a 95%CI and how to hypothesis test using bootstrapping
(e.g., reject or not the null). However, I'd also like to get a p-value from
it, and to me this seems simple, but it seems no-one does what I would like
to do to get a p-value, which suggests I'm not understanding something.
Rather, it seems that when people want a p-value using resampling methods,
they immediately jump to permutation testing (e.g., destroying dependencies
so as to create a null distribution). SO - here's my thought on getting a
p-value by bootstrapping. Could someone tell me what is wrong with my
approach? Thanks:


STEPS TO GETTING P-VALUES FROM BOOTSTRAPPING - PROBABLY WRONG:

1) sample B times with replacement, figure out theta* (your statistic of
interest). B is large (> 1000)

2) get the distribution of theta*

3) the mean of theta* is generally near your observed theta. In the same way
that we use non-centrality parameters in other situations, move the
distribution of theta* such that the distribution is centered around the
value corresponding to your null hypothesis (e.g., make the distribution
have a mean theta = 0)

4) Two methods for finding 2-tailed p-values (assuming here that your
observed theta is above the null value):
Method 1: find the percent of recentered theta*'s that are above your
observed theta. p-value = 2 * this percent
Method 2: find the percent of recentered theta*'s that are above the
absolute value of your observed value. This is your p-value.

So this seems simple. But I can't find people discussing this. So I'm
thinking I'm wrong. Could someone explain where I've gone wrong?

There's nothing particularly wrong about this line of reasoning, or at least not (much) worse than the calculation of CI. After all, one definition of a CI at level 1-alpha is that it contains values of theta0 for which the hypothesis theta=theta0 is accepted at level alpha. (Not the only possible definition, though.)
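The procedure in the question can be sketched as follows. This is a minimal illustration, not a recommended implementation: it assumes the statistic is the sample mean, and the function name `bootstrap_shift_pvalue` and its parameters are hypothetical. The first p-value follows Method 1 (double the one-sided tail); the second counts recentered values at least as far from the null as the observed value in either direction, which is only one possible reading of Method 2.

```python
import random
import statistics


def bootstrap_shift_pvalue(data, null_value=0.0, n_boot=2000, seed=0):
    """Recentered-bootstrap p-values for the mean (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = statistics.fmean(data)

    # Steps 1-2: resample with replacement and collect theta*.
    boot = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]

    # Step 3: shift the bootstrap distribution so its mean sits at the null.
    shift = statistics.fmean(boot) - null_value
    recentered = [t - shift for t in boot]

    # Step 4, Method 1: one-sided tail on the side of the observed value, doubled.
    if theta_hat >= null_value:
        tail = sum(t >= theta_hat for t in recentered) / n_boot
    else:
        tail = sum(t <= theta_hat for t in recentered) / n_boot
    p_doubled = min(1.0, 2.0 * tail)

    # Symmetric variant (an assumption about what Method 2 intends):
    # compare distances from the null in absolute value.
    dist_obs = abs(theta_hat - null_value)
    p_abs = sum(abs(t - null_value) >= dist_obs for t in recentered) / n_boot

    return theta_hat, p_doubled, p_abs


# Example with made-up data whose mean is far from the null value 0.
example = [1.0 + 0.1 * i for i in range(20)]
print(bootstrap_shift_pvalue(example, null_value=0.0))
```

Shifting the whole distribution is exactly where the translation-invariance assumption enters: the shape of the distribution of theta* is taken to be the same under the null as under the observed theta.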

The crucial bit in both cases is the assumption of approximate translation invariance, which holds asymptotically, but maybe not well enough in small samples.

There are some braintwisters connected with the bootstrap; e.g., if the bootstrap distribution is skewed to the right, should the CI be skewed to the right or to the left? The answer is that it cannot be decided based on the distribution of theta* alone since that depends only on the true theta, and we need to know what the distribution would have been had a different theta been the true one.
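The skewness question can be made concrete with two standard constructions that use the same theta* values but skew in opposite directions: the percentile interval takes the quantiles of theta* directly, while the basic (reflected) interval flips them around the observed estimate. The sketch below assumes the statistic is the mean; the function name `bootstrap_intervals` and the simulated data are illustrative only.

```python
import random
import statistics


def bootstrap_intervals(data, alpha=0.05, n_boot=2000, seed=0):
    """Percentile and basic bootstrap intervals for the mean (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = statistics.fmean(data)
    boot = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )

    # Simple empirical quantiles of the bootstrap distribution.
    lo = boot[int((alpha / 2) * n_boot)]
    hi = boot[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]

    # Percentile interval: keeps the skew of theta*.
    percentile = (lo, hi)
    # Basic interval: reflects the quantiles around theta_hat, reversing the skew.
    basic = (2 * theta_hat - hi, 2 * theta_hat - lo)
    return theta_hat, percentile, basic


# Made-up right-skewed sample for illustration.
rng = random.Random(1)
sample = [rng.expovariate(1.0) for _ in range(25)]
theta_hat, percentile, basic = bootstrap_intervals(sample)
print("estimate:  ", theta_hat)
print("percentile:", percentile)
print("basic:     ", basic)
```

The long arm of one interval is the short arm of the other, so the bootstrap distribution alone does not say which direction is right.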

The point is that these things get tricky, so most people head for the safe haven of permutation testing, where it is rather more easy to feel that you know what you are doing.
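For comparison, a permutation test builds the null distribution directly by shuffling group labels, so no recentering is needed. The sketch below assumes a two-sample comparison of means, which the thread does not specify; the function name `permutation_pvalue` is hypothetical, and the `+1` in numerator and denominator is a common convention that counts the observed arrangement as one of the permutations.

```python
import random
import statistics


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means (hypothetical helper)."""
    rng = random.Random(seed)
    observed = statistics.fmean(x) - statistics.fmean(y)
    pooled = list(x) + list(y)
    n_x = len(x)

    count = 0
    for _ in range(n_perm):
        # Shuffling the pooled values destroys any link between value and group.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:])
        if abs(diff) >= abs(observed):
            count += 1

    # Count the observed arrangement itself so the p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)


# Made-up, clearly separated groups for illustration.
group_a = [float(v) for v in range(10, 20)]
group_b = [float(v) for v in range(0, 10)]
print(permutation_pvalue(group_a, group_b))
```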

For a rather different approach, you might want to look into the theory of empirical likelihood (book by Art Owen, or just Google it).


--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907

