Re: [R] Permutation or Bootstrap to obtain p-value for one sample

Bert Gunter Sun, 09 Oct 2011 07:32:08 -0700

I may be speaking out of turn here, but I would prefer not to see R-help
turn into a tutorial site for basic statistics. Such sites already exist
(e.g. http://stats.stackexchange.com/).


I realize that there is occasionally reason to venture down this path a way
within legitimate R contexts, but this seems to me to have gone way beyond.

Just my opinion, of course.

-- Bert

On Sun, Oct 9, 2011 at 3:00 AM, francesca casalino <francy.casal...@gmail.com> wrote:

> Thank you very much to both Ken and Peter for the very helpful
> explanations.
>
> Just to understand this better (sorry for repeating, but I am also new to
> statistics, so please correct me where I am wrong):
>
> Ken's method:
> Random sampling of the mean, and then using these means to construct a
> distribution of means (the 'null' distribution), and I can then use this
> normal distribution and compare the population mean to my mean using, for
> example, z-score.
> Of note: The initial distributions are not normal, so I thought I needed to
> base my calculations on the median, but I can use the mean to construct a
> normal distribution.
> This would be defined as a bootstrap test.
>
> Peter's method:
> Random sampling of the mean, and then comparing each sampled mean with the
> population mean and see if it is higher or equal to the difference between
> my sample and the population mean. This is a permutation test, but to
> actually get a CI and a p-value I would need a bootstrap method.
>
> Did I understand this correctly?
>
> I tried to start with Ken's approach for now, and followed his steps, but:
>
> 1) I get a lot of NaN in the sampling distribution, is this normal?
> 2) I think I am again doing something wrong when I try to find a
> p-value. This is what I did:
>
> nreps=10000
> mean.dist=rep(NA,nreps)
>
> for(replication in 1:nreps)
> {
> my.sample=sample(population$Y, 250, replace=F)
> #Peter mentioned that this sampling should be without replacement, so I
> #went for that---
>
> mean.for.rep=mean(my.sample) #mean for this replication
> mean.dist[replication]=mean.for.rep #store the mean
> }
>
> hist(mean.dist,main="Null Dist of Means", col="chartreuse")
>  #Show the means in a nifty color
>
> mean_dist= mean(mean.dist, na.rm=TRUE)
> sd_pop= sd(mean.dist, na.rm=TRUE)
>
> mean_sample= mean(population$Y, na.rm=TRUE)
>
> z_stat= (mean_sample - mean_dist)/(sd_pop/sqrt(2089))
> p_value= 2*pnorm(-abs(z_stat))
>
> Is this correct?
> THANK YOU SO MUCH FOR ALL YOUR HELP!!
>
> 2011/10/9 Ken Hutchison <vicvoncas...@gmail.com>
>
> > Hi Francy,
> >   A bootstrap test would likely be sufficient for this problem, but a
> > one-sample t-test isn't advisable or necessary in my opinion. If you use a
> > t-test multiple times you are making assumptions about the distribution of
> > your data; more importantly, your probability of Type 1 error will be
> > increased with each test. So, a valid thing to do would be to sample
> > (computation for this problem won't be expensive so do alotta reps) and
> > compare your mean to the null distribution of means. I.E.
> >
> > nreps=10000
> > mean.dist=rep(NA,nreps)
> >
> > for(replication in 1:nreps)
> > {
> > my.sample=sample(population, 2500, replace=T)
> > #replace could be false, depends on preference
> > mean.for.rep=mean(my.sample) #mean for this replication
> > mean.dist[replication]=mean.for.rep #store the mean
> > }
> >
> > hist(mean.dist,main="Null Dist of Means", col="chartreuse")
> >  #Show the means in a nifty color
> >
> > You can then perform various tests given the null distribution, or infer
> > from where your sample mean lies within the distribution or something to
> > that effect. Also, if the distribution is normal, which is somewhat likely
> > since it is a distribution of means (shapiro.test or require(nortest)
> > ad.test will let you know), you should be able to make inference from that
> > using parametric methods (once), which will fit the truth a bit better than
> > a t.test.
> >         Hope that's helpful,
> >            Ken Hutchison
> >
> >
> > On Sat, Oct 8, 2011 at 10:04 AM, francy <francy.casal...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I am having trouble understanding how to approach a simulation:
> >>
> >> I have a sample of n=250 from a population of N=2,000 individuals, and I
> >> would like to use either a permutation test or a bootstrap to test whether
> >> this particular sample is significantly different from the values of any
> >> other random samples of the same population. I thought I needed to take
> >> random samples (but I am not sure how many simulations I need to do) of
> >> n=250 from the N=2,000 population and maybe do a one-sample t-test to
> >> compare the mean score of all the simulated samples, + the one sample I am
> >> trying to prove is different from any others, to the mean value of the
> >> population. But I don't know:
> >> (1) whether this one-sample t-test would be the right way to do it, and how
> >> to go about doing this in R
> >> (2) whether a permutation test or bootstrap methods are more appropriate
> >> (2) whether a permutation test or bootstrap methods are more appropriate
> >>
> >> This is the data frame that I have, which is to be sampled:
> >> df<-
> >> i.e.
> >> x y
> >> 1 2
> >> 3 4
> >> 5 6
> >> 7 8
> >> . .
> >> . .
> >> . .
> >> 2,000
> >>
> >> I have this sample from df, and would like to test whether it has extreme
> >> values of y.
> >> sample1<-
> >> i.e.
> >> x y
> >> 3 4
> >> 7 8
> >> . .
> >> . .
> >> . .
> >> 250
> >>
> >> For now I only have this:
> >>
> >> R=999 #Number of simulations, but I don't know how many...
> >> #creates a numeric vector with 999 elements, which will hold the results
> >> #of each simulation.
> >> t.values =numeric(R)
> >> for (i in 1:R) {
> >> sample1 <- df[sample(nrow(df), 250, replace=TRUE),]
> >>
> >> But I don't know how to continue the loop: do I calculate the mean for each
> >> simulation and compare it to the population mean?
> >> Any help you could give me would be very appreciated,
> >> Thank you.
> >>
> >>
> >> --
> >> View this message in context:
> >> http://r.789695.n4.nabble.com/Permutation-or-Bootstrap-to-obtain-p-value-for-one-sample-tp3885118p3885118.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >
>
>        [[alternative HTML version deleted]]
>
>

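
For readers following the thread, the comparison discussed in the quoted
messages (one observed sample of n=250 against random samples of the same size
from the N=2,000 population) might be sketched roughly as below. This is only
an illustration under stated assumptions, not Ken's or Peter's exact method:
the data are simulated, and the names population_y, observed_y, null_means,
p_mc and p_norm are hypothetical stand-ins for the poster's real data and
results.

```r
set.seed(42)                      # reproducible illustration only

# Hypothetical stand-in data (assumption): N = 2000 skewed population values,
# and an "observed" sample of n = 250 drawn from the upper half of them.
N <- 2000
n <- 250
population_y <- rexp(N, rate = 1)
observed_idx <- sample(order(population_y)[1001:N], n)
observed_y   <- population_y[observed_idx]

obs_mean <- mean(observed_y, na.rm = TRUE)
pop_mean <- mean(population_y, na.rm = TRUE)

# Null distribution: means of random size-n subsamples drawn without
# replacement from the population (NAs dropped first so no mean is NA).
pop_clean  <- population_y[!is.na(population_y)]
nreps      <- 10000
null_means <- replicate(nreps, mean(sample(pop_clean, n, replace = FALSE)))

# Two-sided Monte Carlo p-value: share of resampled means at least as far
# from the population mean as the observed mean (+1 counts the observed one).
obs_diff <- abs(obs_mean - pop_mean)
p_mc <- (sum(abs(null_means - pop_mean) >= obs_diff) + 1) / (nreps + 1)

# Normal approximation: sd(null_means) already is the standard error of the
# mean for samples of size n, so it is not divided by sqrt(n) again.
z_stat <- (obs_mean - mean(null_means)) / sd(null_means)
p_norm <- 2 * pnorm(-abs(z_stat))

print(c(p_mc = p_mc, p_norm = p_norm))
```

Two choices in this sketch differ from the code quoted in the thread. The
observed statistic is the mean of the n=250 sample itself rather than the mean
of the whole population, and the z-score uses the standard deviation of the
resampled means directly. Missing values in the population column are one
plausible source of missing means in the sampling distribution, which is why
they are dropped before resampling here.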
