I may be speaking out of turn here, but I would prefer not to see R-help turn into a tutorial site for basic statistics. Such sites already exist (e.g. http://stats.stackexchange.com/).
I realize that there is occasionally reason to venture down this path a way within legitimate R contexts, but this seems to me to have gone way beyond.

Just my opinion, of course.

-- Bert

On Sun, Oct 9, 2011 at 3:00 AM, francesca casalino <francy.casal...@gmail.com> wrote:

> Thank you very much to both Ken and Peter for the very helpful
> explanations.
>
> Just to understand this better (sorry for repeating but I am also new in
> statistics so please correct me where I am wrong):
>
> Ken' method:
> Random sampling of the mean, and then using these means to construct a
> distribution of means (the 'null' distribution), and I can then use this
> normal distribution and compare the population mean to my mean using, for
> example, z-score.
> Of note: The initial distributions are not normal, so I thought I needed to
> base my calculations on the median, but I can use the mean to construct a
> normal distribution.
> This would be defined a bootstrap test.
>
> Peter's method:
> Random sampling of the mean, and then comparing each sampled mean with the
> population mean and see if it is higher or equal to the difference between
> my sample and the population mean. This is a permutation test, but to
> actually get CI and a p-value I would need bootstrap method.
>
> Did I understand this correctly?
>
> I tried to start with Ken's approach for now, and followed his steps, but:
>
> 1) I get a lot of NaN in the sampling distribution, is this normal?
> 2) I think I am doing again something wrong when I try to find a
> p-value This is what I did:
>
> nreps=10000
> mean.dist=rep(NA,nreps)
>
> for(replication in 1:nreps)
> {
> my.sample=sample(population$Y, 250, replace=F)
> #Peter mentioned that this sampling should be without replacement, so I went for that---
>
> mean.for.rep=mean(my.sample) #mean for this replication
> mean.dist[replication]=mean.for.rep #store the mean
> }
>
> hist(mean.dist,main="Null Dist of Means", col="chartreuse")
> #Show the means in a nifty color
>
> mean_dist= mean(mean.dist, na.rm=TRUE)
> sd_pop= sd(mean.dist, na.rm=TRUE)
>
> mean_sample= mean(population$Y, na.rm=TRUE)
>
> z_stat= (mean_sample - mean_dist)/(sd_pop/sqrt(2089))
> p_value= 2*pnorm(-abs(z_stat))
>
> Is this correct?
> THANK YOU SO MUCH FOR ALL YOUR HELP!!
>
> 2011/10/9 Ken Hutchison <vicvoncas...@gmail.com>
>
> > Hi Francy,
> > A bootstrap test would likely be sufficient for this problem, but a
> > one-sample t-test isn't advisable or necessary in my opinion. If you use a
> > t-test multiple times you are making assumptions about the distribution of
> > your data; more importantly, your probability of Type 1 error will be
> > increased with each test. So, a valid thing to do would be to sample
> > (computation for this problem won't be expensive so do alotta reps) and
> > compare your mean to the null distribution of means. I.E.
> >
> > nreps=10000
> > mean.dist=rep(NA,nreps)
> >
> > for(replication in 1:nreps)
> > {
> > my.sample=sample(population, 2500, replace=T)
> > #replace could be false, depends on preference
> > mean.for.rep=mean(my.sample) #mean for this replication
> > mean.dist[replication]=mean.for.rep #store the mean
> > }
> >
> > hist(mean.dist,main="Null Dist of Means", col="chartreuse")
> > #Show the means in a nifty color
> >
> > You can then perform various tests given the null distribution, or infer
> > from where your sample mean lies within the distribution or something to
> > that effect.
> > Also, if the distribution is normal, which is somewhat likely
> > since it is a distribution of means: (shapiro.test or require(nortest)
> > ad.test will let you know) you should be able to make inference from that
> > using parametric methods (once) which will fit the truth a bit better than a
> > t.test.
> > Hope that's helpful,
> > Ken Hutchison
> >
> > On Sat, Oct 8, 2011 at 10:04 AM, francy <francy.casal...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I am having trouble understanding how to approach a simulation:
> >>
> >> I have a sample of n=250 from a population of N=2,000 individuals, and I
> >> would like to use either permutation test or bootstrap to test whether this
> >> particular sample is significantly different from the values of any other
> >> random samples of the same population. I thought I needed to take random
> >> samples (but I am not sure how many simulations I need to do) of n=250 from
> >> the N=2,000 population and maybe do a one-sample t-test to compare the mean
> >> score of all the simulated samples, + the one sample I am trying to prove
> >> that is different from any others, to the mean value of the population. But
> >> I don't know:
> >> (1) whether this one-sample t-test would be the right way to do it, and how
> >> to go about doing this in R
> >> (2) whether a permutation test or bootstrap methods are more appropriate
> >>
> >> This is the data frame that I have, which is to be sampled:
> >> df<-
> >> i.e.
> >> x y
> >> 1 2
> >> 3 4
> >> 5 6
> >> 7 8
> >> . .
> >> . .
> >> . .
> >> 2,000
> >>
> >> I have this sample from df, and would like to test whether it is has extreme
> >> values of y.
> >> sample1<-
> >> i.e.
> >> x y
> >> 3 4
> >> 7 8
> >> . .
> >> . .
> >> . .
> >> 250
> >>
> >> For now I only have this:
> >>
> >> R=999 #Number of simulations, but I don't know how many...
> >> t.values =numeric(R) #creates a numeric vector with 999 elements, which will hold the results of each simulation.
> >> for (i in 1:R) {
> >> sample1 <- df[sample(nrow(df), 250, replace=TRUE),]
> >>
> >> But I don't know how to continue the loop: do I calculate the mean for each
> >> simulation and compare it to the population mean?
> >> Any help you could give me would be very appreciated,
> >> Thank you.
> >>
> >> --
> >> View this message in context:
> >> http://r.789695.n4.nabble.com/Permutation-or-Bootstrap-to-obtain-p-value-for-one-sample-tp3885118p3885118.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
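
For anyone following the quoted thread, the comparison discussed there (draw many samples of 250 from the population, then see where the observed sample mean falls among the resampled means) might be sketched roughly as below. This is only a sketch under stated assumptions, not the method of any poster: the data are simulated stand-ins, the names `y`, `observed`, `obs.mean`, `pop.mean`, `p.value` and `p.norm` are hypothetical, and the guess that the NaN/NA values come from missing entries in `Y` is an assumption. Two points it tries to illustrate: the observed sample mean (not the population mean) is what gets compared to the null distribution, and `sd(mean.dist)` is already the standard error of the mean, so it is not divided by `sqrt(n)` again.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical stand-in data: N = 2000 skewed scores with a few missing values
population <- data.frame(Y = rexp(2000, rate = 1 / 50))
population$Y[sample(2000, 20)] <- NA

# Drop missing values once, so that no resampled mean comes out NA/NaN
y <- population$Y[!is.na(population$Y)]

# Hypothetical observed sample of n = 250 (in practice, the real sample)
observed <- sample(y, 250)
obs.mean <- mean(observed)

nreps <- 10000
# Each replicate: mean of 250 values drawn without replacement
mean.dist <- replicate(nreps, mean(sample(y, 250, replace = FALSE)))

# Two-sided empirical p-value: share of resampled means at least as far
# from the population mean as the observed mean (+1 counts the observed one)
pop.mean <- mean(y)
p.value <- (sum(abs(mean.dist - pop.mean) >= abs(obs.mean - pop.mean)) + 1) /
  (nreps + 1)

# Normal-approximation alternative: sd(mean.dist) is already the standard
# error of the mean, so there is no further division by sqrt(n)
z.stat <- (obs.mean - mean(mean.dist)) / sd(mean.dist)
p.norm <- 2 * pnorm(-abs(z.stat))
```

With real data, `observed` would be the actual sample of 250 rather than a random draw, and `hist(mean.dist)` with `abline(v = obs.mean)` gives a quick visual check of where it falls.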