Tom, I don't think the question was stupid (silly depends on what you consider silly), but you are right that the distinction probably does not matter that much.

Going back to basic probability: there are n! possible permutations (n is the total number of observations). The number of combinations (no repeats of the same groupings) is n!/(m1! * m2! * ... * mk!), where m1, ..., mk are the numbers of observations in each group. To find the p-value for the exhaustive permutation test you would count how many of the permutations give a statistic as extreme as or more extreme than the observed one and divide that by the total number of permutations. For the combinations you would do the same type of division; but each distinct combination corresponds to exactly m1! * ... * mk! permutations that all give the same statistic, so that factor appears in both the numerator and the denominator of the permutation p-value and cancels, giving exactly the same answer as the combination test.

So in the exhaustive case you get exactly the same answer either way. For large n it is more practical to sample from the set of permutations or combinations (usually with replacement) and calculate the proportion that have a statistic at least as extreme; basic sampling theory tells us that this proportion will approximate the true p-value. If you are only doing a small number of permutations/combinations, then sampling combinations would probably give a better estimate (so the original question is not completely silly). However, with the speed of modern computers it is easy and fast to take a large number of permutations, so this is not really an issue. It is probably faster to take a couple of thousand extra permutations than to come up with a good algorithm for sampling only the distinct combinations.

So my advice is: don't worry about the "mirror samples", just use the sample() function, but do a lot of random permutations.

Hope this helps,
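To make the advice concrete, here is a minimal sketch of a sampled randomization test for a one-way design, using only sample() to shuffle the group labels. The data, group sizes, and number of permutations below are made up for illustration; they are not from the original thread. Note that permutations which only rearrange values within a group (the "mirror samples") leave the F statistic unchanged, so they do no harm to the estimated p-value.

```r
## Randomization test for a one-way ANOVA via label shuffling.
## Illustrative data: two groups of 8 with a shifted mean (assumed, not
## from the thread).
set.seed(42)
y <- c(rnorm(8, mean = 0), rnorm(8, mean = 1))
g <- factor(rep(c("A", "B"), each = 8))

## Observed F statistic from the usual one-way ANOVA table
obs.F <- summary(aov(y ~ g))[[1]][["F value"]][1]

## Sampled permutation distribution: sample(g) randomly relabels the
## observations; "mirror" shuffles give the same F as some between-group
## shuffle, so no special handling is needed.
B <- 2000
perm.F <- replicate(B, {
  g.star <- sample(g)
  summary(aov(y ~ g.star))[[1]][["F value"]][1]
})

## Monte Carlo p-value (the +1 counts the observed arrangement itself,
## which keeps the estimate valid)
p.value <- (1 + sum(perm.F >= obs.F)) / (B + 1)
p.value
```

With a couple of thousand shuffles the estimate is usually stable enough for practical purposes; increase B if the p-value is near a decision boundary.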
________________________________
From: [EMAIL PROTECTED] on behalf of Tom Backer Johnsen
Sent: Fri 1/11/2008 10:57 AM
To: [EMAIL PROTECTED]
Subject: Re: [R] Randomization tests, grouped data

Tom Backer Johnsen wrote:
> The other day I was looking into one of the classics in resampling,
> Eugene Edgington's "Randomization Tests". This type of test is simple
> to do in R with things like a simple correlation; the sample()
> function is perfect for the purpose.
>
> However, things are more complex if you have grouped data, like a
> one-way ANOVA. The reason is that you have to avoid the consideration
> of what Edgington calls "mirror samples": shuffles that only move data
> within the groups. After all, one only wants to consider changes
> between groups. In that case (I think) the sample() function is too
> general.
>
> Is there something that can handle this in R?

After a few hours thinking on and off about the problem, I suspect that the question may be stupid or silly (or both). If that is the case, I would very much like to know why.

Again,
Tom

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.