OK, I was not sure from your description. If there had been a large number of
small clusters, my suggestion would have worked, but it now looks like there
would be too much overlap between clusters.

I know that there are bootstrap methods for time series data that resample
blocks of data to preserve at least some of the correlation; you could try
those techniques. I have read about them but never used them myself, so the
most help I can be is pointing you in that direction. I think they are
described in Efron's original bootstrap book, but probably in other places as
well.
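A minimal sketch of one such technique, the moving block bootstrap, in R. It assumes the observations form a single series in time order; the helper name block_boot_mean and the block length l = 20 are hypothetical choices for illustration, and it estimates the standard error of a mean rather than carrying out the two-group test discussed in this thread.

```r
# Moving block bootstrap for the mean of a time-ordered series x.
# Hypothetical helper; the block length l is an assumed tuning parameter.
block_boot_mean <- function(x, l = 20, B = 2000) {
  n <- length(x)
  starts <- seq_len(n - l + 1)   # start points of all overlapping blocks
  k <- ceiling(n / l)            # number of blocks needed to cover n points
  replicate(B, {
    s <- sample(starts, k, replace = TRUE)                  # draw block starts
    idx <- as.vector(outer(0:(l - 1), s, "+"))[seq_len(n)]  # glue blocks, trim to n
    mean(x[idx])
  })
}

set.seed(7)
x <- as.numeric(arima.sim(list(ar = 0.7), n = 1000))  # autocorrelated example data
bm <- block_boot_mean(x)
sd(bm)                    # block-bootstrap standard error of the mean
sd(x) / sqrt(length(x))   # naive i.i.d. standard error, for comparison
```

Because each resampled block keeps l consecutive observations together, short-range correlation survives within blocks; with positively correlated data the block estimate is typically larger than the naive one.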

The random number generators in R are based on sound theory; I doubt there
would be any problems with using the sample function for randomization tests.
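For reference, a minimal sketch of a Monte Carlo randomization test built on sample(). The helper name rand_test is hypothetical, the statistic is a difference in means, and it assumes the observations are exchangeable under the null hypothesis, which is exactly what the within-group dependence discussed in this thread calls into question. set.seed() makes the random relabellings reproducible.

```r
# Hypothetical helper: Monte Carlo two-sample randomization test for a
# difference in means. Assumes all observations are exchangeable under H0.
rand_test <- function(x, y, B = 9999) {
  pooled <- c(x, y)
  n <- length(x)
  obs <- mean(x) - mean(y)
  perm <- replicate(B, {
    idx <- sample(length(pooled), n)   # random relabelling: which values form group A
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # two-sided p-value; the +1 terms count the observed labelling itself
  p <- (sum(abs(perm) >= abs(obs)) + 1) / (B + 1)
  list(statistic = obs, p.value = p)
}

set.seed(1)
x <- rnorm(1000)
y <- rnorm(1500, mean = 0.2)
res <- rand_test(x, y, B = 1999)
res$p.value
```

The (count + 1)/(B + 1) form treats the observed labelling as one of the permutations, so the estimated p-value is never exactly zero.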

From: Wenjin Mao [mailto:wenj....@gmail.com]
Sent: Tuesday, May 24, 2011 6:54 PM
To: Greg Snow
Cc: Meyners, Michael; r-help@r-project.org
Subject: Re: [R] help on permutation/randomization test

Thanks, Greg.

I also considered the clusters. The difficulty is that those objects not only
enter the system at different times, but may also stay in the system for
different durations. Once they overlap in time in the system, they may affect
each other. If I split them into two clusters by setting a time threshold t, I
need to drop all objects that enter before time t and leave after time t. The
more clusters, the more objects have to be dropped, which I would rather avoid.
But two or three clusters may be too small a sample size. My purpose is to test
the difference between two systems.
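The trade-off described above can be sketched in R, assuming each object has a known entry and exit time. cluster_by_time is a hypothetical helper: it drops any object whose stay strictly straddles one of the thresholds in cuts, and labels each remaining object by the time interval it falls in.

```r
# Hypothetical helper: assign objects to time clusters separated by `cuts`,
# marking as NA any object that enters before and leaves after some threshold.
cluster_by_time <- function(entry, exit, cuts) {
  cuts <- sort(cuts)
  straddle <- outer(entry, cuts, "<") & outer(exit, cuts, ">")  # n x k matrix
  dropped <- rowSums(straddle) > 0
  cluster <- findInterval(entry, cuts) + 1L   # 1 = before first cut, 2 = next, ...
  cluster[dropped] <- NA_integer_
  cluster
}

entry <- c(0, 2, 4, 6, 9)
exit  <- c(1, 5, 5, 8, 12)
cluster_by_time(entry, exit, cuts = c(3, 7))
# [1]  1 NA  2 NA  3
```

Adding a threshold can only increase the number of objects marked NA, which is the cost of using more clusters.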

Back to the R function question. When the sample size is large, the full
permutation test is infeasible and we have to use a randomization test that
selects permutations at random. One factor I know affects the randomness is
the random number generator. I am not sure how good the function "sample" is
in terms of randomness.

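A quick check in R of why full enumeration breaks down: the number of distinct relabellings of m + n pooled observations into groups of sizes n and m is choose(m + n, n). The sizes n = 1000 and m = 1500 are the ones Wenjin gives elsewhere in this thread; m + n = 21 is shown only for comparison with the permtest limit mentioned in the original question.

```r
# Number of ways to choose which n of the m + n pooled values form group A
choose(21, 10)                 # m + n = 21 with n = 10: 352716 relabellings
lchoose(2500, 1000) / log(10)  # log10 of the count for n = 1000, m = 1500
choose(2500, 1000)             # too large for a double, so R returns Inf
```

With a count of that size, drawing a few thousand relabellings at random is the practical alternative.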
Thanks,
Wenjin

On Tue, May 24, 2011 at 4:45 PM, Greg Snow <greg.s...@imail.org> wrote:
If the x's that don't enter at the same time can be considered independent of
each other, and only clusters that enter at the same time are dependent, then
you can still do a permutation test by creating clusters whose values are
dependent within each cluster but independent between clusters, and then
permuting the clusters rather than the individual data points. This maintains
the dependency.

I don't know of any existing functions that will do the whole thing for you,
but this type of permutation test would take only a few lines of R code. The
split function can help with separating the clusters, sample can do the
permutations, and unlist or sapply can be used in calculating the statistic of
interest.
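A minimal sketch of the cluster-level permutation test described above, using split, sample, sapply and unlist. It assumes a cluster identifier has already been assigned so that each cluster lies entirely within one group, that clusters are independent of each other, and that the statistic of interest is a difference in means; cluster_perm_test and the simulated data are hypothetical.

```r
# Hypothetical sketch: permutation test that shuffles whole clusters between
# groups "A" and "B", so within-cluster dependence is kept intact.
cluster_perm_test <- function(value, group, cluster, B = 9999) {
  group <- as.character(group)
  # each cluster must belong to exactly one group
  stopifnot(all(tapply(group, cluster, function(g) length(unique(g)) == 1)))
  vals <- split(value, cluster)                  # one vector per cluster
  lab  <- sapply(split(group, cluster), `[`, 1)  # group label of each cluster
  stat <- function(labs) {
    mean(unlist(vals[labs == "A"])) - mean(unlist(vals[labs == "B"]))
  }
  obs  <- stat(lab)
  perm <- replicate(B, stat(sample(lab)))          # reassign labels to clusters
  p <- (sum(abs(perm) >= abs(obs)) + 1) / (B + 1)  # two-sided Monte Carlo p-value
  list(statistic = obs, p.value = p)
}

set.seed(42)
cl  <- rep(1:40, each = 25)           # 40 clusters of 25 observations
grp <- rep(c("A", "B"), each = 500)   # clusters 1-20 in A, 21-40 in B
# shared cluster effect + individual noise + a group shift for B
y   <- rnorm(40)[cl] + rnorm(1000) + ifelse(grp == "B", 0.5, 0)
res <- cluster_perm_test(y, grp, cl, B = 1999)
res$p.value
```

Only the 40 cluster labels are shuffled, never individual observations, so the number of clusters bounds the number of distinct relabellings (choose(40, 20) here, but only choose(2, 1) = 2 with two clusters).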

-----Original Message-----
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Wenjin Mao
Sent: Tuesday, May 24, 2011 11:22 AM
To: Meyners, Michael
Cc: r-help@r-project.org
Subject: Re: [R] help on permutation/randomization test

Thank you, Michael.

I don't think the data for the same group can be treated as repeated
measurements. Let's say I have 1000 observations from group 1 and 1500 obs
from group 2. Some of the 1000 objects of group 1 entered the system at the
same time and may affect each other; the same holds for the other group. It's
hard to measure the strength of the dependency.

Even if some twist or transformation can reduce the correlation, the R
function "permtest" cannot handle such a large sample size. Is there any other
R function I can use?

Thanks,
Wenjin

On Tue, May 24, 2011 at 1:37 AM, Meyners, Michael <meyner...@pg.com> wrote:

> I suspect you need to give more information/background on the data (though
> this is not primarily an R-related question; you might want to try other
> resources instead). Unless I'm missing something here, I cannot think of ANY
> reasonable test: A permutation (using permtest or anything else) would
> destroy the correlation structure and hence give invalid results, and the
> assumptions of parametric tests are violated as well. Basically, you only
> have two observations, one for each group; with some good will you might
> consider these as repeated measurements, but still on the same subject or
> whatever. Hence there is no way to separate a subject effect from a
> treatment effect. There is not enough data to permute or to base a
> statistical test on. So unless you can get rid of the dependency within
> groups (or at least reasonably assume observations to be independent), I'm
> not very optimistic...
> HTH, Michael
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org
> > [mailto:r-help-bounces@r-project.org] On Behalf Of Wenjin Mao
> > Sent: Monday, May 23, 2011 20:56
> > To: r-help@r-project.org
> > Subject: [R] help on permutation/randomization test
> >
> > Hi,
> >
> > I have two groups of data of different sizes:
> >    group A: x1, x2, ...., x_n;
> >    group B: y1, y2, ...., y_m; (m is not equal to n)
> >
> > The two groups are independent but observations within each group are
> > not independent,
> >  i.e., x1, x2, ..., x_n are not independent; but x's are independent
> > from y's
> >
> > I wonder if a randomization test is still applicable to this case. Does
> > R have any function that can do this test for large m and n? I notice
> > that "permtest" can only handle small (m+n<22) samples.
> >
> > Thank you very much,
> > Wenjin
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>