On Wed, 3 Dec 2008, axionator wrote:

Hi all,
I have a data frame with "clustered" rows as follows:
Cu1  x1 y1 z1 ...
Cu1  x2 y2 z2 ...
Cu1  x3 y3 z3 ... # end of first cluster Cu1
Cu2  x4 y4 z4 ...
Cu2  x5 y5 z5
Cu2  ...               # end of second cluster Cu2
Cu3 ...
...
The cluster size is 3 in the example above (the rows making up a cluster
are always consecutive). At the moment I use the "sample" function and a
for-loop. Is there a faster way to sample n clusters (with replacement)
from this data frame and build a new data frame out of the sampled
clusters?

Something like this:

cl.samps <- sample( split( df, df$cluster ), n.samps, replace=TRUE )

do.call( rbind, cl.samps )
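
For anyone who wants to try this, here is a small self-contained sketch of the same two steps on a toy data frame. The names `dat`, `boot`, the column values, the seed and `n.samps = 4` are all made up for illustration; only the split/sample/rbind pattern comes from the lines above.

```r
# Toy data: 3 clusters of 3 consecutive rows each (hypothetical values)
dat <- data.frame(cluster = rep(c("Cu1", "Cu2", "Cu3"), each = 3),
                  x = 1:9,
                  y = seq(10, 90, by = 10))

set.seed(1)    # only so the draw can be repeated
n.samps <- 4   # number of clusters to draw, with replacement

# split() returns a list with one data frame per cluster;
# sample() then draws whole list elements, i.e. whole clusters
cl.samps <- sample(split(dat, dat$cluster), n.samps, replace = TRUE)

# Stack the sampled clusters into a single data frame
boot <- do.call(rbind, cl.samps)
nrow(boot)     # n.samps clusters times 3 rows per cluster
```

Because sample() is applied to the list returned by split(), a cluster is always drawn as a whole, so its rows stay together in the result.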

If you need to identify which draw each row came from (rather than just its originating cluster, which can be drawn more than once):

cl.samps2 <- lapply( seq_along(cl.samps),
        function(x) cbind( cl.samps[[ x ]], new.cluster = x ) )

do.call( rbind, cl.samps2 )
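
If the splitting and rbind-ing of data frames turns out to be the slow part, one possible variation (not the approach above, and not benchmarked here) is to sample row indices instead and subset the data frame once. This is only a sketch on made-up data; `dat`, `rows.by.cl`, `drawn`, `idx` and `boot2` are hypothetical names.

```r
# Toy data: 3 clusters of 3 consecutive rows each (hypothetical values)
dat <- data.frame(cluster = rep(c("Cu1", "Cu2", "Cu3"), each = 3),
                  x = 1:9)

set.seed(1)    # only so the draw can be repeated
n.samps <- 4   # number of clusters to draw, with replacement

# One integer vector of row indices per cluster
rows.by.cl <- split(seq_len(nrow(dat)), dat$cluster)

# Draw cluster positions with replacement, then collect their row indices
drawn <- sample.int(length(rows.by.cl), n.samps, replace = TRUE)
idx   <- unlist(rows.by.cl[drawn], use.names = FALSE)

# Subset once; the data frame itself is never split or re-bound
boot2 <- dat[idx, , drop = FALSE]

# Label each row with the draw it belongs to (1, 2, ..., n.samps)
boot2$new.cluster <- rep(seq_along(drawn), lengths(rows.by.cl[drawn]))
```

The `new.cluster` column plays the same role as in the lapply() version: two draws of the same original cluster get different labels.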

HTH,

Chuck


Thanks in advance
Armin

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
