On 10/12/07, Ben Bolker <[EMAIL PROTECTED]> wrote: > > Trying to find a quick/slick/easily interpretable way to > collapse a data set. > > Suppose I have a data set that looks like this: > > h <- structure(list(INDEX = structure(1:6, .Label = c("1", "2", "3", > "4", "5", "6"), class = "factor"), TICKS = c(0, 0, 0, 0, 0, 3 > ), BROOD = structure(c(1L, 1L, 2L, 3L, 3L, 3L), .Label = c("501", > "502", "503"), class = "factor"), HEIGHT = c(465, 465, 472, 475, > 475, 475), YEAR = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("95", > "96", "97"), class = "factor"), LOCATION = structure(c(1L, 1L, > 2L, 3L, 3L, 3L), .Label = c("32", "36", "37"), class = "factor")), .Names = > c("INDEX", > "TICKS", "BROOD", "HEIGHT", "YEAR", "LOCATION"), row.names = c(NA, > 6L), class = "data.frame") > > i.e., > > h > INDEX TICKS BROOD HEIGHT YEAR LOCATION > 1 1 0 501 465 95 32 > 2 2 0 501 465 95 32 > 3 3 0 502 472 95 36 > 4 4 0 503 475 95 37 > 5 5 0 503 475 95 37 > 6 6 3 503 475 95 37 > > I want a data set that looks like this: > BROOD TICKS.mean HEIGHT YEAR LOCATION > 501 0 465 95 32 > 502 0 472 95 36 > 503 1 475 95 37 > > (for example). I.e., I want to collapse it to a dataset by brood, > taking the mean of TICKS and reducing each of > the other variables (would be nice to allow multiple summary > statistics, e.g. TICKS.mean and TICKS.sd ...) > In some ways, this is the opposite of a database join/merge > operation -- I want to collapse the data frame back down. > If I had the "unmerged" (i.e., the brood table) handy I could > use it. > > I know I can construct this table a bit at a time, > using tapply() or by() or aggregate() to get the means. > > Here's a solution that takes the first element of each factor > and the mean of each numeric variable. I can imagine there > are more general/flexible solutions. (One might want to > specify more than one summary function, or specify that > factors that vary within group should be dropped.) > > vtype = sapply(h,class) ## variable types [numeric or factor] > vtypes = unique(vtype) ## possible types > v2 = lapply(vtypes,function(z) which(vtype==z)) ## which are which? > cfuns = list(factor=function(z)z[1],numeric=mean)## functions to apply > m = mapply(function(w,f) { aggregate(h[w],list(h$BROOD),f) }, > v2,cfuns,SIMPLIFY=FALSE) > data.frame(m[[1]],m[[2]][-1]) > > My question is whether this is re-inventing the wheel. Is there > some function or package that performs this task?
Maybe the reshape package? http://had.co.nz/reshape hm <- melt(h, m = "TICKS") cast(hm, BROOD + HEIGHT + YEAR + LOCATION ~ ., mean) cast(hm, BROOD + HEIGHT + LOCATION ~ YEAR, mean) cast(hm, BROOD ~ HEIGHT ~ YEAR, mean) You should be able to create just about any data structure you need, and if you can't let me know. Hadley -- http://had.co.nz/ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.