Try this:

do.call("rbind", by(d, d[1:2], function(x) with(x,
    data.frame(x[1, 1:2], `mean c` = mean(c), `sum d` = sum(d),
        `has X` = "X" %in% e, check.names = FALSE))))

or this (which uses 1 or 0 to mean TRUE or FALSE in the last column):

> library(sqldf) # see http://sqldf.googlecode.com
> sqldf("select a, b, avg(c) 'mean c', sum(d) 'sum d', sum(e = 'X')>0 'has X'
+   from d group by a, b", method = "raw")
  a b    mean c sum d has X
1 a 1 0.3333333     2     1
2 a 2 0.2500000     2     1
3 a 3 1.4000000     4     1
4 b 1 0.0000000     0     0
5 b 2 0.6666667     1     1
6 b 3 0.7500000     2     1

or this (without check.names = FALSE, so the column names come out as
mean.c, sum.d and has.X):

do.call("rbind", by(d, d[1:2], function(x) with(x,
    data.frame(x[1, 1:2], `mean c` = mean(c), `sum d` = sum(d),
        `has X` = "X" %in% e))))

On Wed, May 5, 2010 at 5:32 PM, utkarshsinghal
<utkarsh.sing...@global-analytics.com> wrote:
> Extending my question further, I want to apply different FUN arguments to
> three fields, and the "by" argument also contains more than one field.
> For example:
> set.seed(100)
> d = data.frame(a=sample(letters[1:2],20,replace=T),
>     b=sample(3,20,replace=T), c=rpois(20,1), d=rbinom(20,1,0.5),
>     e=rep(c("X","Y"),10))
>
> Now I want to split by fields "a" and "b", and calculate mean(c),
> sum(d) and "X"%in%e.
>
> Is there any function which can do this and return the output in data frame
> format? For the above example, it should ideally be a 6*5 data frame.
>
> Thanks in advance.
>
> Regards,
> Utkarsh Singhal
>
>
> On 11/23/2009 5:14 AM, Gabor Grothendieck wrote:
>> Try this:
>>
>> > library(doBy)
>> > summaryBy(breaks ~ ., warpbreaks, FUN = c(mean, sum, length))
>>   wool tension breaks.mean breaks.sum breaks.length
>> 1    A       L    44.55556        401             9
>> 2    A       M    24.00000        216             9
>> 3    A       H    24.55556        221             9
>> 4    B       L    28.22222        254             9
>> 5    B       M    28.77778        259             9
>> 6    B       H    18.77778        169             9
>>
>> On Mon, Nov 23, 2009 at 3:15 AM, utkarshsinghal
>> <utkarsh.sing...@global-analytics.com> wrote:
>>> Hi All,
>>>
>>> I am currently doing the following to compute summary statistics of
>>> aggregated data:
>>> a = aggregate(warpbreaks$breaks, warpbreaks[,-1], mean)
>>> b = aggregate(warpbreaks$breaks, warpbreaks[,-1], sum)
>>> c = aggregate(warpbreaks$breaks, warpbreaks[,-1], length)
>>> ans = cbind(a, b[,3], c[,3])
>>>
>>> This seems unnecessarily complex to me, so I tried
>>>
>>> > aggregate(warpbreaks$breaks, warpbreaks[,-1], function(z) c(mean(z),sum(z),length(z)))
>>>
>>> but aggregate doesn't allow the FUN argument to return a vector.
>>>
>>> I tried "by", "tapply" and several other functions as well, but the output
>>> needed further modifications to get the same format as "ans" above.
>>>
>>> Is there any other function like aggregate which allows the FUN argument
>>> to return a vector?
>>>
>>> Regards
>>> Utkarsh
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.