Re: [R] different interface to by (tapply)?

ivo welch Mon, 30 Aug 2010 15:44:03 -0700

mercy!!! ;-)


thanks, everyone.  sure beats me trying to reinvent a slower version of the
wheel.  came in very handy.

I think it would be nice to see some of these pointers in the "?by" manual
page.  not sure who to ask to do this, but maybe this person reads r-help.

/iaw

----
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)




On Mon, Aug 30, 2010 at 5:23 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:

> On Mon, Aug 30, 2010 at 3:54 PM, Dennis Murphy <djmu...@gmail.com> wrote:
> > Hi:
> >
> > You've already gotten some good replies re aggregate() and plyr; here are
> > two more choices, from packages doBy and data.table, plus the others,
> > collected into one self-contained summary:
> >
> >  key <- c(1,1,1,2,2,2)
> >  val1 <- rnorm(6)
> >  indf <- data.frame( key, val1)
> >  outdf <- by(indf, indf$key, function(x) c(m=mean(x$val1), s=sd(x$val1)) )
> >  outdf
> >
> > # Alternatives:
> >
> > # aggregate (base) with new formula interface
> >
> > # write a small function to return multiple outputs
> > f <- function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
> >
> > aggregate(val1 ~ key, data = indf, FUN = f)
> >  key  val1.mean    val1.sd
> > 1   1 -0.9783589  0.6378922
> > 2   2  0.2816016  1.4490699
> >
> > # package doBy   (get the same output)
> >
> > library(doBy)
> > summaryBy(val1 ~ key, data = indf, FUN = f)
> >  key  val1.mean   val1.sd
> > 1   1 -0.9783589 0.6378922
> > 2   2  0.2816016 1.4490699
> >
> > # package plyr
> >
> > library(plyr)
> > ddply(indf, .(key), summarise, mean = mean(val1), sd = sd(val1))
> >  key       mean        sd
> > 1   1 -0.9783589 0.6378922
> > 2   2  0.2816016 1.4490699
> >
> > # package data.table
> >
> > library(data.table)
> > indt <- data.table(indf)
> > indt[, list(mean = mean(val1), sd = sd(val1)), by = list(as.integer(key))]
> >     key       mean        sd
> > [1,]   1 -0.9783589 0.6378922
> > [2,]   2  0.2816016 1.4490699
> >
> > It's a cornucopia! :) Multiple grouping variables are no problem with these
> > functions, BTW.
> >
> > HTH,
>
>
> And here are yet four more:
>
> >
> > f.by <- function(x) c(key = x$key[1], mean = mean(x$val1), sd = sd(x$val1))
> > do.call(rbind, by(indf, indf["key"], f.by))
>  key        mean        sd
> 1   1 0.006794852 0.3779713
> 2   2 0.251890650 0.4379315
> >
> > library(sqldf)
> > sqldf("select key, avg(val1) mean, stdev(val1) sd from indf group by key")
>  key        mean        sd
> 1   1 0.006794852 0.3779713
> 2   2 0.251890650 0.4379315
> >
> > library(remix)
> > remix(val1 ~ key, transform(indf, key = factor(key)), funs = c(mean, sd))
> val1 ~ key
> ==========
>
> +-----+---+------+-------+------+
> |                | mean  | sd   |
> +=====+===+======+=======+======+
> | key | 1 | val1 | 0.01  | 0.38 |
> +     +---+------+-------+------+
> |     | 2 | val1 | 0.25  | 0.44 |
> +-----+---+------+-------+------+
> >
> > library(Hmisc)
> > summary(val1 ~ key, indf, fun = function(x) c(mean = mean(x), sd = sd(x)))
> val1    N=6
>
> +-------+-+-+-----------+---------+
> |       | |N|mean       |sd.val1  |
> +-------+-+-+-----------+---------+
> |key    |1|3|0.006794852|0.3779713|
> |       |2|3|0.251890650|0.4379315|
> +-------+-+-+-----------+---------+
> |Overall| |6|0.129342751|0.3897180|
> +-------+-+-+-----------+---------+
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

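A small sketch of the "multiple grouping variables" remark quoted above, using only aggregate() from the standard distribution. The second grouping column grp and the call to set.seed() are my own additions for illustration and are not part of the original example. The outputs quoted in the thread came from separate, unseeded rnorm() draws, which is why their numbers differ between replies.

```r
# Illustrative data: 'grp' is a hypothetical second grouping column,
# not part of the thread's original example.
set.seed(1)  # assumption: seeded only so that reruns give the same numbers
indf <- data.frame(key  = c(1, 1, 1, 1, 2, 2, 2, 2),
                   grp  = c("a", "a", "b", "b", "a", "a", "b", "b"),
                   val1 = rnorm(8))

# Return several named summaries at once, as in the quoted replies
f <- function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))

# Two grouping variables go on the right-hand side of the formula
res <- aggregate(val1 ~ key + grp, data = indf, FUN = f)
res

# Cross-check one cell against a direct computation on the raw values
cell  <- indf$val1[indf$key == 1 & indf$grp == "a"]
check <- all.equal(unname(res$val1[res$key == 1 & res$grp == "a", "mean"]),
                   mean(cell))
check
```

Note that res$val1 is a two-column matrix (mean, sd) stored inside the data frame, which is why the printed column headers read val1.mean and val1.sd. The doBy, plyr and data.table calls quoted above should extend in a similar way, but I have only sketched the aggregate() case here.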

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.