Re: [Rd] by() processing on a dataframe

Duncan Murdoch Fri, 30 Sep 2005 11:41:35 -0700

On 9/30/2005 1:41 PM, hadley wickham wrote:
> I'm not entirely sure what you want, but maybe this does the trick?
> 
> data.frame.by <- function(data, variables, fun, ...) {
>       if (length(variables) == 0 ) {
>               df <- data.frame(results = 0)
>               df$results <- list(fun(data, ...))
>               return(df)
>       }
> 
>       # sort the whole data frame so the group index lines up with its rows
>       data <- sort.df(data, variables)
>       sorted <- data[,variables, drop=FALSE]
>       duplicates <- duplicated(sorted)
>       index <- cumsum(!duplicates)
> 
>       results <- by(data, index, fun, ...)
> 
>       cols <- sorted[!duplicates,variables, drop=FALSE]
>       cols$results <- array(results)
>       cols
> }
> 
> 
> sort.df <- function(data, vars) {
>       data[do.call("order", data[,vars, drop=FALSE]), ,drop=FALSE]
> }
> 
> 
> dataset <- data.frame(gp1 = rep(1:2, c(4,4)), gp2 = rep(1:4,
> c(2,2,2,2)), value = rnorm(8))
> 
> data.frame.by(dataset, c("gp1", "gp2"), function(data) mean(data$value))
> data.frame.by(dataset, "gp1", function(data) tapply(data$value, data$gp2, 
> mean))
> # doesn't print, but everything is there ok
> data.frame.by(dataset, "gp1", function(data) lm(gp2 ~ value, data))
> 
> (note that the results column will be a list if necessary - this may
> be a serious abuse of data frames, but I'm not sure and no one replied
> when I queried the list)


I think this should work.  Thanks!
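
For reference, the list-valued results column mentioned in the note above can be seen in isolation with a minimal sketch like the following. The names toy, fits and out are hypothetical and used only for illustration; this is not part of the posted function, just one way to check that a data frame will hold arbitrary objects in a column.

```r
# Minimal sketch of a list-valued data frame column.
# 'toy', 'fits' and 'out' are hypothetical names, not from the posted code.
toy <- data.frame(gp1 = rep(1:2, each = 4),
                  x   = c(1, 2, 3, 4, 1, 2, 3, 4),
                  y   = c(2.1, 3.9, 6.2, 7.8, 1.0, 1.9, 3.2, 3.9))

# Fit one model per level of gp1
fits <- lapply(split(toy, toy$gp1), function(d) lm(y ~ x, data = d))

# One row per group; assigning a list with one element per row
# via `$<-` stores it as a list column
out <- data.frame(gp1 = as.integer(names(fits)))
out$results <- unname(fits)

is.list(out$results)       # TRUE
class(out$results[[1]])    # "lm"
coef(out$results[[2]])     # coefficients of the fit for gp1 == 2
```

Individual results are pulled out with [[ ]] on the column. Printing the whole data frame is deliberately avoided here, since the default print method may not cope well with model objects in a column.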

Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
