Hi:

Try this:

library(plyr)   # provides ddply() and numcolwise()

test.set<-data.frame(site=1:10,x=.Random.seed[1:100],y=rnorm(100))
str(test.set)
'data.frame':   100 obs. of  3 variables:
 $ site: int  1 2 3 4 5 6 7 8 9 10 ...
 $ x   : int  403 10 -74327032 10380982 -951011855 1368411171 -390937486 -1081698620 -812257145 -1354214307 ...
 $ y   : num  -0.414 -0.851 -1.67 -0.315 1.934 ...

# It's easier to use numcolwise() if the grouping variables are not numeric,
# so change site to be a factor variable:
> test.set$site <- factor(test.set$site)
> ddply(test.set, .(site), numcolwise(mean))
   site          x           y
1     1 -207083133 -0.01895802
2     2  321488067  0.19581351
3     3   46121295 -0.41734140
4     4  321915795 -0.08254519
5     5 -416497845 -0.10543154
6     6  -27745056 -0.38855565
7     7  515863199 -0.54731714
8     8  412917654  0.05438913
9     9 -327132515  0.26896930
10   10   74689545 -0.45381880
> ddply(test.set, .(site), numcolwise(max))
   site          x         y
1     1 1997725565 0.8473888
2     2 2018830674 1.6600380
3     3 1909893732 2.4445523
4     4 1365543339 1.3697428
5     5 1688291226 2.2145275
6     6 1368411171 1.5141589
7     7 1974894876 1.2868469
8     8 2054615743 0.7917823
9     9 1091060578 2.4678820
10   10 2055409475 2.4488190
> ddply(test.set, .(site), numcolwise(min))
<snipped - same idea>

I imagine you'd want to put all this together, so an easier way in
ddply() is to create a function that takes a data frame and returns
a data frame, as follows:

f <- function(d) data.frame(mean.x = mean(d$x), mean.y = mean(d$y),
                            min.x = min(d$x), min.y = min(d$y),
                            max.x = max(d$x), max.y = max(d$y))
ddply(test.set, .(site), f)

In this case, aggregate() would be a little bit simpler (R-2.11.0 +):

aggregate(cbind(x, y) ~ site, data = test.set,
          FUN = function(x) c(mean = mean(x), min = min(x), max = max(x)))

On Wed, May 11, 2011 at 9:46 AM, Justin <jto...@gmail.com> wrote:
> I'm trying to use ddply to compute summary statistics for many variables
> splitting on the variable site. however, it seems to work fine for mean() but
> if i use max() or min() things fall apart. whats going on?
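
One way to see what happens is to take a single site's piece and call
max() on it directly, which is roughly what ddply() does with each piece.
A minimal sketch in base R, using a small made-up data frame (the names
d, piece and col.max are only for illustration, not from your code):

```r
# Hypothetical toy data with the same layout as test.set (values made up)
d <- data.frame(site = rep(1:2, each = 3),
                x = c(5L, -2L, 9L, 1L, 7L, 3L),
                y = c(0.5, -1.2, 2.0, 0.1, -0.3, 1.4))

# The numeric columns of one piece, i.e. the rows for site 1
piece <- d[d$site == 1, c("x", "y")]

# max() on a data frame pools every value and returns a single number
max(piece)

# Applying max() to each column gives one value per variable,
# which is the idea behind numcolwise(max)
col.max <- sapply(piece, max)
col.max
```

The pooled maximum is presumably why maxes ends up with a single V1
column: within each piece the largest value comes from x.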
>
The problem in your code is that you don't specify which variables the
mean/min/max should be applied to.

HTH,
Dennis

> test.set<-data.frame(site=1:10,x=.Random.seed[1:100],y=rnorm(100))
> means<-ddply(test.set,.(site),mean)
> means
>    site          x           y
> 1     1  -97459496 -0.14826303
> 2     2 -150246922 -0.29279556
> 3     3  471813178  0.13090210
> 4     4 -655451621  0.07908207
> 5     5 -229505843  0.10239588
> 6     6 -667025397 -0.34930275
> 7     7  510041943  0.20547460
> 8     8  270993292 -0.63658199
> 9     9  264989314  0.09695455
> 10   10 -199965142 -0.07202699
> maxes<-ddply(test.set,.(site),max)
> maxes
>    site         V1
> 1     1 1942437227
> 2     2 2066224792
> 3     3 2146619846
> 4     4 1381954134
> 5     5 1802867123
> 6     6 1786627153
> 7     7 1951106534
> 8     8 1498358582
> 9     9 2022046126
> 10   10 1670904926
>
> Can you all shed some light on this? I'm stumped!
>
> Thanks,
> Justin
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.