Gene - Let me try to address your concerns one at a time:
The formula interface to aggregate was introduced pretty recently (I think R-2.11.1, but I might be wrong), so when you try to use it in R-2.10.1 it won't work.

Now let's take a close look at the help page for aggregate. The default method, which will be called if you pass a vector to aggregate, and the data frame method are described like this:

     aggregate(x, ...)

     ## S3 method for class 'data.frame'
     aggregate(x, by, FUN, ..., simplify = TRUE)

So if you pass an na.action= argument to aggregate when the first argument is a vector or data frame, it gets picked up by the ... argument and gets passed to your function, so you might see messages like this:
sum(1:10,na.action=na.omit)
Error in sum(1:10, na.action = na.omit) : invalid 'type' (closure) of argument
sum(1:10,na.action='na.omit')
Error in sum(1:10, na.action = "na.omit") : invalid 'type' (character) of argument

(It's sum complaining, not aggregate.)

As far as na.action goes, when you're using the aggregate formula method, it will by default remove all rows from the specified data frame that have any missing values. If you pass the data to a function with the na.rm=TRUE argument, that function will remove the missing values as it should. So the only time you'll see the effect of na.action=na.pass is when you call a function that won't remove the missing values.

(The subtle distinction between na.action=na.omit and na.rm=TRUE in the function you're calling is that na.omit will remove the entire row of data when it encounters a missing value, while the na.rm=TRUE argument will remove missing values separately from each variable.)

Hope this helps.

                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spec...@stat.berkeley.edu

On Fri, 4 Feb 2011, Gene Leynes wrote:
Can someone please tell me what is up with na.action in aggregate?

My (somewhat) reproducible example:
(I say somewhat because some lines wouldn't run in a separate session, more below)

set.seed(100)
dat=data.frame(
        x1=sample(c(NA,'m','f'), 100, replace=TRUE),
        x2=sample(c(NA, 1:10), 100, replace=TRUE),
        x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
        x4=sample(c(NA,T,F), 100, replace=TRUE),
        y=sample(c(rep(NA,5), rnorm(95))))
dat

## The total from dat:
sum(dat$y, na.rm=T)

## The total from aggregate:
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line gave an error in a separate R instance

## The aggregate formula is excluding NA

## So, let's try to include NAs
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)

## The aggregate formula is STILL excluding NA
## In fact, the formula doesn't seem to notice the na.action
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
## Hmmmm... that error surprised me (since the previous two things ran)

## So, let's try to change the global options
## (not mentioned in the help, but after reading the help
##  100 times, I thought I would go above and beyond to avoid
##  any r list flames from people complaining
##  that I didn't read the help... but that's a separate topic)
options(na.action ="na.pass")
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
## (NAs are still omitted)

## Even more frustrating...
## Why don't any of these work???
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)

## This does work, but in my real data set, I want NA to really be NA
for(j in 1:4)
    dat[is.na(dat[,j]),j] = 'NA'
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)

## My first session info
#
#> sessionInfo()
#R version 2.12.0 (2010-10-15)
#Platform: i386-pc-mingw32/i386 (32-bit)
#
#locale:
# [1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#
#attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
#other attached packages:
# [1] plyr_1.2.1 zoo_1.6-4 gdata_2.8.1 rj_0.5.0-5
#
#loaded via a namespace (and not attached):
# [1] grid_2.12.0 gtools_2.6.2 lattice_0.19-13 rJava_0.8-8
#[5] tools_2.12.0

I tried running that example in a different version of R, and I got completely different results. The other version of R wouldn't recognize the formula at all.

My other version of R:

# My second session info
#> sessionInfo()
#R version 2.10.1 (2009-12-14)
#i386-pc-mingw32
#
#locale:
# [1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#
#attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#>
#

PS: Also, I have read the help on aggregate, factor, as.factor, and several other topics. If I missed something, please let me know. Some people like to reply to questions by telling the sender that R has documentation. Please don't. The R help archives are littered with reminders, friendly and otherwise, of R's documentation.
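
For anyone following the thread, the points made in the reply at the top can be checked with a minimal sketch. The data frame d below is a made-up toy example (it does not come from either message), and the comments assume default options and a version of R that has the aggregate formula method:

```r
# Toy data, invented for illustration: NAs in the group, in y and in z.
d <- data.frame(
  g = c("a", "a", "b", "b", NA),
  y = c(1, NA, 3, 4, 5),
  z = c(10, 20, NA, 40, 50)
)

# 1. Data frame method: na.action is not one of its arguments, so it
#    falls into ... and is handed on to FUN. The error comes from sum().
err <- tryCatch(
  aggregate(d["y"], by = list(g = d$g), FUN = sum, na.action = na.omit),
  error = function(e) e
)
inherits(err, "error")   # TRUE

# 2. Formula method with na.omit: any row with an NA in g, y or z is
#    dropped whole, so only rows 1 and 4 are summed.
omit <- aggregate(cbind(y, z) ~ g, data = d, FUN = sum, na.action = na.omit)
omit   # a: y = 1, z = 10;  b: y = 4, z = 40

# 3. Formula method with na.pass plus na.rm = TRUE: the rows are kept and
#    sum() drops the NAs of each variable separately.
pass <- aggregate(cbind(y, z) ~ g, data = d, FUN = sum,
                  na.rm = TRUE, na.action = na.pass)
pass   # a: y = 1, z = 30;  b: y = 7, z = 40
```

Note that the row with a missing g contributes to neither result. ?aggregate says that rows with missing values in any of the by variables are omitted from the result, which may be why the totals in the question stay below sum(dat$y, na.rm=T) even with na.action=na.pass.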

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.