Gene -
   Let me try to address your concerns one at a time:

The formula interface to aggregate was introduced pretty recently (I think
R-2.11.1, but I might be wrong), so when you try to use it in R-2.10.1 it
won't work.
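
If you're not sure whether a particular installation has the formula method,
something along these lines should tell you.  This is only a sketch;
has_formula_method is a name I made up for illustration.

```r
# TRUE if this version of R has an S3 method of aggregate() for formulas.
# With optional = TRUE, getS3method() returns NULL instead of signalling
# an error when no such method exists.
has_formula_method <- !is.null(getS3method("aggregate", "formula",
                                           optional = TRUE))
has_formula_method

# The running version, for comparison
getRversion()
```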

Now let's take a close look at the help page for aggregate.

The default method, which will be called if you pass a vector
to aggregate, and the data frame method are described like this:

     aggregate(x, ...)

     ## S3 method for class 'data.frame'
     aggregate(x, by, FUN, ..., simplify = TRUE)

So if you pass an na.action= argument to aggregate when the first argument
is a vector or data frame, it gets picked up by the ... argument and gets
passed to your function, so you might see messages like this:

sum(1:10,na.action=na.omit)
Error in sum(1:10, na.action = na.omit) :
  invalid 'type' (closure) of argument
sum(1:10,na.action='na.omit')
Error in sum(1:10, na.action = "na.omit") :
  invalid 'type' (character) of argument

(It's sum complaining, not aggregate.)
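
Here is a small sketch of the same thing happening through aggregate.  The
data frame d is made up for illustration (it is not your dat), and tryCatch
is only there so the error message can be captured and looked at.

```r
# Small made-up data frame; y has one missing value.
d <- data.frame(g = c("a", "a", "b", "b"), y = c(1, NA, 3, 4))

# na.action= is not an argument of the data frame method, so it
# travels through ... to sum(), and sum() is the one that fails.
res <- tryCatch(
    aggregate(d$y, d["g"], sum, na.rm = TRUE, na.action = na.omit),
    error = function(e) conditionMessage(e))
res

# Without na.action=, the same call works; na.rm=TRUE is passed to sum().
ok <- aggregate(d$y, d["g"], sum, na.rm = TRUE)
ok
```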

As far as na.action goes, when you're using the aggregate formula method,
the default (na.action=na.omit) will remove all rows from the specified data
frame that have any missing values in the variables used in the formula.
If you use na.action=na.pass instead and call a function with the na.rm=TRUE
argument, that function will remove the missing values as it should.  So the
only time you'll see the effect of na.action=na.pass is when you call a
function that won't remove the missing values.   (The subtle distinction
between na.action=na.omit and na.rm=TRUE in the function you're calling is
that na.omit will remove the entire row of data when it encounters a missing
value, while the na.rm=TRUE argument will remove missing values separately
from each variable.)
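
To make that distinction concrete, here is a minimal sketch using the formula
method.  The data frame d and its columns g, y and z are made up for
illustration and are not taken from your example.

```r
# Made-up data: row 2 has a missing y, row 3 has a missing z.
d <- data.frame(g = c("a", "a", "b", "b"),
                y = c(1, NA, 3, 4),
                z = c(10, 20, NA, 40))

# Default na.action=na.omit: rows 2 and 3 are dropped completely before
# sum() sees the data, so y = 3 and z = 20 are lost along with the NAs.
omit <- aggregate(cbind(y, z) ~ g, data = d, FUN = sum, na.rm = TRUE)

# na.action=na.pass: every row is kept, and na.rm=TRUE removes the
# missing values separately within each variable.
pass <- aggregate(cbind(y, z) ~ g, data = d, FUN = sum, na.rm = TRUE,
                  na.action = na.pass)

# na.pass with a function that doesn't remove NAs: the NAs show through.
shown <- aggregate(cbind(y, z) ~ g, data = d, FUN = sum,
                   na.action = na.pass)

omit
pass
shown
```

Comparing omit and pass shows the row-wise versus variable-wise removal;
shown is the case where the effect of na.pass is visible in the result itself.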

Hope this helps.
                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spec...@stat.berkeley.edu



On Fri, 4 Feb 2011, Gene Leynes wrote:

Can someone please tell me what is up with na.action in aggregate?

My (somewhat) reproducible example:
(I say somewhat because some lines wouldn't run in a separate session, more
below)

set.seed(100)
dat=data.frame(
       x1=sample(c(NA,'m','f'), 100, replace=TRUE),
       x2=sample(c(NA, 1:10), 100, replace=TRUE),
       x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
       x4=sample(c(NA,T,F), 100, replace=TRUE),
       y=sample(c(rep(NA,5), rnorm(95))))
dat
## The total from dat:
sum(dat$y, na.rm=T)
## The total from aggregate:
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
## ^--- This line gave an error in a separate R instance
## The aggregate formula is excluding NA

## So, let's try to include NAs
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
## The aggregate formula is STILL excluding NA
## In fact, the formula doesn't seem to notice the na.action
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
## Hmmmm... that error surprised me (since the previous two things ran)

## So, let's try to change the global options
## (not mentioned in the help, but after reading the help
##  100 times, I thought I would go above and beyond to avoid
##  any r list flames from people complaining
##  that I didn't read the help... but that's a separate topic)
options(na.action ="na.pass")
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
## (NAs are still omitted)

## Even more frustrating...
## Why don't any of these work???
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)


## This does work, but in my real data set, I want NA to really be NA
for(j in 1:4)
   dat[is.na(dat[,j]),j] = 'NA'
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)


## My first session info
#
#> sessionInfo()
#R version 2.12.0 (2010-10-15)
#Platform: i386-pc-mingw32/i386 (32-bit)
#
#locale:
#        [1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#
#attached base packages:
#        [1] stats     graphics  grDevices utils     datasets  methods   base
#
#other attached packages:
#        [1] plyr_1.2.1  zoo_1.6-4   gdata_2.8.1 rj_0.5.0-5
#
#loaded via a namespace (and not attached):
#        [1] grid_2.12.0     gtools_2.6.2    lattice_0.19-13 rJava_0.8-8
#[5] tools_2.12.0



I tried running that example in a different version of R, and I got
completely different results.

The other version of R wouldn't recognize the formula at all.

My other version of R:

#  My second session info
#> sessionInfo()
#R version 2.10.1 (2009-12-14)
#i386-pc-mingw32
#
#locale:
#        [1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#
#attached base packages:
#        [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#

PS: Also, I have read the help on aggregate, factor, as.factor, and several
other topics.  If I missed something, please let me know.
Some people like to reply to questions by telling the sender that R has
documentation.  Please don't.  The R help archives are littered with
reminders, friendly and otherwise, of R's documentation.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

