Just to be clear: This works:

> set.seed(100)
> dat=data.frame(
+ x1=sample(c(NA,'m','f'), 100, replace=TRUE),
+ x2=sample(c(NA, 1:10), 100, replace=TRUE),
+ x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
+ x4=sample(c(NA,T,F), 100, replace=TRUE),
+ y=sample(c(rep(NA,5), rnorm(95))))
> for(j in 1:4)
+ dat[,j] = factor(dat[,j], exclude=NULL)
> sum(dat$y, na.rm=T)
[1] 0.0815244116598
> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
[1] 0.0815244116598
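The effect of that conversion can be checked in a current R session with a minimal sketch like the one below. It assumes R 3.6.0 or later, where sample() uses a different default algorithm, so the printed totals will not reproduce the 2011 values; only the comparisons matter. The copy dat2 and the names plain, plain_expected, complete_groups and converted are illustrative. Without the conversion, aggregate() omits every row whose grouping columns contain an NA, so its total equals the sum over rows with complete grouping columns. On the converted copy, NA is an ordinary factor level and the total matches sum(dat$y, na.rm=TRUE).

```r
# Rebuild the example data from the thread. Since R 3.6.0, sample() uses a
# different default algorithm, so the totals will not match the values above.
set.seed(100)
dat <- data.frame(
  x1 = sample(c(NA, "m", "f"), 100, replace = TRUE),
  x2 = sample(c(NA, 1:10), 100, replace = TRUE),
  x3 = sample(c(NA, letters[1:5]), 100, replace = TRUE),
  x4 = sample(c(NA, TRUE, FALSE), 100, replace = TRUE),
  y  = sample(c(rep(NA, 5), rnorm(95)))
)

total <- sum(dat$y, na.rm = TRUE)

# Unconverted: aggregate() omits rows with NA in any grouping column,
# so its total matches the sum over rows whose grouping columns are complete.
plain <- sum(aggregate(dat$y, dat[, 1:4], sum, na.rm = TRUE)$x)
complete_groups <- complete.cases(dat[, 1:4])
plain_expected <- sum(dat$y[complete_groups], na.rm = TRUE)

# Converted copy (dat2 is an illustrative name): NA becomes an ordinary
# factor level, so every row is kept and the totals agree.
dat2 <- dat
for (j in 1:4) dat2[, j] <- factor(dat2[, j], exclude = NULL)
converted <- sum(aggregate(dat2$y, dat2[, 1:4], sum, na.rm = TRUE)$x)

isTRUE(all.equal(plain, plain_expected))  # TRUE
isTRUE(all.equal(converted, total))       # TRUE
```

all.equal() is used instead of == because the grouped sums are added in a different order than the plain sum, so the results can differ in the last few bits.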
It's just that I don't want to do that conversion on my real data, because of other complications... I wish you could tell aggregate to keep the NAs in the categorical variables as a group of their own.

On Fri, Feb 4, 2011 at 6:18 PM, Gene Leynes <gleyne...@gmail.com> wrote:

> Ista,
>
> Thank you again.
>
> I had figured that out... and was crafting another message when you
> replied.
>
> The NAs do come through on the variable that is being aggregated.
> However, they do not come through on the categorical variable(s).
>
> The aggregate function must be converting the data frame variables to
> factors, with the default "exclude=NA" parameter.
>
> The help on "aggregate" says:
>
>   na.action  A function which indicates what should happen when the data
>              contain NA values. The default is to ignore missing values
>              in the given variables.
>
> By "data" it must only refer to the aggregated variable, and not the
> categorical variables. I thought it referred to both, because I thought it
> referred to the "data" argument, which is the underlying data frame.
>
> I think the proper way to accomplish this would be to recast my x
> (categorical) variables as factors. This is not feasible for me due to
> other complications.
> Also, (imho) the help should be clearer about what na.action
> modifies.
>
> So, unless someone has a better idea, I guess I'm out of luck?
>
>
> On Fri, Feb 4, 2011 at 6:05 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
>
>> Hi,
>>
>> On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes <gleyne...@gmail.com> wrote:
>> > Thank you both for the thoughtful (and funny) replies.
>> >
>> > I agree with both of you that sum is the one picking up na.rm.
>> > Although I didn't mention it, I did realize that in the first place.
>> > Also, thank you Phil for pointing out that aggregate only accepts a
>> > formula value in more recent versions! I actually thought that was an
>> > older feature, but I must be thinking of other functions.
>> >
>> > I still don't see why these two values are not the same!
>> >
>> > It seems like a bug to me
>>
>> No, not a bug (see below).
>>
>> >
>> >> set.seed(100)
>> >> dat=data.frame(
>> > + x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>> > + x2=sample(c(NA, 1:10), 100, replace=TRUE),
>> > + x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>> > + x4=sample(c(NA,T,F), 100, replace=TRUE),
>> > + y=sample(c(rep(NA,5), rnorm(95))))
>> >> sum(dat$y, na.rm=T)
>> > [1] 0.0815244116598
>> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y)
>> > [1] -4.45087666247
>> >
>>
>> Because in the first one you are only removing missing data in dat$y.
>> In the second one you are removing all rows that contain missing data
>> in any of the columns.
>>
>> all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat,
>> sum, na.action=na.pass, na.rm=T)$y))
>> [1] TRUE
>>
>> Best,
>> Ista
>>
>> >
>> > On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
>> >>
>> >> Sorry, I didn't see Phil's reply, which is better than mine anyway.
>> >>
>> >> -Ista
>> >>
>> >> On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn <iz...@psych.rochester.edu> wrote:
>> >> > Hi,
>> >> >
>> >> > Please see ?na.action
>> >> >
>> >> > (just kidding!)
>> >> >
>> >> > So it seems to me the problem is that you are passing na.rm to the sum
>> >> > function. So there is no missing data for the na.action argument to
>> >> > operate on!
>> >> >
>> >> > Compare
>> >> >
>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
>> >> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)
>> >> >
>> >> >
>> >> > Best,
>> >> > Ista
>> >> >
>> >> > On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes <gleyne...@gmail.com> wrote:
>> >> >> Can someone please tell me what is up with na.action in aggregate?
>> >> >>
>> >> >> My (somewhat) reproducible example:
>> >> >> (I say somewhat because some lines wouldn't run in a separate session,
>> >> >> more below)
>> >> >>
>> >> >> set.seed(100)
>> >> >> dat=data.frame(
>> >> >> x1=sample(c(NA,'m','f'), 100, replace=TRUE),
>> >> >> x2=sample(c(NA, 1:10), 100, replace=TRUE),
>> >> >> x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
>> >> >> x4=sample(c(NA,T,F), 100, replace=TRUE),
>> >> >> y=sample(c(rep(NA,5), rnorm(95))))
>> >> >> dat
>> >> >> ## The total from dat:
>> >> >> sum(dat$y, na.rm=T)
>> >> >> ## The total from aggregate:
>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y) ## <--- This line gave an error in a separate R instance
>> >> >> ## The aggregate formula is excluding NA
>> >> >>
>> >> >> ## So, let's try to include NAs
>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
>> >> >> ## The aggregate formula is STILL excluding NA
>> >> >> ## In fact, the formula doesn't seem to notice the na.action
>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
>> >> >> ## Hmmmm... that error surprised me (since the previous two things ran)
>> >> >>
>> >> >> ## So, let's try to change the global options
>> >> >> ## (not mentioned in the help, but after reading the help
>> >> >> ## 100 times, I thought I would go above and beyond to avoid
>> >> >> ## any R list flames from people complaining
>> >> >> ## that I didn't read the help... but that's a separate topic)
>> >> >> options(na.action ="na.pass")
>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
>> >> >> ## (NAs are still omitted)
>> >> >>
>> >> >> ## Even more frustrating...
>> >> >> ## Why don't any of these work???
>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)
>> >> >>
>> >> >> ## This does work, but in my real data set, I want NA to really be NA
>> >> >> for(j in 1:4)
>> >> >> dat[is.na(dat[,j]),j] = 'NA'
>> >> >> sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
>> >> >> sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
>> >> >>
>> >> >> ## My first session info
>> >> >> #
>> >> >> #> sessionInfo()
>> >> >> #R version 2.12.0 (2010-10-15)
>> >> >> #Platform: i386-pc-mingw32/i386 (32-bit)
>> >> >> #
>> >> >> #locale:
>> >> >> # [1] LC_COLLATE=English_United States.1252
>> >> >> #[2] LC_CTYPE=English_United States.1252
>> >> >> #[3] LC_MONETARY=English_United States.1252
>> >> >> #[4] LC_NUMERIC=C
>> >> >> #[5] LC_TIME=English_United States.1252
>> >> >> #
>> >> >> #attached base packages:
>> >> >> # [1] stats graphics grDevices utils datasets methods base
>> >> >> #
>> >> >> #other attached packages:
>> >> >> # [1] plyr_1.2.1 zoo_1.6-4 gdata_2.8.1 rj_0.5.0-5
>> >> >> #
>> >> >> #loaded via a namespace (and not attached):
>> >> >> # [1] grid_2.12.0 gtools_2.6.2 lattice_0.19-13 rJava_0.8-8
>> >> >> #[5] tools_2.12.0
>> >> >>
>> >> >> I tried running that example in a different version of R, and I got
>> >> >> completely different results.
>> >> >>
>> >> >> The other version of R wouldn't recognize the formula at all.
>> >> >>
>> >> >> My other version of R:
>> >> >>
>> >> >> # My second session info
>> >> >> #> sessionInfo()
>> >> >> #R version 2.10.1 (2009-12-14)
>> >> >> #i386-pc-mingw32
>> >> >> #
>> >> >> #locale:
>> >> >> # [1] LC_COLLATE=English_United States.1252
>> >> >> #[2] LC_CTYPE=English_United States.1252
>> >> >> #[3] LC_MONETARY=English_United States.1252
>> >> >> #[4] LC_NUMERIC=C
>> >> >> #[5] LC_TIME=English_United States.1252
>> >> >> #
>> >> >> #attached base packages:
>> >> >> # [1] stats graphics grDevices utils datasets methods base
>> >> >>
>> >> >> PS: Also, I have read the help on aggregate, factor, as.factor, and
>> >> >> several other topics. If I missed something, please let me know.
>> >> >> Some people like to reply to questions by telling the sender that R has
>> >> >> documentation. Please don't. The R help archives are littered with
>> >> >> reminders, friendly and otherwise, of R's documentation.
>> >> >>
>> >> >> ______________________________________________
>> >> >> R-help@r-project.org mailing list
>> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> >> and provide commented, minimal, self-contained, reproducible code.
>> >> >>
>> >> >
>> >> > --
>> >> > Ista Zahn
>> >> > Graduate student
>> >> > University of Rochester
>> >> > Department of Clinical and Social Psychology
>> >> > http://yourpsyche.org
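On the wish to have aggregate() keep NA groups without converting the real data: one possible approach, sketched below under the assumption of a current R version, is to build factors only for the by argument with addNA(), which returns a factor in which NA is an explicit level and leaves dat itself untouched. The names groups, with_na and without are illustrative, and whether this sidesteps the "other complications" in the real data is an assumption.

```r
# Example data as in the thread (current R; totals differ from the 2011 output).
set.seed(100)
dat <- data.frame(
  x1 = sample(c(NA, "m", "f"), 100, replace = TRUE),
  x2 = sample(c(NA, 1:10), 100, replace = TRUE),
  x3 = sample(c(NA, letters[1:5]), 100, replace = TRUE),
  x4 = sample(c(NA, TRUE, FALSE), 100, replace = TRUE),
  y  = sample(c(rep(NA, 5), rnorm(95)))
)

# Build the grouping factors on the fly: addNA() returns a factor in which
# NA is an explicit level. The columns of dat are not modified.
groups <- lapply(dat[, 1:4], addNA, ifany = TRUE)

with_na <- aggregate(dat["y"], by = groups, FUN = sum, na.rm = TRUE)
without <- aggregate(dat["y"], by = dat[, 1:4], FUN = sum, na.rm = TRUE)

sum(with_na$y)                 # matches sum(dat$y, na.rm = TRUE) up to rounding
nrow(with_na) > nrow(without)  # TRUE: the NA groups are now reported
anyNA(dat$x1)                  # TRUE: the original column still holds real NAs
```

The data-frame interface of aggregate() is used here because it takes the grouping list directly. Changing na.action alone does not help, since, as Ista's all.equal() check shows, rows whose grouping values are NA are still dropped.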