[R] aggregate.formula implicitly removes rows containing NA

Dickison, Daniel Tue, 11 Jan 2011 14:43:05 -0800

The documentation for `aggregate` makes it sound like aggregate.formula should 
behave identically to aggregate.data.frame (apart from the way the arguments 
are passed).  But aggregate.formula appears to silently drop every row in 
which any of the "output" variables (those on the LHS of the formula) is NA.  
This differs from how aggregate.data.frame works.  Is this expected behavior?


Here are a couple of examples:

> d <- data.frame(a=rep(1:2, each=2),
+                 b=c(1,2,NA,3))
> aggregate(d["b"], d["a"], mean)
  a   b
1 1 1.5
2 2  NA
> aggregate(b ~ a, d, mean)
  a   b
1 1 1.5
2 2 3.0
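
A minimal sketch of one way to check this, on the assumption that the 
difference comes from the formula method's `na.action` argument (its default 
on the `aggregate` help page is `na.omit`).  If that is the cause, passing 
`na.action = na.pass` should keep the NA rows and make the two methods agree:

```r
# Same data as in the example above.
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3))

# Data frame method: the NA row is kept, so the mean for a == 2 is NA.
by_df <- aggregate(d["b"], d["a"], mean)

# Formula method, with rows containing NA passed through instead of omitted.
by_formula <- aggregate(b ~ a, d, mean, na.action = na.pass)

print(by_df)
print(by_formula)
```

If the two printed results match, the row removal happens when the model 
frame is built, before `mean` is ever called.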

The formula method removes a whole row even if only one of the LHS columns is NA in that row, e.g.:

> d <- data.frame(a=rep(1:2, each=2),
+                 b=c(1,2,NA,3),
+                 c=c(NA,2,3,NA))
> aggregate(cbind(b,c) ~ a, d, mean)
  a b c
1 1 2 2
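
For comparison, here is a sketch of how per-column NA handling might be 
obtained with the formula interface.  It assumes `na.action = na.pass` keeps 
all rows and that the extra `na.rm = TRUE` argument is forwarded to `mean` 
through `...`; the name `res` is only for illustration:

```r
# Same data as in the example above.
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3),
                c = c(NA, 2, 3, NA))

# Keep every row, then let mean() drop NAs separately for each column
# within each group.
res <- aggregate(cbind(b, c) ~ a, d, mean,
                 na.rm = TRUE, na.action = na.pass)
print(res)
```

With this call both levels of `a` appear in the result, and each column's 
mean uses all of that column's non-NA values.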

Daniel
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
