On Sat, May 30, 2009 at 11:59 AM, Stavros Macrakis <macra...@alum.mit.edu> wrote:
> Since R is object-oriented, data frame set operations should be the natural
> operations for their class. There are, I suppose, two natural ways: the
> column-wise (variable-wise) and the row-wise (observation-wise) one. The
> row-wise one seems more natural and more useful to me.
> ...
>
> The row-wise interpretation makes sense in cases where observations with
> the same values for all variables can be considered redundant. That seems
> to me a much more useful interpretation. The union, intersection, and set
> difference of two sets of observations would seem to all be highly useful.
>

Another argument for the row-wise interpretation: the `subset` function
(also part of base) works that way on data frames.

Interestingly, %in%/match appears to work neither row-wise nor column-wise:

    1 %in% data.frame(a=1:3)             # FALSE (would be true if row-wise)
    1:3 %in% data.frame(a=1:3)           # FALSE FALSE FALSE (would be true if column-wise)

but simply treats the data frame as a *character* list:

    1 %in% data.frame(a=2,b=1)           # TRUE
    '1' %in% data.frame(a=2,b=1)         # TRUE
    1 %in% data.frame(a=2:3,b=1:2)       # FALSE
    1:3 %in% data.frame(a=2:4,b=1:3)     # FALSE FALSE FALSE
    '1:3' %in% data.frame(a=2:4,b=1:3)   # TRUE

This specification is clearly documented in ?match, but I am mystified by
it. Perhaps someone from R core can shed light on the rationale?

           -s

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
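
For readers who want to try the row-wise interpretation discussed above, here is a minimal sketch in base R. The helpers `row_key` and `rows_in` are hypothetical names made up for illustration, not part of base R, and the approach (pasting each row into one key string) is only one possible way to do it. It assumes both data frames have the same columns in the same order and that no value contains the separator character.

```r
# What %in%/match effectively compares against: each column collapsed
# to a single string (the behaviour reported in the message above).
df <- data.frame(a = 2:4, b = 1:3)
col_hit <- "1:3" %in% df               # TRUE, as in the message

# Hypothetical helper: build one key string per row.
# unname() avoids clashes with paste()'s own argument names.
row_key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))

# Hypothetical helper: row-wise analogue of %in% for data frames.
rows_in <- function(x, table) row_key(x) %in% row_key(table)

x <- data.frame(a = 1:3, b = c("u", "v", "w"))
y <- data.frame(a = 2:4, b = c("v", "w", "z"))

hits <- rows_in(x, y)                  # which rows of x also occur in y

# Row-wise set operations built on top of that.
row_union     <- unique(rbind(x, y))   # distinct rows of x and y combined
row_intersect <- x[hits, , drop = FALSE]
row_setdiff   <- x[!hits, , drop = FALSE]
```

With these inputs, rows (2, "v") and (3, "w") are shared, so `row_intersect` holds two rows and `row_setdiff` holds the single row (1, "u"). Key strings are a convenience rather than a robust design: values that differ only in type or formatting can collide once converted to text.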