Re: [Rd] setdiff bizarre (was: odd behavior out of setdiff)

Stavros Macrakis Tue, 02 Jun 2009 08:15:23 -0700

On Sat, May 30, 2009 at 11:59 AM, Stavros Macrakis <macra...@alum.mit.edu> wrote:


> Since R is object-oriented, data frame set operations should be the natural
> operations for their class.  There are, I suppose, two natural ways: the
> column-wise (variable-wise) and the row-wise (observation-wise) one.  The
> row-wise one seems more natural and more useful to me.
> ...
>
> The row-wise interpretation makes sense in cases where observations with
> the same values for all variables can be considered redundant.  That seems
> to me a much more useful interpretation.  The union, intersection, and set
> difference of two sets of observations would seem to all be highly useful.
>

Another argument for the row-wise interpretation: the `subset` function
(also part of base) works that way on data frames.
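
To make the row-wise reading concrete, here is a minimal sketch. The helpers `row_key`, `row_setdiff` and `row_intersect` are hypothetical names of my own, not part of base R, and the sketch assumes both data frames have the same columns in the same order:

```r
# Hypothetical helpers, not part of base R: each row is one set element.
# Build one string key per row by pasting the column values together.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Rows of x that do not occur in y, with duplicates dropped.
row_setdiff <- function(x, y) {
  keep <- !(row_key(x) %in% row_key(y))
  unique(x[keep, , drop = FALSE])
}

# Rows of x that also occur in y, with duplicates dropped.
row_intersect <- function(x, y) {
  keep <- row_key(x) %in% row_key(y)
  unique(x[keep, , drop = FALSE])
}

x <- data.frame(a = 1:3, b = c("p", "q", "r"))
y <- data.frame(a = 2:4, b = c("q", "r", "s"))

row_setdiff(x, y)    # the rows of x that are absent from y
row_intersect(x, y)  # the rows common to x and y

# subset() likewise selects whole rows (observations).
subset(x, a > 1)
```

Comparing pasted keys is only one possible design; it ignores column types and would confuse values that contain the separator character.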

Interestingly, %in%/match appears to work neither row-wise nor column-wise:

     1 %in% data.frame(a=1:3)    # FALSE (would be TRUE if row-wise)
     1:3 %in% data.frame(a=1:3)  # FALSE FALSE FALSE (would be TRUE if column-wise)

but simply treats the data frame as a list and converts each column to a
single *character* string before matching:

     1 %in% data.frame(a=2,b=1)  # TRUE
     '1' %in% data.frame(a=2,b=1)  # TRUE
     1 %in% data.frame(a=2:3,b=1:2) # FALSE
     1:3 %in% data.frame(a=2:4,b=1:3)  # FALSE FALSE FALSE
     '1:3' %in% data.frame(a=2:4,b=1:3)  # TRUE

This specification is clearly documented in ?match, but I am mystified by
it.  Perhaps someone from R core can shed light on the rationale?
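
For anyone who wants to see the coercion directly, here is a small sketch. It assumes that `as.character()` applied to a data frame reproduces the conversion that ?match describes for lists; the exact strings produced for multi-element columns may differ between R versions.

```r
df1 <- data.frame(a = 2, b = 1)
df2 <- data.frame(a = 2:4, b = 1:3)

as.character(df1)  # one string per column
as.character(df2)  # a multi-element column is deparsed into a single string

# Matching against the explicitly coerced table gives the same answers.
identical(1 %in% df1, 1 %in% as.character(df1))
identical("1:3" %in% df2, "1:3" %in% as.character(df2))

# A column-wise membership test has to be asked of the column itself.
1:3 %in% df2$b

# A row-wise test can compare the values against every row.
any(df2$a == 2 & df2$b == 1)
```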

          -s


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
