Since R is object-oriented, set operations on data frames should be the natural operations for that class. There are, I suppose, two natural interpretations: the column-wise (variable-wise) one and the row-wise (observation-wise) one. The row-wise one seems more natural and more useful to me.
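To make the row-wise reading concrete, here is a minimal sketch, assuming both data frames have the same columns in the same order. The names row_key, row_union, row_intersect and row_setdiff are hypothetical helpers made up for illustration, not base R functions, and the key-pasting trick assumes no value contains the separator character.

```r
# Hypothetical helpers (not part of base R): treat each row as one set element.

# Build one string key per row by pasting all columns together.
# Assumption: no value contains the "\r" separator.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Union: stack the rows, then drop duplicate rows.
row_union <- function(x, y) unique(rbind(x, y))

# Intersection: distinct rows of x that also occur in y.
row_intersect <- function(x, y) {
  ux <- unique(x)
  ux[row_key(ux) %in% row_key(y), , drop = FALSE]
}

# Set difference: distinct rows of x that do not occur in y.
row_setdiff <- function(x, y) {
  ux <- unique(x)
  ux[!(row_key(ux) %in% row_key(y)), , drop = FALSE]
}

df1 <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"))
df2 <- data.frame(a = c(2, 3, 4), b = c("y", "z", "w"))

row_union(df1, df2)      # 4 distinct observations
row_intersect(df1, df2)  # the 2 observations common to both
row_setdiff(df1, df2)    # the 1 observation only in df1
```

Under this reading all three operations return a data.frame, so the return class is the same across them.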
The current implementation is column-wise, though it is inconsistent in its return class (the man page defines return modes, but is silent on return classes):

> class(union(df1,df2))
[1] "list"
> class(intersect(df1,df2))
[1] "data.frame"
> class(setdiff(df1,df2))
[1] "data.frame"

Unlike other cases, I don't think this inconsistency brings any user convenience (though it may reflect programmer convenience).

The column-wise interpretation makes sense in cases where variables with the same vector value (ignoring the variable name) can be considered redundant. I suppose there are cases where that could be useful, though it does seem hazardous.

The row-wise interpretation makes sense in cases where observations with the same values for all variables can be considered redundant. That seems to me a much more useful interpretation. The union, intersection, and set difference of two sets of observations would all seem to be highly useful.

           -s

On Sat, May 30, 2009 at 10:21 AM, G. Jay Kerns <gke...@ysu.edu> wrote:

> On Sat, May 30, 2009 at 8:50 AM, Stavros Macrakis <macra...@alum.mit.edu>
> wrote:
> > It seems to me that, abstractly, a dataframe is just as
> > straightforwardly a sequence of tuples/observations as a vector is a
> > sequence of scalars. R's convention is that a 1-vector represents a
> > scalar, and similarly, a 1-dataframe can represent a tuple (though it
> > can also be represented as a list). Of course, a dataframe can *also*
> > be interpreted as a list of vectors.
> >
> > Just as a sequence of scalars can be interpreted as a set of scalars
> > by the order- and repetition-ignoring homomorphism, so can a sequence
> > of tuples. It seems to me natural that set operations should follow
> > that interpretation.
> >
> > -s
> >
> After a good night's sleep, the documentation says clearly that
> setdiff() operates on two vectors (of the same mode), so my message
> would be an example of "garbage in, garbage out".
>
> It would be nice if there were an error thrown, but surely there are
> more mission critical problems than this one.
>
> Thanks anyway.
> Jay
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel