[R] UNIX diff function

Dennis Fisher Wed, 13 Jul 2011 10:15:57 -0700

Colleagues,

(R: 2.13.0; OS X)


I often receive sequential datasets in which there are new rows interposed 
between existing rows.  For example:
        SET1 <- data.frame(list(LETTERS=LETTERS[c(1:4, 6:10)], NUMBERS=c(1:4, 6:10)))
        SET2 <- data.frame(list(LETTERS=LETTERS[1:10], NUMBERS=1:10))

> SET1
  LETTERS NUMBERS
1       A       1
2       B       2
3       C       3
4       D       4
5       F       6
6       G       7
7       H       8
8       I       9
9       J      10

> SET2
   LETTERS NUMBERS
1        A       1
2        B       2
3        C       3
4        D       4
5        E       5
6        F       6
7        G       7
8        H       8
9        I       9
10       J      10

As you can see, the row containing E and 5 was inserted into the second set.  
The UNIX diff command identifies such differences quite readily.  The R diff 
function does not do this (it computes lagged differences of a vector).  
However, one kluge that I use is to paste together all the entries in each 
row, then perform a setdiff on the two resulting vectors.  Assuming that no 
rows are duplicated (which would be true in my data), my approach works, but 
it is cumbersome.  
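
For concreteness, the paste-and-setdiff kluge described above might look roughly like the sketch below. The helper name row_key and the separator character are illustrative assumptions, not part of any package. The sketch also assumes that no rows are duplicated and that the separator never occurs in the data.

```r
# Example data from this message
SET1 <- data.frame(LETTERS = LETTERS[c(1:4, 6:10)], NUMBERS = c(1:4, 6:10))
SET2 <- data.frame(LETTERS = LETTERS[1:10], NUMBERS = 1:10)

# Hypothetical helper: collapse each row into a single string.
# The separator is an arbitrary choice assumed not to appear in the data.
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

key1 <- row_key(SET1)
key2 <- row_key(SET2)

# Keys present in SET2 but absent from SET1
added <- setdiff(key2, key1)

# Recover the corresponding rows of SET2
new_rows <- SET2[key2 %in% added, , drop = FALSE]
print(new_rows)
```

Swapping the arguments to setdiff would report rows that were removed rather than added. Unlike UNIX diff, this comparison ignores row order, so it cannot detect rows that were merely moved.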

I suspect that someone on this board has thought of a more clever approach to 
this (or perhaps some function already exists).  Any help would be appreciated.

Thanks.

Dennis


Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

