Re: [R] Compare two dataframes

Petr Savicky Fri, 17 Dec 2010 00:28:50 -0800

On Thu, Dec 16, 2010 at 01:02:29PM -0600, Mark Na wrote:
> Hello,
> 
> I have two dataframes DF1 and DF2 that should be identical but are not
> (DF1 has some rows that aren't in DF2, and vice versa). I would like
> to produce a new dataframe DF3 containing rows in DF1 that aren't in
> DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1).


The function unique(DF) removes duplicated rows of DF and keeps the unique
rows in the order of their first occurrence. So, if DF1 does not contain
duplicated rows, then unique(rbind(DF1, DF2)) consists of DF1 followed by
the rows that occur only in DF2, if there are any. Dropping the first
nrow(DF1) rows then leaves exactly the rows of DF2 that are not in DF1.
The order of the rows in the result follows the order of the original
data frames, and if DF2 contains several instances of a row that is not
in DF1, only the first instance of this row appears in the difference.
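
As a small illustration of the point about repeated rows, here is a sketch
on two made-up data frames. The contents of DF1 and DF2 below are invented
for this example only; DF2 contains the row (4, "d") twice.

```r
# Toy data, invented for this illustration
DF1 <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
DF2 <- data.frame(x = c(2, 4, 4, 5), y = c("b", "d", "d", "e"))

# DF1 has no duplicated rows, so unique() leaves it unchanged
# as the first nrow(DF1) rows of the combined result
both <- unique(rbind(DF1, DF2))

# Drop the DF1 part; what remains are the rows found only in DF2.
# The repeated row (4, "d") appears once.
DF4 <- both[-seq_len(nrow(DF1)), ]
DF4
```

Here DF4 should hold two rows, (4, "d") and (5, "e"), in the order in which
they first occur in DF2.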

  #MAKE SOME DATA
  #create unique ID field by pasting all columns together
  cars$id <- paste(cars$speed, cars$dist, sep="")
  cars1 <- cars[1:35, ]
  cars2 <- cars[16:50, ]
 
  #EXTRACT UNIQUE ROWS
  #rows unique to cars1 (i.e., not in cars2)
  cars1_unique <- cars1[cars1$id %in% setdiff(cars1$id, cars2$id), ]
  #rows unique to cars2
  cars2_unique <- cars2[cars2$id %in% setdiff(cars2$id, cars1$id), ]
 
  cars1_set <- unique(cars1)
  cars2_set <- unique(cars2)
 
  cars1_plus <- unique(rbind(cars1_set, cars2_set))
  cars2_plus <- unique(rbind(cars2_set, cars1_set))
 
  cars1_diff <- cars2_plus[ - seq(nrow(cars2_set)), ]
  cars2_diff <- cars1_plus[ - seq(nrow(cars1_set)), ]

  all(cars1_unique == cars1_diff) # [1] TRUE
  all(cars2_unique == cars2_diff) # [1] TRUE
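
If this is needed more than once, the steps can be wrapped in a function.
The following is only a sketch: rows_not_in() is a hypothetical name, not a
function from base R or any package, and it assumes that both data frames
have the same columns. It uses a logical index instead of a negative one,
so that it also behaves sensibly when the second data frame has no rows.

```r
# rows_not_in() is a hypothetical helper, not part of base R.
# It returns the distinct rows of `a` that do not occur in `b`,
# assuming `a` and `b` have the same columns.
rows_not_in <- function(a, b) {
  b_set <- unique(b)
  # b_set has no duplicated rows, so it stays intact as the
  # first nrow(b_set) rows of the combined result
  both <- unique(rbind(b_set, a))
  # Keep only the rows that come after the b_set part
  both[seq_len(nrow(both)) > nrow(b_set), , drop = FALSE]
}

# Same split of the cars data as above, without the id column
cars_a <- cars[1:35, c("speed", "dist")]
cars_b <- cars[16:50, c("speed", "dist")]

only_a <- rows_not_in(cars_a, cars_b)  # rows of cars_a absent from cars_b
only_b <- rows_not_in(cars_b, cars_a)  # rows of cars_b absent from cars_a
```

Comparing whole rows this way does not depend on a pasted ID column. With
sep="", such an ID can coincide for different rows: for example, the values
1 and 12 and the values 11 and 2 both give "112".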

Petr Savicky.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
