On Thu, Dec 16, 2010 at 01:02:29PM -0600, Mark Na wrote:
> Hello,
>
> I have two dataframes DF1 and DF2 that should be identical but are not
> (DF1 has some rows that aren't in DF2, and vice versa). I would like
> to produce a new dataframe DF3 containing rows in DF1 that aren't in
> DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1).

The function unique(DF) removes duplicated rows of DF and keeps the unique
rows in the order of their first occurrence. So, if DF1 does not contain
duplicated rows, then unique(rbind(DF1, DF2)) contains first DF1 and then
the rows that are unique to DF2, if there are any. The order of the rows in
the result depends on the order of the original data frames, and if DF2
contains several instances of a row that is not in DF1, we get only the
first instance of this row in the difference.

#MAKE SOME DATA
cars$id <- paste(cars$speed, cars$dist, sep="") #create unique ID field by pasting all columns together
cars1 <- cars[1:35, ]
cars2 <- cars[16:50, ]

#EXTRACT UNIQUE ROWS
cars1_unique <- cars1[cars1$id %in% setdiff(cars1$id, cars2$id), ] #rows unique to cars1 (i.e., not in cars2)
cars2_unique <- cars2[cars2$id %in% setdiff(cars2$id, cars1$id), ] #rows unique to cars2

#THE SAME DIFFERENCES USING unique(rbind(...))
cars1_set <- unique(cars1)
cars2_set <- unique(cars2)
cars1_plus <- unique(rbind(cars1_set, cars2_set)) #cars1_set, then rows only in cars2
cars2_plus <- unique(rbind(cars2_set, cars1_set)) #cars2_set, then rows only in cars1
cars1_diff <- cars2_plus[ - seq(nrow(cars2_set)), ] #drop the leading cars2_set block
cars2_diff <- cars1_plus[ - seq(nrow(cars1_set)), ] #drop the leading cars1_set block

all(cars1_unique == cars1_diff)
# [1] TRUE
all(cars2_unique == cars2_diff)
# [1] TRUE

Petr Savicky.
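
P.S. The remark about several instances of a row can be seen on a much smaller case. The following is only a sketch: DF1, DF2, `both` and DF4 here are made-up toy objects, not the data from the question, and the code assumes that DF1 has at least one row and no duplicated rows.

```r
# Toy data (hypothetical): the row (3, "c") occurs twice in DF2 and not in DF1
DF1 <- data.frame(x = c(1, 2), y = c("a", "b"))
DF2 <- data.frame(x = c(2, 3, 3), y = c("b", "c", "c"))

# unique() keeps the first occurrence of each row, so the result starts
# with the rows of DF1, followed by the rows that occur only in DF2
both <- unique(rbind(DF1, DF2))

# Drop the leading DF1 block; what remains are the rows of DF2 not in DF1.
# This indexing assumes nrow(DF1) > 0, since a zero-length negative index
# would select no rows at all.
DF4 <- both[-seq_len(nrow(DF1)), ]
print(DF4)
```

Here DF4 has a single row, although DF2 contains that row twice. If the number of occurrences matters, the approach with an ID column and %in% keeps all of them.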