Re: [R] Compare two dataframes

Petr Savicky Sat, 18 Dec 2010 01:01:16 -0800

Hi Mark:

> However, if the dataframe contains non-unique rows (two rows with
> exactly the same values in each column) then the unique function will
> delete one of them and that may not be desirable.


To find out which rows two dataframes have in common without removing
the duplicated rows within each of them, it is possible to use sorting.
For example:

  n <- ncol(cars)
  cars1 <- cbind(cars[1:35, ], df="df1")
  cars2 <- cbind(cars[16:50, ], df="df2")
  # all cases together; column "df" indicates the origin of each case
  cars.all <- rbind(cars1, cars2)
  row.names(cars.all) <- seq(nrow(cars.all))
  cars.sorted <- cars.all[do.call(order, cars.all), ]
  # compute an index that is the same for rows that are equal except for the
  # "df" component
  index <- cumsum(!duplicated(cars.sorted[, 1:n]))
  # for each index of a unique row, count its occurrences in each dataframe
  out <- table(index, cars.sorted$df)
  out[15:20, ]
     
  index df1 df2
     15   1   0
     16   1   1
     17   2   2
     18   1   1
     19   1   1
     20   1   1

This shows, for example, that the row with index 17 occurs twice in each of
the two dataframes. These rows can be obtained using

  cars.sorted[index == 17, ]

     speed dist  df
  17    13   34 df1
  18    13   34 df1
  37    13   34 df2
  38    13   34 df2
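
The same table can also be used to list all unique rows that occur in both
dataframes, or those whose counts differ. The following is only a sketch
that repeats the construction above; the names idx, in.both, differ and
common are introduced here for illustration.

```r
n <- ncol(cars)
cars1 <- cbind(cars[1:35, ], df = "df1")
cars2 <- cbind(cars[16:50, ], df = "df2")
cars.all <- rbind(cars1, cars2)
row.names(cars.all) <- seq(nrow(cars.all))
cars.sorted <- cars.all[do.call(order, cars.all), ]
index <- cumsum(!duplicated(cars.sorted[, 1:n]))
out <- table(index, cars.sorted$df)

# index values as integers, in the order of the rows of the table
idx <- as.integer(rownames(out))
# indices of unique rows that occur at least once in each dataframe
in.both <- idx[out[, "df1"] > 0 & out[, "df2"] > 0]
# indices of unique rows whose number of occurrences differs
differ <- idx[out[, "df1"] != out[, "df2"]]

# one representative of each row present in both dataframes
common <- cars.sorted[index %in% in.both & !duplicated(index), 1:n]
```

Here, common keeps a single copy of each shared row, while the counts in
out[as.character(in.both), ] still say how often it occurs on each side.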

See also ?rle.
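
One way rle() could be used here, as a sketch only: since the index is
nondecreasing, rle(index) yields the total number of occurrences of each
unique row over both dataframes together. The names runs and total are
introduced here for illustration.

```r
n <- ncol(cars)
cars.all <- rbind(cbind(cars[1:35, ], df = "df1"),
                  cbind(cars[16:50, ], df = "df2"))
cars.sorted <- cars.all[do.call(order, cars.all), ]
index <- cumsum(!duplicated(cars.sorted[, 1:n]))

# runs$values are the indices of unique rows,
# runs$lengths are their total counts in the combined data
runs <- rle(index)
total <- setNames(runs$lengths, runs$values)

# total counts for the indices shown in the table above
total[as.character(15:20)]
```

These totals are the row sums of the table out, so they do not say which
dataframe a row came from; for that, the table is still needed.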

Petr.
