Hi Mark:

> However, if the dataframe contains non-unique rows (two rows with
> exactly the same values in each column) then the unique function will
> delete one of them and that may not be desirable.

In order to get information about equal rows between two dataframes
without removing duplicated rows in either of them, it is possible to use
sorting. For example

n <- ncol(cars)
cars1 <- cbind(cars[1:35, ], df="df1")
cars2 <- cbind(cars[16:50, ], df="df2")
# all cases together, column "df" indicates the origin of each case
cars.all <- rbind(cars1, cars2)
row.names(cars.all) <- seq(nrow(cars.all))
cars.sorted <- cars.all[do.call(order, cars.all), ]
# compute an index that is the same for rows which are equal
# except for the "df" component
index <- cumsum(1 - duplicated(cars.sorted[, 1:n]))
# for each index of a unique row, compute the number of occurrences
# in both dataframes
out <- table(index, cars.sorted$df)
out[15:20, ]

index df1 df2
   15   1   0
   16   1   1
   17   2   2
   18   1   1
   19   1   1
   20   1   1

This shows, for example, that the row with index 17 occurs twice in each
of the two dataframes. These rows can be obtained using

cars.sorted[index == 17, ]

   speed dist  df
17    13   34 df1
18    13   34 df1
37    13   34 df2
38    13   34 df2

See also ?rle.

Petr.
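
A further note, offered only as a sketch: the same per-dataframe counts can also be obtained without sorting, by pasting the data columns of each row into one string key and tabulating the keys against the "df" column. This is a different technique from the sorting approach described in this message, not a restatement of it. The names key, counts and in.both are illustrative, and the sketch assumes that the column values convert to character unambiguously (true for the integer-valued columns of cars, not guaranteed for arbitrary doubles) and that no value contains a tab.

```r
# Sketch: count occurrences of each distinct row in two data frames
# using a pasted string key instead of sorting.
cars1 <- cars[1:35, ]
cars2 <- cars[16:50, ]
cars.all <- rbind(cbind(cars1, df = "df1"), cbind(cars2, df = "df2"))

# One key per row, built from the data columns only (not from "df").
key <- do.call(paste, c(cars.all[names(cars)], sep = "\t"))

# Rows of 'counts' are distinct data rows, columns are the two origins.
counts <- table(key, cars.all$df)

# Keys of the rows that occur at least once in both data frames.
in.both <- rownames(counts)[counts[, "df1"] > 0 & counts[, "df2"] > 0]

# Counts for the row with speed = 13 and dist = 34.
counts[paste(13, 34, sep = "\t"), ]
```

For the row speed = 13, dist = 34 this should agree with the table shown for index 17, that is, two occurrences in each dataframe. The keys are sorted as character strings, so the row order of counts differs from the order given by the index in the sorting approach.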