Hello,

I've made Petr's solution a bit more general.

Petr Savicky wrote
>
> On Mon, Feb 27, 2012 at 07:10:57PM +0100, Arnaud Gaboury wrote:
>> No, but I tried your way too.
>>
>> In fact, the only three unique rows are these ones:
>>
>>  Product Price Nbr.Lots
>>    Cocoa  2440        5
>>    Cocoa  2450        1
>>    Cocoa  2440        6
>>
>> Here is a dirty working trick I found :
>>
>> > df<-merge(exportfile,reported,all.y=T)
>> > df1<-merge(exportfile,reported)
>> > dff1<-do.call(paste,df)
>> > dff<-do.call(paste,df)
>> > dff1<-do.call(paste,df1)
>> > df[!dff %in% dff1,]
>>   Product Price Nbr.Lots
>> 3   Cocoa  2440        5
>> 4   Cocoa  2450        1
>>
>>
>> My two problems are : I do think it is not so a clean code, then I won't
>> know by advance which of my two df will have the greates dimension (I can
>> add some lines to deal with it, but again, seems very heavy).
>
> Hi.
>
> Try the following.
>
>   setdiffDF <- function(A, B)
>   {
>       A[!duplicated(rbind(B, A))[nrow(B) + 1:nrow(A)], ]
>   }
>
>   df1 <- setdiffDF(reported, exportfile)
>   df2 <- setdiffDF(exportfile, reported)
>   rbind(df1, df2)
>
> I obtained
>
>      Product Price Nbr.Lots
>   3    Cocoa  2440        5
>   4    Cocoa  2450        1
>   31   Cocoa  2440        6
>
> Is this correct? I see the row
>
>   Cocoa 2440.00 6
>
> only in exportfile and not in reported.
>
> The trick with paste() is not a bad idea. A variant of
> it is used also in the base function duplicated.matrix(),
> since it contains
>
>   apply(x, MARGIN, function(x) paste(x, collapse = "\r"))
>
> If speed is critical, then possibly the paste() trick
> written for the whole columns, for example
>
>   paste(df[[1]], df[[2]], df[[3]], sep="\r")
>
> and then setdiff() can be better.
>
> Hope this helps.
>
> Petr Savicky.
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

It produces the symmetric difference for vectors, matrices, data.frames and (so-so tested) lists.
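
For anyone without the OP's data at hand, here is a minimal sketch of the idea behind Petr's setdiffDF() (stack B on top of A, then let duplicated() flag the rows of A already seen in B), next to the paste() variant he mentions. The data frames A and B and the helper key() are made up for illustration; they are not the OP's 'reported' and 'exportfile'.

```r
# Hypothetical toy data (NOT the OP's files): same columns, one shared row.
A <- data.frame(Product = c("Cocoa", "Cocoa", "Cocoa"),
                Price   = c(2440, 2450, 2460),
                Lots    = c(5, 1, 2),
                stringsAsFactors = FALSE)
B <- data.frame(Product = c("Cocoa", "Cocoa"),
                Price   = c(2460, 2440),
                Lots    = c(2, 6),
                stringsAsFactors = FALSE)

# Stack B on top of A. duplicated() marks every later repeat of a row,
# so a row of A that already occurred in B comes out TRUE.
dup <- duplicated(rbind(B, A))

# Keep only the flags that belong to the rows of A (the last nrow(A) ones).
in_B <- dup[nrow(B) + seq_len(nrow(A))]

# Rows of A that do not occur in B.
only_in_A <- A[!in_B, ]

# paste() variant: build one key string per row, then compare with %in%.
# key() is an illustrative helper, not a base R function.
key <- function(d) do.call(paste, c(d, sep = "\r"))
only_in_A_paste <- A[!key(A) %in% key(B), ]

print(only_in_A)
print(only_in_A_paste)
```

Both variants agree on this toy data. They can differ when A itself contains repeated rows: duplicated() also flags the later copies inside A, while the %in% comparison keeps them.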

#-----------------------------
# First the set difference

`%-%` <- function(x, y) UseMethod("%-%")

# Elements of x that do not occur in y
`%-%.default` <- function(x, y){
    f <- function(A, B) !duplicated(c(B, A))[length(B) + seq_along(A)]
    ix <- f(x, y)
    x[ix]
}

# Rows of x that do not occur in y
`%-%.matrix` <- `%-%.data.frame` <- function(x, y){
    f <- function(A, B) !duplicated(rbind(B, A))[nrow(B) + seq_len(nrow(A))]
    ix <- f(x, y)
    # drop = FALSE keeps a one-row (or one-column) result two-dimensional
    x[ix, , drop = FALSE]
}

# Each element of x against each element of y; pairs of different
# classes give NULL
`%-%.list` <- function(x, y){
    # identical() is safe when class() returns more than one string
    f <- function(A, B) if(identical(class(A), class(B))) A %-% B
    lapply(y, function(Y) lapply(x, f, Y))
}

# Then the set symmetric difference

symdiff <- function(x, y) UseMethod("symdiff")

symdiff.default <- function(x, y) c(x %-% y, y %-% x)

symdiff.matrix <- symdiff.data.frame <- function(x, y){
    xclass <- class(x)
    res <- rbind(x %-% y, y %-% x)
    class(res) <- xclass
    res
}

symdiff.list <- function(x, y){
    f <- function(A, B) if(identical(class(A), class(B))) symdiff(A, B)
    lapply(y, function(Y) lapply(x, f, Y))
}

# Test it with data.frames first (the OP data)

reported %-% exportfile
exportfile %-% reported
symdiff(reported, exportfile)
symdiff(exportfile, reported)

#-----------------------------
# And some other data types

x <- 1:5
y <- 3:8

x %-% y
y %-% x
symdiff(x, y)
symdiff(y, x)

X <- list(a=x, rp=reported)
Y <- list(b=y, ef=exportfile)

X %-% Y
Y %-% X
symdiff(X, Y)
symdiff(Y, X)

P.S. This question seems to pop up repeatedly.

Rui Barradas

--
View this message in context: http://r.789695.n4.nabble.com/compare-two-data-frames-of-different-dimensions-and-only-keep-unique-rows-tp4425379p4426607.html