Bert, > The efficiency gains are due to vectorization and the use of more > efficient primitives. None of this may matter of course, but it seemed > worth mentioning.
thanks very much! the varieties of code, and disparities of performance, are truly wonderful. Rui's point that what works better for small n is not necessarily what will work better for large n is important to keep in [my] mind. as a "so-far" summary, here are some timings. the relevant code is below. ---- my apply user system elapsed 8.397 0.124 8.531 Bert's !duplicated user system elapsed 2.367 0.000 2.370 Bert's x[,2]>x[,1] user system elapsed 1.052 0.000 1.054 my a.d.f(unique(cbind(do.call))) user system elapsed 3.909 0.000 3.914 Eric Berger's unique(...pmin...pmax) user system elapsed 0.848 0.000 0.850 Eric Berger's transmuting tibble... user system elapsed 0.986 0.000 0.988 Kimmo Elo's [OP] mutating paste user system elapsed 52.079 0.000 52.136 Rui Barradas' sort-based user system elapsed 42.327 0.080 42.450 ---- cheers, Greg ---- n <- 1000 x <- expand.grid(Source = 1:n, Target = 1:n) cat("my apply\n") system.time({ y <- apply(x, 1, function(y) return (c(A=min(y), B=max(y)))) unique(t(y))}) # user system elapsed # 5.075 0.034 5.109 cat("Bert's !duplicated\n") system.time({ x[!duplicated(cbind(do.call(pmin, x), do.call(pmax, x))), ] }) # user system elapsed # 1.340 0.013 1.353 # Still more efficient and still returning a data frame is: cat("Bert's x[,2]>x[,1]\n") system.time({ w <- x[, 2] > x[,1] x[w, ] <- x[w, 2:1] unique(x)}) # user system elapsed # 0.693 0.011 0.703 cat("my a.d.f(unique(cbind(do.call)))\n") system.time({ as.data.frame(unique(cbind(A=do.call(pmin,x), B=do.call(pmax,x)))) }) cat("Eric Berger's unique(...pmin...pmax)\n") system.time({ unique(data.frame(V1=pmin(x$Source,x$Target), V2=pmax(x$Source,x$Target))) }) cat("Eric Berger's transmuting tibble...\n") require(dplyr) xt<-tibble(x) system.time({ xt %>% transmute( a=pmin(Source,Target), b=pmax(Source,Target)) %>% unique() %>% rename(Source=a, Target=b) }) cat("Kimmo Elo's [OP] mutating paste\n") system.time({ xt %>% mutate(pair=mapply(function(x,y) paste0(sort(c(x,y)),collapse="-"), Source, Target)) %>% distinct(pair, .keep_all = T) %>% mutate(Source=sapply(pair, function(x) unlist(strsplit(x, split="-"))[1]), Target=sapply(pair, function(x) unlist(strsplit(x, split="-"))[2])) %>% select(-pair) }) cat("Rui Barradas' sort-based\n") system.time({ apply(x, 1, sort) |> t() |> unique() }) ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.