Hi, I just want to do a follow up this very simple fix/correction/speedup/cleanup of the base::which() function. Here is a diff:
diff src/library/base/R/which.R which.R 21c21 < wh <- seq_along(x)[ll <- x & !is.na(x)] --- > wh <- seq_along(x)[x & !is.na(x)] 25c25 < names(wh) <- names(x)[ll] --- > names(wh) <- names(x)[wh] FYI, the 'll' variable is not used elsewhere. I've been going through this modifications several times and I cannot see any side effects. Could someone of R core please commit this? BTW, when one report diff:s, do you prefer to get it with or without context information, e.g. -C 3? /Henrik On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry <[EMAIL PROTECTED]> wrote: > On Thu, 10 Jul 2008, Henrik Bengtsson wrote: > >> Hi, >> >> by replacing 'll' with 'wh' in the source code for base::which() one >> gets ~20% speed up for *named logical vectors*. > > > The amount of speedup depends on how sparse the TRUE values are. > > When the proportion of TRUEs gets small the speedup is more than twofold on > my macbook. For high proportions of TRUE, the speedup is more like the 20% > you cite. > > HTH, > > Chuck > >> >> CURRENT CODE: >> >> which <- function(x, arr.ind = FALSE) >> { >> if(!is.logical(x)) >> stop("argument to 'which' is not logical") >> wh <- seq_along(x)[ll <- x & !is.na(x)] >> m <- length(wh) >> dl <- dim(x) >> if (is.null(dl) || !arr.ind) { >> names(wh) <- names(x)[ll] >> } >> ... >> wh; >> } >> >> SUGGESTED CODE: (Remove 'll' and use 'wh') >> >> which2 <- function(x, arr.ind = FALSE) >> { >> if(!is.logical(x)) >> stop("argument to 'which' is not logical") >> wh <- seq_along(x)[x & !is.na(x)] >> m <- length(wh) >> dl <- dim(x) >> if (is.null(dl) || !arr.ind) { >> names(wh) <- names(x)[wh] >> } >> ... >> wh; >> } >> >> That's all. >> >> BENCHMARKING: >> >> # To measure both in same environment >> which1 <- base::which; >> environment(which1) <- globalenv(); # Needed? >> >> N <- 1e6; >> set.seed(0xbeef); >> x <- sample(c(TRUE, FALSE), size=N, replace=TRUE); >> names(x) <- seq_along(x); >> B <- 10; >> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); }); >> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); }); >> stopifnot(identical(idxs1, idxs2)); >> print(t1/t2); >> # Fair benchmarking >> t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); }); >> t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); }); >> print(t1/t2); >> ## user system elapsed >> ## 1.283186 1.052632 1.250000 >> >> You get similar results if you put for loop outside the system.time() >> call (and sum up the timings). >> >> Cheers >> >> Henrik >> >> ______________________________________________ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > > Charles C. Berry (858) 534-2098 > Dept of Family/Preventive > Medicine > E mailto:[EMAIL PROTECTED] UC San Diego > http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 > > > ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel