I would suggest using the microbenchmark package for the timing comparison. It runs each expression many times, which gives a more meaningful comparison than a single system.time() call.
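As a rough sketch of what such a comparison could look like: the data frame below is synthetic, and its size, column values, and share of NA entries are my own assumptions rather than anything taken from the original file. Only the column name gender2 and the value "other" come from the posted code, and the microbenchmark package has to be installed first.

```r
# Sketch only: synthetic data stands in for the downloaded CSV.
library(microbenchmark)  # install.packages("microbenchmark") if needed

set.seed(1)
n <- 1e5  # hypothetical size, much smaller than the original file

# Three columns, loosely mimicking dat[, 1:3]; 'id' and 'age' are made up.
dat <- data.frame(
  id      = seq_len(n),
  age     = sample(18:90, n, replace = TRUE),
  gender2 = sample(c("female", "male", "other", NA), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Each expression is evaluated 'times' times; print() summarises the timings.
res <- microbenchmark(
  with_which = dat[which(dat$gender2 == "other"), ],
  logical    = dat[dat$gender2 == "other", ],
  times = 50
)
print(res)
```

The absolute numbers will depend on the machine and on the data, so the summary is best read as a relative comparison between the two expressions.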
One possible reason for the difference is the number of missing values in your data (along with the number of columns). Consider the difference in the following results:

> x <- c(1,2,NA)
> x[x==1]
[1]  1 NA
> x[which(x==1)]
[1] 1

With logical indexing, every NA in the comparison produces an NA entry in the result (an all-NA row in the case of a data frame), while which() returns only the positions that are TRUE. The two calls therefore do not necessarily select the same rows.

On Sat, Oct 10, 2020 at 5:25 PM 1/k^c <kchambe...@gmail.com> wrote:
>
> Hi R-helpers,
>
> Does anyone know why adding which() makes the select call more
> efficient than just using logical selection in a dataframe? Doesn't
> which() technically add another conversion/function call on top of the
> logical selection? Here is a reproducible example with a slight
> difference in timing.
>
> # Surrogate data - the timing here isn't interesting
> urltext <- paste("https://drive.google.com/",
>                  "uc?id=1AZ-s1EgZXs4M_XF3YYEaKjjMMvRQ7",
>                  "-h8&export=download", sep="")
> download.file(url=urltext, destfile="tempfile.csv") # download file first
> dat <- read.csv("tempfile.csv", stringsAsFactors = FALSE, header=TRUE,
>                 nrows=2.5e6) # read the file; 'nrows' is a slight
>                              # overestimate
> dat <- dat[,1:3] # select just the first 3 columns
> head(dat, 10) # print the first 10 rows
>
> # Select using which() as the final step ~ 90ms total time on my macbook air
> system.time(
>   head(
>     dat[which(dat$gender2=="other"),],),
>   gcFirst=TRUE)
>
> # Select skipping which() ~130ms total time
> system.time(
>   head(
>     dat[dat$gender2=="other", ]),
>   gcFirst=TRUE)
>
> Now I would think that the second one without which() would be more
> efficient. However, every time I run these, the first version, with
> which() is more efficient by about 20ms of system time and 20ms of
> user time. Does anyone know why this is?
>
> Cheers!
> Keith
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com