Note that if you force a garbage collection on each iteration, the times are more stable. On average, however, it is faster to let the garbage collector decide when to leap into action.

mb_gc <- microbenchmark::microbenchmark(gc(), { x <- as.list(sin(1:5e5)); x <- unlist(x) / cos(1:5e5) ; sum(x) }, times=1000, control=list(order="inorder"))
with(mb_gc, plot(time[expr!="gc()"]))
with(mb_gc, quantile(1e-6*time[expr!="gc()"], c(0, .5, .75, .9, .95, .99, 1)))
#        0%       50%       75%       90%       95%       99%      100%
#  59.33450  61.33954  63.43457  66.23331  68.93746  74.45629 158.09799

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Aug 22, 2017 at 9:26 AM, William Dunlap <wdun...@tibco.com> wrote:

> The large value for maximum time may be due to garbage collection, which
> happens periodically. E.g., try the following, where the
> unlist(as.list()) creates a lot of garbage. I get a very large time every
> 102 or 51 iterations and a moderately large time more often.
>
> mb <- microbenchmark::microbenchmark({ x <- as.list(sin(1:5e5)); x <- unlist(x) / cos(1:5e5) ; sum(x) }, times=1000)
> plot(mb$time)
> quantile(mb$time * 1e-6, c(0, .5, .75, .90, .95, .99, 1))
> #        0%       50%       75%       90%       95%       99%      100%
> #  59.04446  82.15453 102.17522 180.36986 187.52667 233.42062 249.33970
> diff(which(mb$time > quantile(mb$time, .99)))
> # [1] 102  51 102 102 102 102 102 102  51
> diff(which(mb$time > quantile(mb$time, .95)))
> # [1]  6 41  4 47  4 40  7  4 47  4 33 14  4 47  4 47  4 47  4 47  4 47  4  6 41
> #[26]  4  6  7  9 25  4 47  4 47  4 47  4 22 25  4 33 14  4  6 41  4 47  4 22
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, Aug 22, 2017 at 5:53 AM, <raphael.fel...@agroscope.admin.ch>
> wrote:
>
>> Dear all
>>
>> I was thinking about reading data into R efficiently and tried several
>> ways to test whether load(file.Rdata) or readRDS(file.rds) is faster.
>> The files file.Rdata and file.rds contain the same data, the first
>> created with save(d, file='file.Rdata', compress=F) and the second with
>> saveRDS(d, 'file.rds', compress=F).
>>
>> First I used the function microbenchmark() and was astonished by the
>> max value of the output.
>>
>> FIRST TEST:
>> > library(microbenchmark)
>> > microbenchmark(
>> + n <- readRDS('file.rds'),
>> + load('file.Rdata')
>> + )
>> Unit: milliseconds
>>               expr      min       lq     mean   median       uq       max neval
>>  n <- readRDS(fl1) 106.5956 109.6457 237.3844 117.8956 141.9921 10934.162   100
>>          load(fl2) 295.0654 301.8162 335.6266 308.3757 319.6965  1915.706   100
>>
>> It looks like the max value is an outlier.
>>
>> So I tried:
>> SECOND TEST:
>> > sapply(1:10, function(x) system.time(n <- readRDS('file.rds'))[3])
>> elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed
>>   10.50    0.11    0.11    0.11    0.10    0.11    0.11    0.11    0.12    0.12
>> > sapply(1:10, function(x) system.time(load('file.Rdata'))[3])
>> elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed
>>    1.86    0.29    0.31    0.30    0.30    0.31    0.30    0.29    0.31    0.30
>>
>> This confirmed my suspicion: the first time loading the data takes much
>> longer than the following times. I suspect that this has something to do
>> with how the data is assigned, and that R doesn't have to 'fully' read
>> the data if it is read a second time.
>>
>> So the question remains: how can I make a realistic benchmark test? From
>> the first test I would conclude that reading the *.rds file is faster. But
>> this holds only for a large number of neval. If I set times = 1, then
>> reading the *.Rdata would be faster (as also indicated by the second test).
>>
>> Thanks for any help or comments.
>>
>> Kind regards
>>
>> Raphael
>> ------------------------------------------------------------------------
>> Raphael Felber, PhD
>> Scientific Officer, Climate & Air Pollution
>>
>> Federal Department of Economic Affairs,
>> Education and Research EAER
>> Agroscope
>> Research Division, Agroecology and Environment
>>
>> Reckenholzstrasse 191, CH-8046 Zürich
>> Phone +41 58 468 75 11
>> Fax +41 58 468 72 01
>> raphael.fel...@agroscope.admin.ch
>> www.agroscope.ch
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
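
For the "realistic benchmark" question in the original post, a minimal base-R sketch might report the first read separately from the later ones and summarise the later ones with the median instead of the mean or max. This is only an illustration under stated assumptions: time_reads() is a hypothetical helper, the data frame d is a stand-in for the poster's data (which is not shown in the thread), and the files are written to temporary paths. Because the files are written just before they are read, the first read here will probably be served from the operating system's file cache, so it is unlikely to reproduce the slow first read reported above.

```r
# Hypothetical helper (not from the thread): time `reps` calls of fun().
# system.time() runs a garbage collection before timing by default
# (gcFirst = TRUE); it is spelled out here to make that explicit.
time_reads <- function(fun, reps = 10) {
  vapply(seq_len(reps), function(i) {
    system.time(fun(), gcFirst = TRUE)[["elapsed"]]
  }, numeric(1))
}

# Stand-in data; the original post's `d` is not shown.
d <- data.frame(x = rnorm(1e5), y = runif(1e5))
rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".RData")
saveRDS(d, rds_file, compress = FALSE)
save(d, file = rdata_file, compress = FALSE)

t_rds   <- time_reads(function() readRDS(rds_file))
# Load into a throwaway environment so the workspace is not modified.
t_rdata <- time_reads(function() load(rdata_file, envir = new.env()))

# First run reported separately; later runs summarised by the median,
# which is less sensitive to an occasional slow call than the mean.
print(c(first = t_rds[1],   median_rest = median(t_rds[-1])))
print(c(first = t_rdata[1], median_rest = median(t_rdata[-1])))

unlink(c(rds_file, rdata_file))
```

Which of the two numbers counts as "realistic" depends on whether the real workload reads a file once per session or repeatedly.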