David,

'assign' is slower than '<-':

##   median expr
## 1 0.1440 X <- letters
## 2 0.4420 .Internal(assign("X", letters, e, F))
## 3 1.1820 e[["X"]] <- letters
## 4 1.2570 e$X <- letters
## 5 1.8380 assign("X", letters, envir = e, inherits = F)
## 6 1.9415 assign("X", letters, e, inherits = F)

(median in microseconds over 500 runs; see http://rpubs.com/setempler/46568)

---

Two questions:

Is 'X <- letters' the fastest because it does not need to switch the
environment from 'benchmark' to 'e'?

Why is the call to '.Internal' faster than '[[<-' here, whereas in
Winston's 'get'/'[[' benchmark '[[' was faster than the '.Internal' call?

thanks,
s

On 4 December 2014 at 15:24, Lorenz, David <lor...@usgs.gov> wrote:
> All,
>   So that suggests that .GlobalEnv[["X"]] is more efficient than
> get("X", pos=1L). What about .GlobalEnv[["X"]] <- value, compared to
> assign("X", value)?
> Dave
>
> On Wed, Dec 3, 2014 at 3:30 PM, Peter Haverty <haverty.pe...@gene.com>
> wrote:
>
>> Thanks Winston! I'm amazed that "[[" beats calling the .Internal
>> directly. I guess the difference between .Primitive vs. .Internal is
>> pretty significant for things on this time scale.
>>
>> NULL meaning NULL and NULL meaning undefined would lead to the same path
>> for much of my code. I'll be swapping out many exists and get calls later
>> today. Thanks!
>>
>> I do still think it would be very useful to have some way to discriminate
>> the two NULL cases. I'm reminded of how perl does the same thing. It's
>> been a while, but it was something like
>>
>> if (defined(x{'c'})) { print x{'c'}; } # This is still two lookups, but it
>> has the "defined" concept.
>>
>> or maybe even
>>
>> if (defined( foo = x{'c'} ) ) { print foo; }
>>
>>
>> Thanks again for the timings!
>>
>>
>> Pete
>>
>> ____________________
>> Peter M. Haverty, Ph.D.
>> Genentech, Inc.
>> phave...@gene.com
>>
>> On Wed, Dec 3, 2014 at 12:48 PM, Winston Chang <winstoncha...@gmail.com>
>> wrote:
>>
>> > I've looked at related speed issues in the past, and have a couple
>> > related points to add.
>> > (I've put the info below at http://rpubs.com/wch/46428.)
>> >
>> > There's a significant amount of overhead just from calling the R
>> > function get(). This is true even when you skip the pos argument and
>> > provide envir. For example, if you call get(), it takes much more time
>> > than .Internal(get()), which is what get() does.
>> >
>> > If you already know that the object exists in an environment, it's
>> > faster to use e$x, and slightly faster still to use e[["x"]]:
>> >
>> > e <- new.env()
>> > e$a <- 1
>> >
>> > # Accessing objects in environments
>> > microbenchmark(
>> >   get("a", e, inherits = FALSE),
>> >   get("a", envir = e, inherits = FALSE),
>> >   .Internal(get("a", e, "any", FALSE)),
>> >   e$a,
>> >   e[["a"]],
>> >   .Primitive("[[")(e, "a"),
>> >
>> >   unit = "us"
>> > )
>> > #>   median name
>> > #> 1 1.0300 get("a", e, inherits = FALSE)
>> > #> 2 0.9425 get("a", envir = e, inherits = FALSE)
>> > #> 3 0.3080 .Internal(get("a", e, "any", FALSE))
>> > #> 4 0.2305 e$a
>> > #> 5 0.1740 e[["a"]]
>> > #> 6 0.2905 .Primitive("[[")(e, "a")
>> >
>> >
>> > A similar thing happens with exists(): the R function wrapper adds
>> > significant overhead on top of .Internal(exists()).
>> > It's also faster to use $ and [[, then test for NULL, but of course
>> > this won't distinguish between objects that don't exist, and those
>> > that do exist but have a NULL value:
>> >
>> > # Test for existence of `a` (which exists), and `c` (which doesn't)
>> > microbenchmark(
>> >   exists('a', e, inherits = FALSE),
>> >   exists('a', envir = e, inherits = FALSE),
>> >   .Internal(exists('a', e, 'any', FALSE)),
>> >   'a' %in% ls(e, all.names = TRUE),
>> >   is.null(e[['a']]),
>> >   is.null(e$a),
>> >
>> >   exists('c', e, inherits = FALSE),
>> >   exists('c', envir = e, inherits = FALSE),
>> >   .Internal(exists('c', e, 'any', FALSE)),
>> >   'c' %in% ls(e, all.names = TRUE),
>> >   is.null(e[['c']]),
>> >   is.null(e$c),
>> >
>> >   unit = "us"
>> > )
>> > #>    median name
>> > #> 1  1.2015 exists("a", e, inherits = FALSE)
>> > #> 2  1.0545 exists("a", envir = e, inherits = FALSE)
>> > #> 3  0.3615 .Internal(exists("a", e, "any", FALSE))
>> > #> 4  7.6345 "a" %in% ls(e, all.names = TRUE)
>> > #> 5  0.3055 is.null(e[["a"]])
>> > #> 6  0.3270 is.null(e$a)
>> > #> 7  1.1890 exists("c", e, inherits = FALSE)
>> > #> 8  1.0370 exists("c", envir = e, inherits = FALSE)
>> > #> 9  0.3465 .Internal(exists("c", e, "any", FALSE))
>> > #> 10 7.5475 "c" %in% ls(e, all.names = TRUE)
>> > #> 11 0.2675 is.null(e[["c"]])
>> > #> 12 0.3010 is.null(e$c)
>> >
>> >
>> > -Winston
>> >
>> > On Tue, Dec 2, 2014 at 8:46 PM, Peter Haverty <haverty.pe...@gene.com>
>> > wrote:
>> > > Hi All,
>> > >
>> > > I've been looking into speeding up the loading of packages that use
>> > > a lot of S4. After profiling I noticed the "exists" function
>> > > accounts for a surprising fraction of the time. I have some thoughts
>> > > about speeding up exists (below). More to the point of this post,
>> > > Martin Mächler noted that 'exists' and 'get' are often used in
>> > > conjunction. Both functions are different usages of the do_get C
>> > > function, so it's a pity to run that twice.
>> > >
>> > > "get" gives an error when a symbol is not found, so you can't just
>> > > do a 'get'. With R's C library, one might do
>> > >
>> > > SEXP x = findVarInFrame3(symbol,env);
>> > > if (x != R_UnboundValue) {
>> > >   // do stuff with x
>> > > }
>> > >
>> > > It would be very convenient to have something like this at the R
>> > > level. We don't want to do any tryCatch stuff or to add args to get
>> > > (That would kill any speed advantage. The overhead for handling
>> > > redundant args accounts for 30% of the time used by "exists").
>> > > Michael Lawrence and I worked out that we need a function that
>> > > returns either the desired object, or something that represents
>> > > R_UnboundValue. We also need a very cheap way to check if something
>> > > equals this new R_UnboundValue. This might look like
>> > >
>> > > if (defined(x <- fetch(symbol, env))) {
>> > >   do_stuff_with_x(x)
>> > > }
>> > >
>> > > A few more thoughts about "exists":
>> > >
>> > > Moving the bit of R in the exists function to C saves 10% of the
>> > > time. Dropping the redundant pos and frame args entirely saves 30%
>> > > of the time used by this function. I suggest that the arguments of
>> > > both get and exists should be simplified to (x, envir, mode,
>> > > inherits). The existing C code handles numeric, character, and
>> > > environment input for where. The arg frame is rarely used (0/128
>> > > exists calls in the methods package). Users that need to can call
>> > > sys.frame themselves. get already lacks a frame argument and the
>> > > manpage for exists notes that envir is only there for backwards
>> > > compatibility. Let's deprecate the extra args in exists and get and
>> > > perhaps move the extra argument handling to C in the interim.
>> > > Similarly, the "assign" function does nothing with the "immediate"
>> > > argument.
>> > >
>> > > I'd be interested to hear if there is any support for a "fetch"-like
>> > > function (and/or deprecating some unused arguments).
>> > >
>> > > All the best,
>> > > Pete
>> > >
>> > >
>> > >
>> > > Pete
>> > >
>> > > ____________________
>> > > Peter M. Haverty, Ph.D.
>> > > Genentech, Inc.
>> > > phave...@gene.com
>> > >
>> > > [[alternative HTML version deleted]]
>> > >
>> > > ______________________________________________
>> > > R-devel@r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-devel
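
For reference, a minimal sketch of how the fetch()/defined() pair proposed in the thread might look at the R level. The names fetch, defined and .unbound are hypothetical and only illustrate the idea; they are not an existing API. The sketch assumes R >= 3.2.0, where base R provides get0(), and it uses a private environment as a stand-in for R_UnboundValue. It also shows why the is.null() shortcut from the benchmarks cannot separate "bound to NULL" from "not bound".

```r
# Hypothetical R-level sketch of the fetch()/defined() idea from the thread.
# `fetch`, `defined` and `.unbound` are illustrative names, not an existing API.

# A private, unique sentinel that plays the role of R_UnboundValue.
.unbound <- new.env(parent = emptyenv())

# Single lookup: return the bound value, or the sentinel if `name`
# is not bound in `env`. Assumes get0() from base R (R >= 3.2.0).
fetch <- function(name, env) {
  get0(name, envir = env, inherits = FALSE, ifnotfound = .unbound)
}

# Cheap check: environments are compared by reference, so only the
# sentinel itself is reported as "not defined".
defined <- function(x) !identical(x, .unbound)

e <- new.env()
assign("a", 1, envir = e)
assign("n", NULL, envir = e)       # bound, but the value is NULL

res_a <- defined(fetch("a", e))    # TRUE: bound
res_n <- defined(fetch("n", e))    # TRUE: bound to NULL
res_c <- defined(fetch("c", e))    # FALSE: not bound

# The is.null() shortcut gives the same answer for both of these:
null_n <- is.null(e[["n"]])        # TRUE
null_c <- is.null(e[["c"]])        # TRUE

# Usage in the style suggested in the original post:
if (defined(x <- fetch("a", e))) {
  print(x)
}
```

The sentinel is an environment because identical() only matches the very same environment object, so a stored value cannot collide with it unless the sentinel itself is stored. Whether this is any faster than exists() followed by get() is not measured here.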