On Sat, May 25, 2013 at 4:38 PM, Simon Urbanek
<simon.urba...@r-project.org> wrote:
> On May 25, 2013, at 3:48 PM, Henrik Bengtsson wrote:
>
>> Hi,
>>
>> in my packages/functions/code I tend to remove large temporary
>> variables as soon as possible, e.g. large intermediate vectors used in
>> iterations. I sometimes also have the habit of doing this to make it
>> explicit in the source code when a temporary object is no longer
>> needed. However, I did notice that this can add a noticeable overhead
>> when the rest of the iteration step does not take that much time.
>>
>> Trying to speed this up, I first noticed that rm(list="a") is much
>> faster than rm(a). While at it, I realized that for the purpose of
>> keeping the memory footprint small, I can equally well reassign the
>> variable the value of a small object (e.g. a <- NULL), which is
>> significantly faster than using rm().
>>
>
> Yes, as you probably noticed, rm() is quite a complex function because
> it has to deal with different ways to specify its input etc.
> When you remove that overhead (by calling .Internal(remove("a",
> parent.frame(), FALSE))), you get the same performance as the assignment.
> If you really want to go overboard, you can define your own function:
>
> SEXP rm(SEXP x, SEXP rho) { setVar(x, R_UnboundValue, rho); return R_NilValue; }
> poof <- function(x) .Call(rm_C, substitute(x), parent.frame())
>
> That will be faster than anything else (mainly because it avoids the
> trip through strings as it can use the symbol directly).
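[Simon's .Internal() shortcut can be wrapped in a small helper at the R
level. The following is a minimal sketch, not from the thread itself;
the name 'rmFast' is hypothetical, and .Internal(remove(...)) is not
part of R's public API, so this is for illustration only.]

```r
# Sketch of the shortcut Simon mentions: call the internal form of
# remove() directly, skipping rm()'s argument handling.
# 'rmFast' is a hypothetical name; .Internal() is not a public API.
rmFast <- function(name, envir = parent.frame()) {
  # 'name' must be a character string; FALSE disables 'inherits'
  .Internal(remove(name, envir, FALSE))
}

a <- 1:10
rmFast("a")
exists("a", inherits = FALSE)  # FALSE: 'a' has been removed
```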
Thanks for this one. This is useful - I did try to follow what
.Internal(remove, ...) does, but got lost in the internal structures.
Of course, I'd love to see such a function in 'base' itself. Having
such a well-defined and narrow function for removing a variable in the
current environment may also be useful for 'codetools'/'R CMD check',
such that code inspection can detect undefined variables in the case
where they used to be defined but were later removed. Technically rm()
allows for that too, but I can see how such a task quickly gets
complicated when the arguments 'list', 'envir' and 'inherits' are
involved.

> But as Bill noted - in practice I'd recommend using either local() or
> functions to control the scope - using rm() or assignments seems too
> error-prone to me.

I didn't mention it, but another reason I use rm() a lot is actually
so that R can catch my programming mistakes (I'm maintaining 100,000+
lines of code), i.e. the opposite of being error prone. For instance,
by doing rm(tmp) as soon as possible, R will give me the run-time
error "Error: object 'tmp' not found" in case I use it by mistake
later on. As said above, potentially codetools/'R CMD check' will be
able to detect this already at check time [above]. With tmp <- NULL
I'll lose a bit of this protection, although another run-time error is
likely to occur a bit later. Using local()/local functions is
obviously an alternative to the above.

Thanks both (and sorry about the game - though it was an entertaining one)

/Henrik

>
> Cheers,
> Simon
>
>
>
>> SOME BENCHMARKS:
>> A toy example imitating an iterative algorithm with "large" temporary
>> objects.
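[The protection Henrik describes can be illustrated with a small sketch;
'f' and 'g' are hypothetical examples, not code from the thread.]

```r
# With rm(), an accidental later use of 'tmp' fails loudly.
f <- function(x) {
  tmp <- x * 2
  s <- sum(tmp)
  rm(tmp)  # tmp is gone; any accidental later use is a run-time error
  tryCatch(sum(tmp), error = function(e) conditionMessage(e))
}
f(1:3)  # "object 'tmp' not found"

# With tmp <- NULL, the mistake can pass silently: sum(NULL) is 0.
g <- function(x) {
  tmp <- x * 2
  s <- sum(tmp)
  tmp <- NULL  # weaker protection than rm(tmp)
  sum(tmp)
}
g(1:3)  # 0, no error
```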
>>
>> x <- matrix(rnorm(100e6), ncol=10e3)
>>
>> t1 <- system.time(for (k in 1:ncol(x)) {
>>   a <- x[,k]
>>   colSum <- sum(a)
>>   rm(a)  # Not needed anymore
>>   b <- x[k,]
>>   rowSum <- sum(b)
>>   rm(b)  # Not needed anymore
>> })
>>
>> t2 <- system.time(for (k in 1:ncol(x)) {
>>   a <- x[,k]
>>   colSum <- sum(a)
>>   rm(list="a")  # Not needed anymore
>>   b <- x[k,]
>>   rowSum <- sum(b)
>>   rm(list="b")  # Not needed anymore
>> })
>>
>> t3 <- system.time(for (k in 1:ncol(x)) {
>>   a <- x[,k]
>>   colSum <- sum(a)
>>   a <- NULL  # Not needed anymore
>>   b <- x[k,]
>>   rowSum <- sum(b)
>>   b <- NULL  # Not needed anymore
>> })
>>
>>> t1
>>    user  system elapsed
>>    8.03    0.00    8.08
>>> t1/t2
>>     user   system  elapsed
>> 1.322900 0.000000 1.320261
>>> t1/t3
>>     user   system  elapsed
>> 1.715812 0.000000 1.662551
>>
>>
>> Is there a reason why I shouldn't assign NULL instead of using rm()?
>> As far as I understand it, the garbage collector will be equally
>> efficient at cleaning out the previous object when using rm(a) or
>> a <- NULL. Is there anything else I'm overlooking? Am I adding
>> overhead somewhere else?
>>
>> /Henrik
>>
>>
>> PS. With the above toy example one can obviously be a bit smarter by using:
>>
>> t4 <- system.time({for (k in 1:ncol(x)) {
>>   a <- x[,k]
>>   colSum <- sum(a)
>>   a <- x[k,]
>>   rowSum <- sum(a)
>> }
>> rm(list="a")
>> })
>>
>> but that's not my point.
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>