Re: [R] Large file size while persisting rpart model to disk

Terry Therneau Wed, 04 Feb 2009 06:36:56 -0800

  In R, functions remember the environment in which they were defined, along 
with the whole chain of environments that encloses it.  The good thing about 
this is that they can find variables further up in the nested context, i.e.,
    myfun <- function(x) { x+y}
will look for 'y' in the function in which myfun was defined, then in the 
function enclosing that one, .... on up and then through the search() list.  
This makes life easier for certain things such as minimizers.
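
  A minimal sketch of the lookup rule, in case it helps.  The names make_adder 
and caller are made up for illustration.  Note that the search follows where a 
function was defined; for a function nested inside another one, that is 
usually also the function that ends up using it.

```r
y <- 1                         # global y, used only if nothing closer defines y

make_adder <- function() {
  y <- 10                      # lives in make_adder's environment
  function(x) x + y            # defined here, so it resolves y to 10
}

caller <- function(f) {
  y <- 100                     # local to caller; f does not look here
  f(1)
}

add <- make_adder()
res <- caller(add)
print(res)                     # 11
```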


  The bad thing is that to make this work R has to remember all of the 
variables that were available up the entire chain, and 99-100% of them aren't 
necessary.  (Because of constructs like get(varname), a parser can't read the 
code to decide what might be needed.)
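
  A small sketch of both points, under the assumption that the serialized 
size is a fair stand-in for what save() would write.  make_lookup, hidden and 
the matrix size are invented for the example.

```r
make_lookup <- function() {
  X <- matrix(0, 500, 500)           # large temporary, 250000 doubles
  hidden <- "only reachable by name"
  # Which variable is wanted is only known at run time, so nothing
  # in the enclosing environment can safely be thrown away.
  function(varname) get(varname)
}

lookup <- make_lookup()
found <- lookup("hidden")            # resolved through the enclosing environment

# Serializing the function drags its environment, X included, along with it.
size <- length(serialize(lookup, NULL))
print(size)                          # well over 2 million bytes
```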

  This is an issue with embedded functions.  I recently noticed an extreme case 
of it in the pspline routine and made changes to fix it.  The short version:
        pspline <- function(x, ...) {    # ... = other args
                # some computations to define an X matrix, which can be large
                # define a print function, printfun
                # ...
                # return X, printfun and other stuff together in one object
                }
It's even worse in the frailty functions, where X can be VERY large.
The print function's environment wanted to 'remember' all of the temporary work 
that went into defining X, plus X itself, and so would be huge.  My solution was 
to add the line
        environment(printfun) <- new.env(parent=baseenv())
which marks the function as not needing anything from the local environment, 
only the base R definitions.  This would probably be a good addition to rpart, 
but I need to look closer.
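
  A rough sketch of the effect, not the actual pspline code: fit_sketch, its 
trim argument and the matrix dimensions are all hypothetical, and serialized 
length is again used as a proxy for the size on disk.

```r
fit_sketch <- function(n, trim = TRUE) {
  X <- matrix(0, n, 50)                      # stands in for the large X matrix
  printfun <- function(coef) cat("coef:", format(coef), "\n")
  if (trim) {
    # printfun only uses cat() and format(), both found via baseenv()
    environment(printfun) <- new.env(parent = baseenv())
  }
  list(printfun = printfun)
}

big   <- length(serialize(fit_sketch(2000, trim = FALSE)$printfun, NULL))
small <- length(serialize(fit_sketch(2000, trim = TRUE)$printfun, NULL))
print(c(untrimmed = big, trimmed = small))

fit_sketch(2000)$printfun(1.5)               # the trimmed function still works
```

  The trimmed function can no longer see X or anything else local, so this 
only suits functions that are self-contained apart from base R.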
   My first cut was to use emptyenv(), but that wasn't so smart.  It leaves 
everything undefined, like "+" for instance. :-)
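
  To see the emptyenv() problem directly, here is a tiny sketch with a 
throwaway function f (not from any package):

```r
f <- function(x) x + 1

environment(f) <- emptyenv()
# With no enclosing environment at all, even "+" cannot be looked up,
# so the call fails; the error message is captured instead of stopping.
res_empty <- tryCatch(f(1), error = function(e) conditionMessage(e))
print(res_empty)

environment(f) <- new.env(parent = baseenv())
res_base <- f(1)                 # "+" is found in the base environment
print(res_base)                  # 2
```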
   
        Terry Therneau

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
