Thanks so much for all this. The first solution is what I'm going with as I want the terms object to come along so that predict still works.
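
For anyone finding this later, here is a minimal sketch of that first approach with a predict() call added. tfun1 is Bill's function from the quoted message below; the names fit and pred and the Sepal.Width values passed as newdata are only my illustration, so treat it as a sketch rather than a tested recipe.

```r
# Fit lm() inside a small environment whose parent is globalenv(), so the
# formula's environment does not capture the large local object `junk`.
tfun1 <- function(subset) {
  junk <- 1:1e6                          # large object unrelated to the model
  env <- new.env(parent = globalenv())   # holds only what lm() needs
  env$subset <- subset
  with(env, lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset))
}

fit <- tfun1(1:4)

# The terms object is still part of the fit, so predict() works on new data.
# (The newdata values are illustrative assumptions.)
pred <- predict(fit, newdata = data.frame(Sepal.Width = c(3.0, 3.5)))

# List what the model's environment carries along when saved.
ls(attr(terms(fit), ".Environment"))
```

The last line lists what the saved model drags along. If only the small objects placed in env show up there, the large local object is not being saved with the model.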

On Wed, Jul 27, 2016 at 12:28 PM, William Dunlap via R-devel <r-devel@r-project.org> wrote:

> Another solution is to only save the parts of the model object that
> interest you. As long as they don't include the formula (which is
> what drags along the environment it was created in), you will
> save space. E.g.,
>
> tfun2 <- function(subset) {
>   junk <- 1:1e6
>   list(subset=subset, lm(Sepal.Length ~ Sepal.Width, data=iris,
>        subset=subset)$coef)
> }
>
> > saveSize(tfun2(1:4))
> #[1] 152
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Jul 27, 2016 at 11:19 AM, William Dunlap <wdun...@tibco.com> wrote:
>
> > One way around this problem is to make a new environment whose
> > parent environment is .GlobalEnv and which contains only what the
> > call to lm() requires and to compute lm() in that environment. E.g.,
> >
> > tfun1 <- function (subset)
> > {
> >   junk <- 1:1e+06
> >   env <- new.env(parent = globalenv())
> >   env$subset <- subset
> >   with(env, lm(Sepal.Length ~ Sepal.Width, data = iris, subset = subset))
> > }
> >
> > Then we get
> >
> > > saveSize(tfun1(1:4)) # see below for def. of saveSize
> > [1] 910
> >
> > instead of the 2129743 bytes in the save file when using the naive method.
> >
> > saveSize <- function (object) {
> >   tf <- tempfile(fileext = ".RData")
> >   on.exit(unlink(tf))
> >   save(object, file = tf)
> >   file.size(tf)
> > }
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> > On Wed, Jul 27, 2016 at 10:48 AM, Kenny Bell <km...@berkeley.edu> wrote:
> >
> >> In the below, I generate a model from an environment that isn't
> >> .GlobalEnv with a large object that is unrelated to the model
> >> generation. It seems to save the irrelevant object unnecessarily. In
> >> my actual use case, I am running and saving many models in a loop that
> >> each use a single large data.frame (that gets collapsed into a small
> >> data.frame for estimation), so removing it isn't an option.
> >>
> >> In the case where the model exists in .GlobalEnv, everything is
> >> peachy. So replicating whatever happens when saving the model that was
> >> generated in .GlobalEnv at the return() stage of the function call
> >> would fix this problem.
> >>
> >> I was referred to this list from r-bugs. First time r-devel poster.
> >>
> >> Hope this helps,
> >>
> >> Kendon
> >>
> >> ```
> >> tmp_fun <- function(x){
> >>   iris_big <- lapply(1:10000, function(x) iris)
> >>   lm(Sepal.Length ~ Sepal.Width, data = iris)
> >> }
> >>
> >> out <- tmp_fun(1)
> >> object.size(out)
> >> # 48008
> >> save(out, file = "tmp.RData", compress = FALSE)
> >> file.size("tmp.RData")
> >> # 57196752 - way too big
> >>
> >> # Works fine when in .GlobalEnv
> >> iris_big <- lapply(1:10000, function(x) iris)
> >> out <- lm(Sepal.Length ~ Sepal.Width, data = iris)
> >>
> >> object.size(out)
> >> # 48008
> >> save(out, file = "tmp.RData", compress = FALSE)
> >> file.size("tmp.RData")
> >> # 16641 - good size.
> >> ```
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel