Hello R-help, I've noticed that my 'parallel' jobs take too much memory to store and transfer to the cluster workers. I've managed to trace it to the following:

# `payload` is being written to the cluster worker.
# The function FUN had been created as a closure inside my package:
payload$data$args$FUN
# function (l, ...)
# withCallingHandlers(fun(l$x, ...), error = .wraperr(l$name))
# <bytecode: 0x5644a9f08a90>
# <environment: 0x5644aa841ad8>

# The function seems to bring a lot of captured data with it.
e <- environment(payload$data$args$FUN)
length(serialize(e, NULL))
# [1] 738202878
parent.env(e)
# <environment: namespace:mypackage>

# The parent environment has a name, so it all must be right here.
# What is it?
ls(e, all.names = TRUE)
# [1] "fun"
length(serialize(e$fun, NULL))
# [1] 317

# The only object in the environment is small!
# Where are the 700 megabytes of data?
length(serialize(e, NULL))
# [1] 536
length(serialize(payload$data$args$FUN, NULL))
# [1] 1722

So once I've observed `fun`, the environment becomes very small and can
now be serialized very compactly.

I managed to work around it by forcing the promise and explicitly
putting `fun` in a small environment when constructing the closure:

.wrapfun <- function(fun) {
 e <- new.env(parent = loadNamespace('mypackage'))
 e$fun <- fun
 # NOTE: a naive return(function(...)) could serialize to 700
 # megabytes due to `fun` seemingly being a promise (?). Once the
 # promise is resolved, suddenly `fun` is much more compact.
 ret <- function(l, ...) withCallingHandlers(
  fun(l$x, ...),
  error = .wraperr(l$name)
 )
 environment(ret) <- e
 ret
}

Is this analysis correct? Could a simple f <- force(fun) have sufficed?
Where can I read more about this type of problem? If this really is due
to promises, what would be the downsides of forcing them during
serialization?
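
For reference, here is a minimal sketch that tries to reproduce the effect outside the package. It assumes the cause really is an unforced promise that still remembers the frame of the function that supplied the argument. All names (make_lazy, make_forced, caller, size) are hypothetical and not part of mypackage, and I have not checked whether every R version serializes unforced promises the same way.

```r
# Closure factory that never touches `fun`: it stays an unevaluated promise.
make_lazy <- function(fun) {
  function(x) fun(x)
}

# Same factory, but the promise is evaluated before the closure is returned.
make_forced <- function(fun) {
  force(fun)
  function(x) fun(x)
}

# Hypothetical caller with a large object in its own frame. The promise
# for `fun` is created here, so (under the stated assumption) it keeps a
# reference to this frame, including `big`, until it is forced.
caller <- function(wrap) {
  big <- runif(1e6)   # roughly 8 MB of doubles
  wrap(identity)
}

# Number of bytes needed to serialize an object.
size <- function(f) length(serialize(f, NULL))

lazy_size   <- size(caller(make_lazy))
forced_size <- size(caller(make_forced))

cat("lazy:  ", lazy_size, "bytes\n")
cat("forced:", forced_size, "bytes\n")
```

If the two sizes differ by about the size of `big`, that would support the promise explanation and suggest that a plain force(fun) at the top of the factory is enough. Note that forcing only helps when the value itself is small: if `fun` is a closure defined inside the caller, its own environment would still carry `big` along.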

-- 
Best regards,
Ivan