Hi, in BiocParallel, is there a suggested (or planned) best practice for making *locally* assigned variables (e.g. functions) available to the applied function when it runs in a separate R process (which will be the most common use case)? I understand that local variables should be avoided and that it's preferable to put as much as possible in packages, but that's not always possible or very convenient.

EXAMPLE:

library('BiocParallel')
library('BatchJobs')

# Here I pick a recursive function to make the problem a bit harder, i.e.
# the function needs to call itself ("itself" = see below)
fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n-2) + fib(n-1)
}

# Executing in the current R session
cluster.functions <- makeClusterFunctionsInteractive()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)

# Executing in a separate R process, where fib() is not defined
# (not specific to BiocParallel)
cluster.functions <- makeClusterFunctionsLocal()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
  Errors occurred during execution. First error message:
Error in FUN(...): could not find function "fib"
[...]

# The following illustrates that the solution is not always straightforward.
# (not specific to BiocParallel; must have been discussed previously)
values <- bplapply(0:9, FUN=function(n, fib) {
  fib(n)
}, fib=fib)
Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
  Errors occurred during execution. First error message:
Error in fib(n): could not find function "fib"
[...]

# Workaround; make fib() aware of itself
# (this is something the user needs to do, and would be very
# hard for BiocParallel et al. to automate. BTW, should all
# recursive functions be implemented this way?).
fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib <- sys.function()  # Make function aware of itself
  fib(n-2) + fib(n-1)
}

values <- bplapply(0:9, FUN=function(n, fib) {
  fib(n)
}, fib=fib)


WISHLIST:

Considering the above recursive issue solved, a slightly more explicit and standardized solution is then:

values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
  for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
  fib(n)
}, BPGLOBALS=list(fib=fib))

Could the above be generalized into something as neat as:

bpExport("fib")
values <- bplapply(0:9, FUN=function(n) {
  BiocParallel::bpImport("fib")
  fib(n)
})

or ideally just (analogously to parallel::clusterExport()):

bpExport("fib")
values <- bplapply(0:9, FUN=fib)

/Henrik

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel