I guess all we need to do is to detect whether a function would try to
access a free variable in the user's workspace, and warn/error if so.
It looks like CodeDepends could do that. I could try to come up with an
implementation. I guess we would add CodeDepends as an optional
dependency for BiocParallel, and only do the checks if CodeDepends is
available.
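A rough sketch of such a check. It uses codetools::findGlobals() from Tierney's codetools package (mentioned in Gabriel's reply, and shipped with R) in place of CodeDepends. The helpers workspaceGlobals() and checkWorkspaceGlobals() are hypothetical, not BiocParallel functions. The sketch also assumes that "a free variable in the user's workspace" means a name that findGlobals() reports and that is bound directly in globalenv(). Like any static check, it will miss lookups done through get() and similar tricks.

```r
library(codetools)

## Hypothetical helper: names used freely by FUN that are bound directly in
## the user's workspace (globalenv), i.e. objects a fresh worker would lack.
workspaceGlobals <- function(FUN) {
  globals <- codetools::findGlobals(FUN, merge = TRUE)
  inWorkspace <- vapply(globals, exists, logical(1),
                        envir = globalenv(), inherits = FALSE)
  globals[inWorkspace]
}

## Hypothetical helper: warn (rather than error) about such objects
checkWorkspaceGlobals <- function(FUN) {
  missing <- workspaceGlobals(FUN)
  if (length(missing) > 0)
    warning("FUN uses objects defined only in the workspace: ",
            paste(missing, collapse = ", "),
            "; they may not exist in the worker processes", call. = FALSE)
  invisible(missing)
}

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}
checkWorkspaceGlobals(fib)  # warns about 'fib'
```

bplapply() could call something like checkWorkspaceGlobals(FUN) before dispatching, only when the checking package is installed.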
On Sun Nov 3 17:10:45 2013, Gabriel Becker wrote:
Henrik,
See https://github.com/duncantl/CodeDepends (as used by
https://github.com/gmbecker/RCacheSuite). It will identify necessarily
defined symbols (input variables) for code that is not doing certain
tricks (e.g., get(), mixing data.frame columns and global variables in
formulas, etc.).
Tierney's codetools package also does things along these lines but
there are some situations where it has trouble. I can give more detail
if desired.
~G
On Sun, Nov 3, 2013 at 3:04 PM, Ryan <r...@thompsonclan.org> wrote:
Another potentially easy step: if FUN is a function in the user's
workspace, we could automatically export that function under the
same name in the children. This would make recursive functions
just work, but it might be a bit too magical.
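A minimal sketch of the idea, using a slightly different mechanism. Instead of exporting FUN into each child's workspace, it binds FUN's own name inside a private enclosing environment attached to FUN. R serialization copies such non-global environments along with the function, while the global environment is sent only as a reference. bindSelf() is a hypothetical helper, and the name still has to be passed explicitly.

```r
## Hypothetical helper: give FUN a private enclosing environment in which
## `name` is bound to FUN itself, so recursive calls resolve even when the
## user's workspace is absent (as in a separate worker process).
bindSelf <- function(FUN, name) {
  env <- new.env(parent = environment(FUN))
  environment(FUN) <- env
  assign(name, FUN, envir = env)  # the stored copy also encloses `env`
  FUN
}

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}
fibSelf <- bindSelf(fib, "fib")
rm(fib)        # mimic a worker whose workspace lacks fib()
fibSelf(9)     # still works: 'fib' is found in fibSelf's own environment
```

Assuming the backend ships FUN with R's standard serialization, this would amount to something like bplapply(0:9, FUN = bindSelf(fib, "fib")).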
On 11/3/13, 2:38 PM, Ryan wrote:
Here's an easy thing we can add to BiocParallel in the short
term. The following code defines a wrapper function
"withBPExtraErrorText" that simply appends an additional
message to the end of any error that looks like it is about a
missing variable. We could wrap every evaluation in a similar
tryCatch to at least provide a more informative error message
when a subprocess has a missing variable.
-Ryan
withBPExtraErrorText <- function(expr) {
    tryCatch({
        expr  # the promise is forced here, inside tryCatch()
    }, error = function(err) {
        if (grepl("^object '(.*)' not found$", conditionMessage(err),
                  perl=TRUE)) {
            ## It is an error due to a variable not found.
            err$message <- paste0(conditionMessage(err),
                                  ". Maybe you forgot to export this variable ",
                                  "from the main R session using \"bpExport\"?")
        }
        ## Re-signal the (possibly augmented) error
        stop(err)
    })
}
x <- 5
## Succeeds
withBPExtraErrorText(x)
## Fails with more informative error message
withBPExtraErrorText(y)
On Sun Nov 3 14:01:48 2013, Henrik Bengtsson wrote:
On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
<lawrence.mich...@gene.com> wrote:
An analog to clusterExport is a good idea. To make it even easier, we
could have a dynamic environment based on object tables that would
catch missing symbols and download them from the parent thread. But
maybe there's some benefit to being explicit?
A first step to fully automate this would be to provide some (opt-in/
opt-out) mechanism for code inspection that warns about undefined
objects (cf. 'R CMD check'). That is of course major work, but it would
certainly spare the community/users thousands of hours of
troubleshooting, and spare the mailing lists from "why doesn't my
parallel code work?" messages. Such protection may be better suited for
the 'parallel' package, though. Unfortunately, it's beyond my
skills/time to pull such a thing together.
/Henrik
Michael
On Sun, Nov 3, 2013 at 12:39 PM, Henrik Bengtsson
<h...@biostat.ucsf.edu> wrote:
Hi,
in BiocParallel, is there a suggested (or planned) best practice for
making *locally* assigned variables (e.g. functions) available to the
applied function when it runs in a separate R process (which will be
the most common use case)? I understand that local variables should be
avoided and that it's preferred to put as much as possible in packages,
but that's not always possible or very convenient.
EXAMPLE:
library('BiocParallel')
library('BatchJobs')

# Here I pick a recursive function to make the problem a bit harder, i.e.
# the function needs to call itself ("itself" = see below)
fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n-2) + fib(n-1)
}

# Executing in the current R session
cluster.functions <- makeClusterFunctionsInteractive()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
# Executing in a separate R process, where fib() is not defined
# (not specific to BiocParallel)
cluster.functions <- makeClusterFunctionsLocal()
bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
register(bpParams)
values <- bplapply(0:9, FUN=fib)
## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
  Errors occurred during execution. First error message:
Error in FUN(...): could not find function "fib"
[...]
# The following illustrates that the solution is not always straightforward.
# (not specific to BiocParallel; must have been discussed previously)
values <- bplapply(0:9, FUN=function(n, fib) {
  fib(n)
}, fib=fib)
Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
  Errors occurred during execution. First error message:
Error in fib(n): could not find function "fib"
[...]
# Workaround: make fib() aware of itself
# (this is something the user needs to do, and it would be very
# hard for BiocParallel et al. to automate. BTW, should all
# recursive functions be implemented this way?)
fib <- function(n=0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib <- sys.function() # Make function aware of itself
  fib(n-2) + fib(n-1)
}
values <- bplapply(0:9, FUN=function(n, fib) {
  fib(n)
}, fib=fib)
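Why passing fib=fib alone does not help: the recursive call inside fib() looks up 'fib' in fib()'s enclosing environment, which is the global environment. Serialization sends only a reference to the global environment, not its contents, and the argument fib exists only in the wrapper's frame. The sketch below mimics a worker locally by re-enclosing a function in an environment that sees only base R. asIfOnWorker() is a hypothetical helper and only an approximation of a real worker process.

```r
## Hypothetical helper: re-enclose a copy of FUN so that it sees only base R,
## roughly as if it ran in a worker whose workspace is empty.
asIfOnWorker <- function(FUN) {
  environment(FUN) <- new.env(parent = baseenv())
  FUN
}

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}
fibSelfAware <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib <- sys.function()  # bind the running function to the local name 'fib'
  fib(n - 2) + fib(n - 1)
}

## Original version: the recursive call cannot find fib()
tryCatch(asIfOnWorker(fib)(5), error = conditionMessage)
## Self-aware version: recursion works without the workspace
asIfOnWorker(fibSelfAware)(9)
```

Base R's Recall() is another way for a function to call itself without relying on its own name.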
WISHLIST:
Considering the above recursive issue solved, a slightly more explicit
and standardized solution is then:

values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
  for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
  fib(n)
}, BPGLOBALS=list(fib=fib))
Could the above be generalized into something as neat as:

bpExport("fib")
values <- bplapply(0:9, FUN=function(n) {
  BiocParallel::bpImport("fib")
  fib(n)
})

or ideally just (analogously to parallel::clusterExport()):

bpExport("fib")
values <- bplapply(0:9, FUN=fib)
/Henrik
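One way the BPGLOBALS idea could be generalized, sketched under the assumption that FUN is shipped to workers via standard R serialization. Instead of assigning the globals inside FUN's body, the sketch puts them in a fresh environment that becomes FUN's enclosure. Any workspace-defined functions among them are re-enclosed in that same environment, so they can see each other and themselves. bpWithGlobals() is hypothetical and not part of BiocParallel. Unlike parallel::clusterExport(), the objects travel with FUN rather than being exported once per worker.

```r
## Hypothetical helper: wrap FUN so the named objects in 'globals' live in
## FUN's enclosing environment. Functions in 'globals' that were defined in
## the workspace are re-enclosed in that same environment, so their own free
## references (including recursive self-calls) resolve there.
bpWithGlobals <- function(FUN, globals) {
  env <- new.env(parent = environment(FUN))
  for (name in names(globals)) {
    value <- globals[[name]]
    if (is.function(value) && identical(environment(value), globalenv()))
      environment(value) <- env
    assign(name, value, envir = env)
  }
  environment(FUN) <- env
  FUN
}

fib <- function(n = 0) {
  if (n < 0) stop("Invalid 'n': ", n)
  if (n == 0 || n == 1) return(1)
  fib(n - 2) + fib(n - 1)
}
f <- bpWithGlobals(function(n) fib(n), list(fib = fib))
rm(fib)          # mimic a worker whose workspace lacks fib()
sapply(0:9, f)   # recursion works without the sys.function() trick
```

With BiocParallel this would presumably be used as bplapply(0:9, FUN = bpWithGlobals(function(n) fib(n), list(fib = fib))). A re-enclosed function can no longer see other workspace objects unless they are listed in 'globals' too.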
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis