I guess all we need to do is to detect whether a function would try to access a free variable in the user's workspace, and warn/error if so. It looks like CodeDepends could do that. I could try to come up with an implementation. I guess we would add CodeDepends as an optional dependency for BiocParallel, and only do the checks if CodeDepends is available.
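
A rough sketch of that check, using codetools::findGlobals() rather than CodeDepends (a swap on my part, since the CodeDepends interface is not shown in this thread). The helper name findWorkspaceGlobals is hypothetical, and like any static analysis this will miss symbols reached through get() and similar tricks:

```r
library(codetools)

# Hypothetical helper: list the free symbols used by FUN that are
# defined directly in the environment FUN was created in (the user's
# workspace for a function typed at the prompt). Symbols that live in
# packages, such as "+", are filtered out by inherits = FALSE.
findWorkspaceGlobals <- function(FUN, envir = environment(FUN)) {
    globals <- findGlobals(FUN, merge = TRUE)
    inWorkspace <- vapply(globals, exists, logical(1),
                          envir = envir, inherits = FALSE)
    globals[inWorkspace]
}

offset <- 10
addOffset <- function(x) x + offset

missingInChildren <- findWorkspaceGlobals(addOffset)
if (length(missingInChildren) > 0) {
    warning("FUN uses workspace variables that may be missing in ",
            "child processes: ",
            paste(missingInChildren, collapse = ", "))
}
```

Something along these lines could run before dispatching jobs, and only when the analysis package is installed.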

On Sun Nov  3 17:10:45 2013, Gabriel Becker wrote:
Henrik,

See https://github.com/duncantl/CodeDepends (as used by
https://github.com/gmbecker/RCacheSuite). It will identify necessarily
defined symbols (input variables) for code that is not doing certain
tricks (e.g. get(), mixing data.frame columns and global variables in
formulas, etc.).

Tierney's codetools package also does things along these lines but
there are some situations where it has trouble. I can give more detail
if desired.

~G


On Sun, Nov 3, 2013 at 3:04 PM, Ryan <r...@thompsonclan.org> wrote:

    Another potential easy step: if FUN is a function in the user's
    workspace, we automatically export that function under the same
    name in the children. This would make recursive functions just
    work, but it might be a bit too magical.


    On 11/3/13, 2:38 PM, Ryan wrote:

        Here's an easy thing we can add to BiocParallel in the short
        term. The following code defines a wrapper function
        "withBPExtraErrorText" that simply appends an additional
        message to the end of any error that looks like it is about a
        missing variable. We could wrap every evaluation in a similar
        tryCatch to at least provide a more informative error message
        when a subprocess has a missing variable.

        -Ryan

        withBPExtraErrorText <- function(expr) {
            tryCatch({
                expr
            }, simpleError = function(err) {
                ## Matches "object 'x' not found" as well as
                ## "could not find function \"f\"".
                missingPattern <- "^object '.*' not found$|^could not find function \".*\"$"
                if (grepl(missingPattern, err$message, perl=TRUE)) {
                    ## It is an error due to a variable or function not found.
                    err$message <- paste0(
                        err$message,
                        ". Maybe you forgot to export this variable from the",
                        " main R session using \"bpexport\"?")
                }
                stop(err)
            })
        }

        x <- 5

        ## Succeeds
        withBPExtraErrorText(x)

        ## Fails with more informative error message
        withBPExtraErrorText(y)



        On Sun Nov  3 14:01:48 2013, Henrik Bengtsson wrote:

            On Sun, Nov 3, 2013 at 1:29 PM, Michael Lawrence
            <lawrence.mich...@gene.com> wrote:

                An analog to clusterExport is a good idea. To make it even
                easier, we could have a dynamic environment based on object
                tables that would catch missing symbols and download them
                from the parent thread. But maybe there's some benefit to
                being explicit?


            A first step to fully automate this would be to provide some
            (opt in/out) mechanism for code inspection and warn about
            non-defined objects (cf. 'R CMD check').  That is of course
            major work, but will certainly spare the community/users
            thousands of hours of troubleshooting and the mailing lists
            from "why doesn't my parallel code work" messages.  Such
            protection may be better suited for the 'parallel' package
            though.  Unfortunately, it's beyond my skills/time to pull
            such a thing together.

            /Henrik


                Michael


                On Sun, Nov 3, 2013 at 12:39 PM, Henrik Bengtsson
                <h...@biostat.ucsf.edu> wrote:


                    Hi,

                    in BiocParallel, are there suggested (or planned) best
                    practices for making *locally* assigned variables (e.g.
                    functions) available to the applied function when it runs
                    in a separate R process (which will be the most common use
                    case)?  I understand that local variables should be
                    avoided and that it's preferred to put as much as possible
                    in packages, but that's not always possible or very
                    convenient.

                    EXAMPLE:

                    library('BiocParallel')
                    library('BatchJobs')

                    # Here I pick a recursive function to make the problem a
                    # bit harder, i.e. the function needs to call itself
                    # ("itself" = see below)
                    fib <- function(n=0) {
                       if (n < 0) stop("Invalid 'n': ", n)
                       if (n == 0 || n == 1) return(1)
                       fib(n-2) + fib(n-1)
                    }

                    # Executing in the current R session
                    cluster.functions <- makeClusterFunctionsInteractive()
                    bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
                    register(bpParams)
                    values <- bplapply(0:9, FUN=fib)
                    ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
                    ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)


                    # Executing in a separate R process, where fib() is not
                    # defined (not specific to BiocParallel)
                    cluster.functions <- makeClusterFunctionsLocal()
                    bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
                    register(bpParams)
                    values <- bplapply(0:9, FUN=fib)
                    ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
                    ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
                    Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
                       Errors occurred during execution. First error message:
                    Error in FUN(...): could not find function "fib"
                    [...]


                    # The following illustrates that the solution is not always
                    # straightforward.
                    # (not specific to BiocParallel; must have been discussed
                    # previously)
                    values <- bplapply(0:9, FUN=function(n, fib) {
                       fib(n)
                    }, fib=fib)
                    Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
                       Errors occurred during execution. First error message:
                    Error in fib(n): could not find function "fib"
                    [...]

                    # Workaround; make fib() aware of itself
                    # (this is something the user needs to do, and would be
                    #  very hard for BiocParallel et al. to automate.  BTW,
                    #  should all recursive functions be implemented this
                    #  way?).
                    fib <- function(n=0) {
                       if (n < 0) stop("Invalid 'n': ", n)
                       if (n == 0 || n == 1) return(1)
                       fib <- sys.function() # Make function aware of itself
                       fib(n-2) + fib(n-1)
                    }
                    values <- bplapply(0:9, FUN=function(n, fib) {
                       fib(n)
                    }, fib=fib)
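
For comparison, base R's Recall() gives a similar effect to the sys.function() workaround: the function calls itself without looking up its own name, so it does not depend on 'fib' being defined wherever it runs. A minimal sketch, run locally here rather than through bplapply (the names fibAnon and g are only for illustration):

```r
fibAnon <- function(n = 0) {
    if (n < 0) stop("Invalid 'n': ", n)
    if (n == 0 || n == 1) return(1)
    # Recall() re-invokes the currently executing function, whatever
    # name (if any) it is bound to.
    Recall(n - 2) + Recall(n - 1)
}

# Rebind under another name and drop the original, to show that no
# lookup of the function's own name is involved.
g <- fibAnon
rm(fibAnon)
values <- vapply(0:9, g, numeric(1))
```

This only addresses self-reference; any other workspace objects the function uses would still need to be exported.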


                    WISHLIST:
                    Considering the above recursive issue solved, a
                    slightly more explicit
                    and standardized solution is then:

                    values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
                       for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
                       fib(n)
                    }, BPGLOBALS=list(fib=fib))

                    Could the above be generalized into something as
                    neat as:

                    bpExport("fib")
                    values <- bplapply(0:9, FUN=function(n) {
                       BiocParallel::bpImport("fib")
                       fib(n)
                    })

                    or ideally just (analogously to
                    parallel::clusterExport()):

                    bpExport("fib")
                    values <- bplapply(0:9, FUN=fib)
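
For reference, the parallel::clusterExport() pattern that the last variant mirrors looks roughly like this with a plain socket cluster. Note that bpExport() above is a wished-for function, not an existing one, and the two-worker cluster is an arbitrary choice for the sketch:

```r
library(parallel)

fib <- function(n = 0) {
    if (n < 0) stop("Invalid 'n': ", n)
    if (n == 0 || n == 1) return(1)
    fib(n - 2) + fib(n - 1)
}

cl <- makeCluster(2)
# Copy 'fib' from the current environment into each worker's global
# environment, so the recursive lookup of 'fib' succeeds there.
clusterExport(cl, "fib", envir = environment())
values <- parLapply(cl, 0:9, fib)
stopCluster(cl)
```

The export is explicit and by name, which is the trade-off discussed in this thread: nothing is magical, but the user has to remember to do it.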

                    /Henrik

                    _______________________________________________
                    Bioc-devel@r-project.org mailing list
                    https://stat.ethz.ch/mailman/listinfo/bioc-devel








--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis

