Gabriel,
Thanks for the clarification. I was avoiding depending on CodeDepends
because I'm fairly certain that a BioC package can't depend on a package
that isn't in either CRAN or Bioconductor. Since you point out that the
librarySymbols code doesn't depend on any other part of the package, I
think it would be fine to copy it into BiocParallel and use it to check
functions for external dependencies, if that's what you're suggesting.
Of course, we would add a comment noting that once CodeDepends makes it
into CRAN, we should switch over to using that.
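
For illustration, here is a rough sketch of the kind of check being discussed. It assumes codetools::findGlobals() is used to discover the symbols a function references, and it looks them up in the currently attached packages via search(), as suggested in the quoted message below. The names attachedPackages() and globalProviders() are hypothetical; they are not functions from BiocParallel or CodeDepends, and this is not the librarySymbols code itself.

```r
library(codetools)

# Hypothetical helper: names of the currently attached packages,
# derived from search() entries such as "package:stats".
attachedPackages <- function() {
  sp <- search()
  sub("^package:", "", grep("^package:", sp, value = TRUE))
}

# Hypothetical helper: for each global symbol used by `fun`, report the
# first attached package that provides it, or NA if none does.
# The result is a character vector of package names, named by symbol.
globalProviders <- function(fun, packages = attachedPackages()) {
  globals <- findGlobals(fun, merge = TRUE)
  vapply(globals, function(sym) {
    for (pkg in packages) {
      env <- as.environment(paste0("package:", pkg))
      if (exists(sym, envir = env, inherits = FALSE)) return(pkg)
    }
    NA_character_
  }, character(1))
}

# Example: `helper` is user-defined, so no attached package provides it.
helper <- function(x) x + 1
f <- function(n) helper(n) + sum(n)
providers <- globalProviders(f)
names(providers)[is.na(providers)]  # symbols that would need to be exported
```

Symbols that map to NA are the ones that would have to be serialized and sent to the child; the rest only require loading the named package there. Because findGlobals() is a static heuristic, constructs such as with() can produce false positives.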
Side note 1: If we're talking about doing sanity checks on code, we
could also check for any usage of non-local assignment ("<<-"), since we
know that it will only modify the subprocess's copy and never the parent
session, and the user might not expect that if they are not familiar with
multi-process parallelism.
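
A check like the one in side note 1 could be sketched as follows. This assumes that scanning the function body with base R's all.names() is good enough for a sanity check; usesSuperAssignment() is a hypothetical name, not an existing BiocParallel function, and the sketch will not catch assign() calls or code that is constructed at run time.

```r
# Hypothetical helper: TRUE if the body of `fun` contains a call to `<<-`.
# all.names() returns every symbol in an expression, including operators,
# and also descends into the bodies of nested function definitions.
usesSuperAssignment <- function(fun) {
  stopifnot(is.function(fun))
  "<<-" %in% all.names(body(fun))
}

counter <- 0
good <- function(x) x + 1
bad <- function(x) {
  counter <<- counter + x  # would only change the worker's copy
  x
}

usesSuperAssignment(good)  # FALSE
usesSuperAssignment(bad)   # TRUE
```

A wrapper around bplapply() could emit a warning, rather than an error, when this returns TRUE, since "<<-" into a variable of an enclosing closure can still be legitimate.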
Side note 2: Your original link gave a 404 error because it had the word
"Note" appended to it. Removing this gave a valid link:
https://github.com/duncantl/CodeDepends/blob/forCRAN_0.3.5/R/librarySymbols.R
-Ryan
On 11/4/13, 12:13 PM, Gabriel Becker wrote:
> Ryan,
>
> I agree that in some sense it is a different problem, but my point is
> that with a different approach we can easily answer both. The code I
> posted returns a named character vector of symbol names, with each
> element named by the package that provides it.
>
> This makes it a trivial lookup to determine both a) what symbols
> aren't available in any of the packages and b) what packages provide
> the remaining required symbols. No extra work required.
>
> You do have to give it a list of packages to check, but it is easy to
> write a wrapper that automatically passes it all currently attached
> packages if desired (a combination of search() and gsub() would be a
> quick and dirty way to do this).
>
> All that said, I'm simply trying to help. If you guys don't want to
> use my code/approach that is your prerogative, as I'm not currently
> working on BiocParallel myself.
>
> ~G
>
>
>
>
> On Mon, Nov 4, 2013 at 11:54 AM, Ryan Thompson <r...@thompsonclan.org> wrote:
>
> The code that I wrote intentionally avoids checking for package
> variables, since I consider that a separate problem. Package
> variables can be provided to the child by loading the package,
> whereas user-defined variables must be serialized in the parent
> and sent to the child.
>
> I think I could fairly easily adapt the same code to return a list
> of all packages that a function depends on.
>
> -Ryan
>
> On Nov 4, 2013 11:35 AM, "Michael Lawrence" <lawrence.mich...@gene.com> wrote:
>
> The dynamic nature of R limits the extent of these checks. But as Ryan has
> noted, a simple sanity check goes a long way. If what he has done could be
> extended to the rest of the search path (people always forget to attach
> packages), I think we've hit the 80% with 20%. Got a 404 on that URL btw.
>
> Michael
>
>
> On Mon, Nov 4, 2013 at 11:05 AM, Gabriel Becker <gmbec...@ucdavis.edu> wrote:
>
> > Hey guys,
> >
> > Here is code that I have written which resolves library names into a full
> > list of symbols:
> >
> >
>
> https://github.com/duncantl/CodeDepends/blob/forCRAN_0.3.5/R/librarySymbols.RNote
> > this does not require that the packages actually be loaded at the time
> > of the check, and does not load them (or rather, it loads them but does not
> > attach them, so no searchpath muddying occurs). You do need a list of
> > packages to check though (it adds the base ones automatically). It handles
> > dependency and could be easily extended to handle suggests as well I think.
> >
> > When CodeDepends gets pushed to cran (not my call and not high on my
> > priority list to push for currently) it will actually do exactly what you
> > want. (the forCRAN_0.3.5 branch already does and I believe it is
> > documented, so you could use devtools to install it now).
> >
> > As a side note, I'm not sure that existence of a symbol is sufficient (it
> > certainly is necessary). What about situations where the symbol exists but
> > is stale compared to the value in the parent? Are we sure that can never
> > happen?
> >
> > ~G
> >
> >
> > On Mon, Nov 4, 2013 at 7:29 AM, Michel Lang <michell...@gmail.com> wrote:
> >
> > > You might want to consider using Recall() for recursion, which should
> > > solve this. Determining the required variables using heuristics, as
> > > codetools does, will probably lead to some confusion when using functions
> > > which include calls to, e.g., with():
> > >
> > > f = function() {
> > >   with(iris, Sepal.Length + Sepal.Width)
> > > }
> > > codetools::findGlobals(f)
> > >
> > > I would suggest writing up some documentation on what the function's
> > > environment contains and how to define variables accordingly - or why it
> > > can generally be considered a good idea to pass everything essential as an
> > > argument. Nevertheless a "bpExport" function would be a good addition for
> > > some rare corner cases in my opinion.
> > >
> > > Michel
> > >
> > >
> > > 2013/11/3 Henrik Bengtsson <h...@biostat.ucsf.edu>
> > >
> > > > Hi,
> > > >
> > > > in BiocParallel, is there a suggested (or planned) best standard for
> > > > making *locally* assigned variables (e.g. functions) available to the
> > > > applied function when it runs in a separate R process (which will be
> > > > the most common use case)? I understand that local variables should be
> > > > avoided and that it's preferred to put as much as possible in
> > > > packages, but that's not always possible or very convenient.
> > > >
> > > > EXAMPLE:
> > > >
> > > > library('BiocParallel')
> > > > library('BatchJobs')
> > > >
> > > > # Here I pick a recursive function to make the problem a bit harder, i.e.
> > > > # the function needs to call itself ("itself" = see below)
> > > > fib <- function(n=0) {
> > > >   if (n < 0) stop("Invalid 'n': ", n)
> > > >   if (n == 0 || n == 1) return(1)
> > > >   fib(n-2) + fib(n-1)
> > > > }
> > > >
> > > > # Executing in the current R session
> > > > cluster.functions <- makeClusterFunctionsInteractive()
> > > > bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
> > > > register(bpParams)
> > > > values <- bplapply(0:9, FUN=fib)
> > > > ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
> > > > ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
> > > >
> > > >
> > > > # Executing in a separate R process, where fib() is not defined
> > > > # (not specific to BiocParallel)
> > > > cluster.functions <- makeClusterFunctionsLocal()
> > > > bpParams <- BatchJobsParam(cluster.functions=cluster.functions)
> > > > register(bpParams)
> > > > values <- bplapply(0:9, FUN=fib)
> > > > ## SubmitJobs |++++++++++++++++++++++++++++++++++| 100% (00:00:00)
> > > > ## Waiting [S:0 R:0 D:10 E:0] |+++++++++++++++++++| 100% (00:00:00)
> > > > Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
> > > >   Errors occurred during execution. First error message:
> > > > Error in FUN(...): could not find function "fib"
> > > > [...]
> > > >
> > > >
> > > > # The following illustrates that the solution is not always straightforward.
> > > > # (not specific to BiocParallel; must have been discussed previously)
> > > > values <- bplapply(0:9, FUN=function(n, fib) {
> > > >   fib(n)
> > > > }, fib=fib)
> > > > Error in LastError$store(results = results, is.error = !ok, throw.error = TRUE) :
> > > >   Errors occurred during execution. First error message:
> > > > Error in fib(n): could not find function "fib"
> > > > [...]
> > > >
> > > > # Workaround; make fib() aware of itself
> > > > # (this is something the user needs to do, and would be very
> > > > # hard for BiocParallel et al. to automate. BTW, should all
> > > > # recursive functions be implemented this way?).
> > > > fib <- function(n=0) {
> > > >   if (n < 0) stop("Invalid 'n': ", n)
> > > >   if (n == 0 || n == 1) return(1)
> > > >   fib <- sys.function() # Make function aware of itself
> > > >   fib(n-2) + fib(n-1)
> > > > }
> > > > values <- bplapply(0:9, FUN=function(n, fib) {
> > > >   fib(n)
> > > > }, fib=fib)
> > > >
> > > >
> > > > WISHLIST:
> > > > Considering the above recursive issue solved, a slightly more explicit
> > > > and standardized solution is then:
> > > >
> > > > values <- bplapply(0:9, FUN=function(n, BPGLOBALS=NULL) {
> > > >   for (name in names(BPGLOBALS)) assign(name, BPGLOBALS[[name]])
> > > >   fib(n)
> > > > }, BPGLOBALS=list(fib=fib))
> > > >
> > > > Could the above be generalized into something as neat as:
> > > >
> > > > bpExport("fib")
> > > > values <- bplapply(0:9, FUN=function(n) {
> > > >   BiocParallel::bpImport("fib")
> > > >   fib(n)
> > > > })
> > > >
> > > > or ideally just (analogously to parallel::clusterExport()):
> > > >
> > > > bpExport("fib")
> > > > values <- bplapply(0:9, FUN=fib)
> > > >
> > > > /Henrik
> > > >
> > > > _______________________________________________
> > > > Bioc-devel@r-project.org
> <mailto:Bioc-devel@r-project.org> mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > > >
> > >
> > > [[alternative HTML version deleted]]
> > >
> > >
> >
> >
> >
> > --
> > Gabriel Becker
> > Graduate Student
> > Statistics Department
> > University of California, Davis
> >
> >
>
>
>
>
>