Bioc Developers --

BiocParallel generated quite a bit of discussion, so I'm providing a brief update. Version 0.0.5 is available to R-devel users via biocLite; the source is in svn

  https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/BiocParallel

and github

  https://github.com/Bioconductor/BiocParallel

We have tried to incorporate some key ideas, though things are far from complete.


The basic idea is that one creates a 'param'

  p = MulticoreParam(workers=8)

and uses that in computations

  bplapply(1:8, function(i) Sys.sleep(1), param=p)


There is a simple registry, populated at start-up with a 'greedy' param instance (e.g., MulticoreParam(workers=parallel::detectCores())), or populated explicitly with

  register(p)

The 'default' param (the one most recently registered with the default=TRUE argument) is used if param is missing

  bplapply(1:8, function(i) Sys.sleep(1))
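To make the registry behaviour concrete, here is a minimal sketch of a registry with a default, in plain R. It assumes params are S3-classed lists stored by class name; ToyParam(), register_param(), default_param() and toy_lapply() are hypothetical names, not BiocParallel's implementation, and toy_lapply() simply evaluates serially.

```r
## Hypothetical registry: params are stored by class name, and the most
## recently registered param with default = TRUE becomes the default.
registry <- new.env()
registry$params <- list()
registry$default <- NULL

## Toy constructor: an S3-classed list with a 'workers' field.
ToyParam <- function(workers = 1L, type = "ToyParam")
  structure(list(workers = as.integer(workers)), class = type)

register_param <- function(param, default = TRUE) {
  key <- class(param)[1]
  registry$params[[key]] <- param   # environments are updated in place
  if (default) registry$default <- key
  invisible(param)
}

default_param <- function() {
  if (is.null(registry$default)) stop("no param has been registered")
  registry$params[[registry$default]]
}

## lapply-like front end: fall back to the default when 'param' is missing.
toy_lapply <- function(X, FUN, ..., param) {
  if (missing(param)) param <- default_param()
  lapply(X, FUN, ...)   # a real back end would dispatch on 'param'
}

register_param(ToyParam(8, "MulticoreToy"))             # becomes the default
register_param(ToyParam(2, "SnowToy"), default = FALSE) # registered only
```

Keeping the registry in an environment means register_param() can update it from inside a function without returning anything.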


There are MulticoreParam, SnowParam, and DoparParam params so far. SnowParam is 'lazy', and bpstart / bpstop can be used to start and stop the implied cluster

> p = SnowParam(workers=2)
> p = bpstart(p)
Bioconductor version 2.12 (BiocInstaller 1.9.5), ?biocLite for help
Bioconductor version 2.12 (BiocInstaller 1.9.5), ?biocLite for help
> p = bpstop(p)
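The 'lazy' idea can be sketched with an environment whose cluster slot stays NULL until it is explicitly started. This is an illustration under stated assumptions, not SnowParam's code: lazy_param(), start_param(), stop_param() and is_started() are hypothetical, and the start/stop functions default to parallel::makePSOCKcluster and parallel::stopCluster but can be swapped for stubs (e.g., when testing).

```r
## Hypothetical lazy param: nothing is started until start_param().
lazy_param <- function(workers,
                       start_fun = parallel::makePSOCKcluster,
                       stop_fun = parallel::stopCluster) {
  p <- new.env()
  p$workers <- workers
  p$cluster <- NULL        # no cluster yet
  p$start_fun <- start_fun
  p$stop_fun <- stop_fun
  p
}

is_started <- function(p) !is.null(p$cluster)

start_param <- function(p) {
  ## start only once; repeated calls are no-ops
  if (!is_started(p)) p$cluster <- p$start_fun(p$workers)
  invisible(p)
}

stop_param <- function(p) {
  if (is_started(p)) {
    p$stop_fun(p$cluster)
    p$cluster <- NULL
  }
  invisible(p)
}
```

An environment is modified in place, so returning p is not strictly needed here; it just keeps the p = bpstart(p) calling style shown above, which an ordinary (copy-on-modify) S4 object would require.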

DoparParam (currently) indicates that a foreach-style back-end has been registered (via standard foreach approaches), and bplapply(1:8, ..., param=DoparParam()) then uses foreach for evaluation. The *Param classes are S4 classes (they should probably be reference classes) that extend BiocParallelParam, so anyone can implement a new *Param; eventually BiocParallelParam will define 'required' fields (like 'workers' and 'setSeed') that all *Param objects are expected to support.
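A minimal sketch of the 'required fields' idea: a virtual S4 base class declares slots such as workers and setSeed, plus a validity check, and a new back end gets both simply by extending it. ToyParallelParam and ToySerialParam are hypothetical stand-ins; BiocParallelParam's actual slots and checks may differ.

```r
library(methods)

## Hypothetical virtual base class declaring the 'required' fields.
setClass("ToyParallelParam",
         representation("VIRTUAL", workers = "integer", setSeed = "logical"),
         validity = function(object) {
           w <- object@workers
           if (length(w) != 1L || is.na(w) || w < 1L)
             "'workers' must be a single positive integer"
           else TRUE
         })

## A new back end inherits the required slots and the validity check.
setClass("ToySerialParam", contains = "ToyParallelParam")

## Convenience constructor in the MulticoreParam(workers=8) style.
ToySerialParam <- function(workers = 1L, setSeed = FALSE)
  new("ToySerialParam", workers = as.integer(workers), setSeed = setSeed)
```

Because validObject() also runs the validity methods of superclasses, ToySerialParam(0) fails with the base class's message without the subclass repeating the check.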


bplapply has signature bplapply(X, FUN, ..., param) and is generic in all three named arguments (X, FUN and param), so again package developers can implement versions tailored to their clusters (Florian has sent me some code for an SGE scheduler, which I have not yet incorporated).
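Dispatch on more than one argument can be sketched with an S4 generic whose signature includes X, FUN and param, so that a package can add a method for its own param class without touching anyone else's. toy_bplapply and the Toy*Backend classes are hypothetical; the 'chunked' method only imitates a cluster by processing chunks of X in turn, formed with parallel::splitIndices.

```r
library(methods)

## Hypothetical classes: a virtual parent plus two back ends.
setClass("ToyBackend", representation("VIRTUAL"))
setClass("ToySerialBackend", contains = "ToyBackend")
setClass("ToyChunkedBackend", contains = "ToyBackend",
         slots = c(chunks = "integer"))

## Generic in X, FUN and param ('...' is excluded from dispatch).
setGeneric("toy_bplapply",
           function(X, FUN, ..., param) standardGeneric("toy_bplapply"),
           signature = c("X", "FUN", "param"))

## Serial method: any X, any function, ToySerialBackend param.
setMethod("toy_bplapply", c("ANY", "function", "ToySerialBackend"),
          function(X, FUN, ..., param) lapply(X, FUN, ...))

## Method for a different 'cluster': split X into index chunks and
## process them one after another, standing in for remote workers.
setMethod("toy_bplapply", c("ANY", "function", "ToyChunkedBackend"),
          function(X, FUN, ..., param) {
            idx <- parallel::splitIndices(length(X), param@chunks)
            res <- lapply(idx, function(i, ...) lapply(X[i], FUN, ...), ...)
            do.call(c, res)   # flatten the per-chunk lists
          })
```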


Only bplapply and bpvec are currently implemented as 'algorithms'. They have a common signature and have been implemented to rely only on length, '[', '[[' (for bplapply) and 'c' (for bpvec); this is the 'contract' that we'll try to maintain. We'd like to implement other algorithms, and to make current algorithms more useful by including better error handling, scheduling, and reduction.
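A sketch of a bpvec-style algorithm that stays within that contract: X is only queried with length() and subset with '[', and the per-chunk results are recombined with c(). toy_bpvec() and its chunks argument are hypothetical, and each chunk is evaluated serially with lapply() where a real back end would hand it to a worker.

```r
## Hypothetical bpvec-style algorithm using only length(), '[' and c().
toy_bpvec <- function(X, FUN, ..., chunks = 2L) {
  n <- length(X)
  if (n == 0L) return(FUN(X, ...))
  chunks <- min(chunks, n)
  ## chunk id for each position, e.g. 1 1 1 2 2 2 3 3 3 3 for n = 10
  id <- ceiling(seq_len(n) * chunks / n)
  pieces <- lapply(split(seq_len(n), id),
                   function(i) FUN(X[i], ...))   # subset with '['
  do.call(c, unname(pieces))                     # recombine with c()
}
```

Since nothing beyond length(), '[' and c() is required, the same function should work for atomic vectors, lists, and other classes that define those methods.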


bpvectorize is a simple way to convert 'vectorized' functions into a parallel, vectorized version, e.g., pcountOverlaps = bpvectorize(countOverlaps).
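One way such wrapping could look, as a hedged sketch: toy_vectorize() (a hypothetical name) takes a function that is vectorized over its first argument and returns a new function that splits that argument into chunks, evaluates the chunks with parallel::mclapply(), and recombines them with c(). mc.cores defaults to 1 because forking (mc.cores > 1) is not available on Windows.

```r
## Hypothetical bpvectorize-style wrapper.
toy_vectorize <- function(FUN, chunks = 2L, mc.cores = 1L) {
  force(FUN)
  function(X, ...) {
    n <- length(X)
    if (n == 0L) return(FUN(X, ...))
    idx <- parallel::splitIndices(n, min(chunks, n))   # list of index chunks
    pieces <- parallel::mclapply(idx, function(i) FUN(X[i], ...),
                                 mc.cores = mc.cores)  # 1 core: runs serially
    do.call(c, pieces)                                 # recombine results
  }
}
```

For example, psqrt <- toy_vectorize(sqrt, chunks = 3L) gives a function whose result for 1:10 matches sqrt(1:10).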


I'm happy to hear of major missteps and areas in pressing need of development, either on or off list, or via the github interface.


Ryan Thompson has made valuable contributions, especially DoparParam and cleaning up bpvec and bplapply; I haven't always managed to wrangle git and svn (thanks Laurent for the --add-author-name tip, which works when I do other things right) in a way that fully credits his contribution, for which I apologize.

Martin
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
