[Bioc-devel] BiocParallel -- update

Martin Morgan Mon, 03 Dec 2012 17:33:26 -0800

Bioc Developers --

BiocParallel generated quite a bit of discussion, so I'm providing a brief
update. Version 0.0.5 is available to R-devel users via biocLite; it's in svn


  https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/BiocParallel

and github

  https://github.com/Bioconductor/BiocParallel

We have tried to incorporate some key ideas, though things are far from 
complete.


The basic idea is that one creates a 'param'

  p = MulticoreParam(workers=8)

and uses that in computations

  bplapply(1:8, function(i) Sys.sleep(1), param=p)

There is a simple registry, populated at start-up with a 'greedy' param
instance (e.g., MulticoreParam(workers=parallel::detectCores())) or populated
explicitly:


  register(p)

The 'default' param (the most recently registered one, with the default=TRUE
argument) is used if param is missing:


  bplapply(1:8, function(i) Sys.sleep(1))

There are MulticoreParam, SnowParam, and DoparParam params so far; SnowParam
is 'lazy', and bpstart / bpstop can be used to start and stop the implied
cluster:


> p = SnowParam(workers=2)
> p = bpstart(p)
Bioconductor version 2.12 (BiocInstaller 1.9.5), ?biocLite for help
Bioconductor version 2.12 (BiocInstaller 1.9.5), ?biocLite for help
> p = bpstop(p)

DoparParam (currently) indicates that a foreach-style back-end has been
registered (via standard foreach approaches), and bplapply(1:8, ...,
param=DoparParam()) uses foreach for evaluation. *Param are S4 classes (should
probably be reference classes) that extend BiocParallelParam and so anyone can
implement a new *Param; eventually BiocParallelParam will define 'required'
fields (like 'workers' and 'setSeed') that all *Param objects are expected to
support.

bplapply has signature bplapply(X, FUN, ..., param) and is a generic in all
three arguments, so again package developers can implement versions tailored to
their clusters (Florian has sent me some code for an SGE scheduler, which I have
not yet incorporated).
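
To make the extension pattern concrete, here is a minimal sketch in base R
(methods package only). It does not use BiocParallel: SketchParam,
SketchSerialParam and sketch_lapply are hypothetical stand-ins for
BiocParallelParam, a new *Param class, and bplapply, and for brevity the
generic dispatches on param only rather than on all three arguments.

```r
library(methods)

# Hypothetical stand-in for BiocParallelParam: a virtual base class with a
# 'workers' slot that every concrete param inherits.
setClass("SketchParam", representation("VIRTUAL", workers = "numeric"))

# Hypothetical back-end that simply evaluates serially.
setClass("SketchSerialParam", contains = "SketchParam")

# Generic that dispatches on 'param' only (an assumption made to keep the
# sketch short).
setGeneric("sketch_lapply",
    function(X, FUN, ..., param) standardGeneric("sketch_lapply"),
    signature = "param")

# A developer adds a back-end by defining a method for the new class.
setMethod("sketch_lapply", "SketchSerialParam",
    function(X, FUN, ..., param) lapply(X, FUN, ...))

p <- new("SketchSerialParam", workers = 1)
res <- sketch_lapply(1:3, function(i) i^2, param = p)
```

A cluster-specific back-end would follow the same shape: a new class that
contains the base class, plus one method on the generic.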

Only bplapply and bpvec are currently implemented as 'algorithms'. They have a
common signature and have been implemented to rely only on length, '[', '[['
(for bplapply) and 'c' (for bpvec); this is the 'contract' that we'll try to
maintain. We'd like to implement other algorithms, and to make current
algorithms more useful by including better error handling, scheduling, and
reduction.
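
As an illustration of the bpvec side of that contract, the sketch below touches
X only through length(), '[' and c(). It is not the BiocParallel code:
sketch_bpvec and its workers / APPLY arguments are hypothetical, and APPLY
defaults to lapply as a stand-in for whatever parallel back-end is in use.

```r
# Hypothetical helper, not the BiocParallel implementation: split X into
# contiguous chunks, apply FUN to each chunk, and concatenate the results.
sketch_bpvec <- function(X, FUN, ..., workers = 2L, APPLY = lapply) {
    n <- length(X)
    if (n == 0L)
        return(FUN(X, ...))
    # Use at most one chunk per element.
    k <- min(workers, n)
    # Map each index 1..n to a chunk number 1..k, keeping chunks contiguous.
    idx <- split(seq_len(n), ceiling(seq_len(n) * k / n))
    # Only '[' is used to extract a chunk from X.
    pieces <- APPLY(idx, function(i) FUN(X[i], ...))
    # Only c() is used to combine the per-chunk results.
    do.call(c, unname(pieces))
}

res <- sketch_bpvec(1:10, sqrt, workers = 3L)
```

Any object with length, '[' and c methods could be passed as X under this
scheme, which is the point of keeping the contract small.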

bpvectorize is a simple way to convert 'vectorized' functions into a parallel,
vectorized version, e.g., pcountOverlaps = bpvectorize(countOverlaps).
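
The idea can be sketched with the base 'parallel' package alone. The helper
sketch_vectorize below is hypothetical and is not how bpvectorize is
implemented; it assumes the function being wrapped is vectorized over its
first argument and that its per-chunk results can be combined with c().

```r
library(parallel)

# Hypothetical helper: return a wrapper around FUN that splits the first
# argument into chunks, evaluates each chunk in a forked worker, and
# concatenates the per-chunk results with c().
sketch_vectorize <- function(FUN, workers = 2L) {
    force(FUN)
    function(X, ...) {
        n <- length(X)
        if (n == 0L)
            return(FUN(X, ...))
        k <- min(workers, n)
        idx <- split(seq_len(n), ceiling(seq_len(n) * k / n))
        # mclapply relies on forking, which Windows lacks; use one core there.
        cores <- if (.Platform$OS.type == "windows") 1L else k
        pieces <- mclapply(idx, function(i) FUN(X[i], ...), mc.cores = cores)
        do.call(c, unname(pieces))
    }
}

# Usage: a chunked, parallel version of the vectorized nchar().
pnchar <- sketch_vectorize(nchar)
res <- pnchar(c("a", "bb", "ccc", "dddd"))
```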

I'm happy to hear of major mis-steps, and areas in pressing need of development,
either on or off list or via the github interface.

Ryan Thompson has made valuable contributions, especially DoparParam and
cleaning up bpvec and bplapply; I haven't always managed to wrangle git and svn
(thanks Laurent for the --add-author-name tip, which works when I do other
things right) in a way that fully credits his contribution, for which I
apologize.


Martin
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
