I'll second Ryan's patch (at least in principle). When I parallelize across multiple cores, I have always found mc.preschedule to be an important option to expose (that, and the number of cores, is all I use routinely).
Kasper On Wed, Nov 14, 2012 at 7:14 PM, Ryan C. Thompson <r...@thompsonclan.org> wrote: > I just submitted a pull request. I'll add tests shortly if I can figure out > how to write them. > > > On Wed 14 Nov 2012 03:50:36 PM PST, Martin Morgan wrote: >> >> On 11/14/2012 03:43 PM, Ryan C. Thompson wrote: >>> >>> Here are two alternative implementations of pvec. pvec2 is just a >>> simple rewrite >>> of pvec to use mclapply. pvec3 then extends pvec2 to accept a >>> specified chunk >>> size or a specified number of chunks. If the number of chunks exceeds >>> the number >>> of cores, then multiple chunks will get run sequentially on each >>> core. pvec3 >>> also exposes the "mc.prescheule" argument of mclapply, since that is >>> relevant >>> when there are more chunks than cores. Lastly, I provide a >>> "pvectorize" function >>> which can be called on a regular vectorized function to make it into >>> a pvec'd >>> version of itself. Usage is like: sqrt.parallel <- pvectorize(sqrt); >>> sqrt.parallel(1:1000). >>> >>> pvec2 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE, >>> mc.cores = getOption("mc.cores", 2L), mc.cleanup = >>> TRUE) >>> { >>> env <- parent.frame() >>> cores <- as.integer(mc.cores) >>> if(cores < 1L) stop("'mc.cores' must be >= 1") >>> if(cores == 1L) return(FUN(v, ...)) >>> >>> if(mc.set.seed) mc.reset.stream() >>> >>> n <- length(v) >>> si <- splitIndices(n, cores) >>> res <- do.call(c, >>> mclapply(si, function(i) FUN(v[i], ...), >>> mc.set.seed=mc.set.seed, >>> mc.silent=mc.silent, >>> mc.cores=mc.cores, >>> mc.cleanup=mc.cleanup)) >>> if (length(res) != n) >>> warning("some results may be missing, folded or caused an error") >>> res >>> } >>> pvec3 <- function(v, FUN, ..., mc.set.seed = TRUE, mc.silent = FALSE, >>> mc.cores = getOption("mc.cores", 2L), mc.cleanup = >>> TRUE, >>> mc.preschedule=FALSE, num.chunks, chunk.size) >>> { >>> env <- parent.frame() >>> cores <- as.integer(mc.cores) >>> if(cores < 1L) stop("'mc.cores' must be >= 1") >>> if(cores == 1L) return(FUN(v, ...)) >>> >>> if(mc.set.seed) mc.reset.stream() >>> >>> n <- length(v) >>> if (missing(num.chunks)) { >>> if (missing(chunk.size)) { >>> num.chunks <- cores >>> } else { >>> num.chunks <- ceiling(n/chunk.size) >>> } >>> } >>> si <- splitIndices(n, num.chunks) >>> res <- do.call(c, >>> mclapply(si, function(i) FUN(v[i], ...), >>> mc.set.seed=mc.set.seed, >>> mc.silent=mc.silent, >>> mc.cores=mc.cores, >>> mc.cleanup=mc.cleanup, >>> mc.preschedule=mc.preschedule)) >>> if (length(res) != n) >>> warning("some results may be missing, folded or caused an error") >>> res >>> } >>> >>> pvectorize <- function(FUN) { >>> function(...) pvec3(FUN=FUN, ...) >>> } >> >> >> would be great to have these as 'pull' requests in github; pvec3 as a >> replacement for pvec, if it's implementing the same concept but better. >> >> Unit tests would be good (yes being a little hypocritical). >> inst/unitTests, using RUnit, examples in >> >> >> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/IRanges/inst/unitTests >> >> >> with username / password readonly >> >> Martin >> >>> On Wed 14 Nov 2012 02:23:30 PM PST, Michael Lawrence wrote: >>>> >>>> >>>> On Wed, Nov 14, 2012 at 12:23 PM, Martin Morgan <mtmor...@fhcrc.org> >>>> wrote: >>>> >>>>> >>>>> Interested developers -- I added the start of a BiocParallel >>>>> package to >>>>> the Bioconductor subversion repository and build system. >>>>> >>>>> The package is mirrored on github to allow for social coding; I >>>>> encourage >>>>> people to contribute via that mechanism. >>>>> >>>>> >>>>> https://github.com/**Bioconductor/BiocParallel<https://github.com/Bioconductor/BiocParallel> >>>>> >>>>> >>>>> >>>>> The purpose is to help focus our efforts at developing appropriate >>>>> parallel paradigms. Currently the package Imports: parallel and >>>>> implements >>>>> pvec and mclapply in a way that allows for operation on any vector >>>>> or list >>>>> supporting length(), [, and [[ (the latter for mclapply). pvec in >>>>> particular seems to be appropriate for GRanges-like objects, where >>>>> we don't >>>>> necessarily want to extract many thousands of S4 instances of >>>>> individual >>>>> ranges with [[. >>>>> >>>> >>>> >>>> Makes sense. Besides, [[ does not even work on GRanges. One >>>> limitation of >>>> pvec is that it does not support a chunk size; it just uses length(x) / >>>> ncores. It would be nice to be able to restrict that, which would then >>>> require multiple jobs per core. Unless I'm missing something. >>>> >>>> >>>>> >>>>> >>>>> Hopefully the ideas in the package can be folded back in to >>>>> parallel as >>>>> they mature. >>>>> >>>>> Martin >>>>> -- >>>>> Dr. Martin Morgan, PhD >>>>> Fred Hutchinson Cancer Research Center >>>>> 1100 Fairview Ave. N. >>>>> PO Box 19024 Seattle, WA 98109 >>>>> >>>>> ______________________________**_________________ >>>>> Bioc-devel@r-project.org mailing list >>>>> >>>>> https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >>>>> >>>>> >>>>> >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioc-devel@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> >> >> > > _______________________________________________ > Bioc-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/bioc-devel _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel