Personally, having used memcached in the past for distributed shared-memory caching, I am most interested in (3) and doRedis. Many cluster/batch processing systems are a colossal PITA, and a worker queue would go a long way towards fixing that: less checkpointing, more results... I hope.
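To make that concrete, the pattern I have in mind looks roughly like the
following: a minimal, untested sketch using doRedis on top of foreach.
The queue name, worker count, and toy task are all made up for
illustration, and a Redis server is assumed to be running on localhost:

    library(doRedis)
    library(foreach)

    registerDoRedis("jobs")        # "jobs" is an arbitrary queue name

    ## start two workers on this machine; on a real cluster the workers
    ## would instead be started on remote nodes pointing at the same
    ## Redis host
    startLocalWorkers(n = 2, queue = "jobs")

    ## tasks are pulled off the queue as workers come free -- no batch
    ## scheduler, no checkpointing of intermediate state
    res <- foreach(i = 1:100, .combine = c) %dopar% sqrt(i)

    removeQueue("jobs")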
As an aside: if I want to do the following often and fast, where 'object'
and 'x' are ranges with large numbers of intervals, is there a clever way
to speed it up a lot?

    ol <- findOverlaps(object, x)
    so <- object[queryHits(ol)]
    sx <- x[subjectHits(ol)]
    disjoint <- subsetByOverlaps(disjoin(c(sx, so, ignore.mcols = TRUE)), so)
    mcols(disjoint)[, "state"] <- rep("", length(disjoint))
    byState <- split(so, mcols(so)[, "state"])
    for (state in names(byState)) {
        overlapping <- queryHits(findOverlaps(disjoint, byState[[state]]))
        if (length(overlapping) > 0)
            mcols(disjoint[overlapping])[, "state"] <- state
    }
    mcols(disjoint)[, "state"] <- as.factor(mcols(disjoint)[, "state"])
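One thing I may try myself, offered as an untested sketch rather than an
answer: the per-state loop above can collapse into a single overlap query,
assigning all states in one vectorized step. The tie-breaking differs
(hit order rather than state order) when a disjoint range overlaps ranges
from more than one state, so this is only equivalent when each disjoint
range maps to a single state:

    hits <- findOverlaps(disjoint, so)
    mcols(disjoint)[["state"]] <- ""
    ## one bulk assignment instead of one findOverlaps() per state
    mcols(disjoint)[["state"]][queryHits(hits)] <-
        as.character(mcols(so)[["state"]][subjectHits(hits)])
    mcols(disjoint)[["state"]] <- as.factor(mcols(disjoint)[["state"]])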
On Thu, Nov 15, 2012 at 10:53 AM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:

> Is there any write-up/discussion/plan on the various types of parallel
> computation out there:
>
> (1) one machine / multi-core / multi-threaded
> (2) multiple machines / multiple processes
> (3) batch / queue processing (on large compute clusters with many users)
> (4) ...
>
> Are we/you mainly focusing on (1) and (2)?
>
> /Henrik
>
> On Thu, Nov 15, 2012 at 6:21 AM, Kasper Daniel Hansen
> <kasperdanielhan...@gmail.com> wrote:
> > I'll second Ryan's patch (at least in principle). When I parallelize
> > across multiple cores, I have always found mc.preschedule to be an
> > important option to expose (that, and the number of cores, is all I
> > use routinely).
> >
> > Kasper
> >
> > On Wed, Nov 14, 2012 at 7:14 PM, Ryan C. Thompson
> > <r...@thompsonclan.org> wrote:
> >> I just submitted a pull request. I'll add tests shortly if I can
> >> figure out how to write them.
> >>
> >> On Wed 14 Nov 2012 03:50:36 PM PST, Martin Morgan wrote:
> >>> On 11/14/2012 03:43 PM, Ryan C. Thompson wrote:
> >>>> Here are two alternative implementations of pvec. pvec2 is a
> >>>> simple rewrite of pvec to use mclapply. pvec3 then extends pvec2
> >>>> to accept a specified chunk size or a specified number of chunks;
> >>>> if the number of chunks exceeds the number of cores, multiple
> >>>> chunks get run sequentially on each core. pvec3 also exposes the
> >>>> "mc.preschedule" argument of mclapply, since that is relevant
> >>>> when there are more chunks than cores. Lastly, I provide a
> >>>> "pvectorize" function which can be called on a regular vectorized
> >>>> function to make it into a pvec'd version of itself. Usage is
> >>>> like: sqrt.parallel <- pvectorize(sqrt); sqrt.parallel(1:1000).
> >>>>
> >>>> pvec2 <- function(v, FUN, ..., mc.set.seed = TRUE,
> >>>>                   mc.silent = FALSE,
> >>>>                   mc.cores = getOption("mc.cores", 2L),
> >>>>                   mc.cleanup = TRUE)
> >>>> {
> >>>>     env <- parent.frame()
> >>>>     cores <- as.integer(mc.cores)
> >>>>     if (cores < 1L) stop("'mc.cores' must be >= 1")
> >>>>     if (cores == 1L) return(FUN(v, ...))
> >>>>
> >>>>     if (mc.set.seed) mc.reset.stream()
> >>>>
> >>>>     ## one chunk per core, as in parallel::pvec
> >>>>     n <- length(v)
> >>>>     si <- splitIndices(n, cores)
> >>>>     res <- do.call(c,
> >>>>                    mclapply(si, function(i) FUN(v[i], ...),
> >>>>                             mc.set.seed = mc.set.seed,
> >>>>                             mc.silent = mc.silent,
> >>>>                             mc.cores = mc.cores,
> >>>>                             mc.cleanup = mc.cleanup))
> >>>>     if (length(res) != n)
> >>>>         warning("some results may be missing, folded or caused an error")
> >>>>     res
> >>>> }
> >>>>
> >>>> pvec3 <- function(v, FUN, ..., mc.set.seed = TRUE,
> >>>>                   mc.silent = FALSE,
> >>>>                   mc.cores = getOption("mc.cores", 2L),
> >>>>                   mc.cleanup = TRUE, mc.preschedule = FALSE,
> >>>>                   num.chunks, chunk.size)
> >>>> {
> >>>>     env <- parent.frame()
> >>>>     cores <- as.integer(mc.cores)
> >>>>     if (cores < 1L) stop("'mc.cores' must be >= 1")
> >>>>     if (cores == 1L) return(FUN(v, ...))
> >>>>
> >>>>     if (mc.set.seed) mc.reset.stream()
> >>>>
> >>>>     ## chunking policy: an explicit num.chunks wins, then
> >>>>     ## chunk.size, then the default of one chunk per core
> >>>>     n <- length(v)
> >>>>     if (missing(num.chunks)) {
> >>>>         if (missing(chunk.size)) {
> >>>>             num.chunks <- cores
> >>>>         } else {
> >>>>             num.chunks <- ceiling(n / chunk.size)
> >>>>         }
> >>>>     }
> >>>>     si <- splitIndices(n, num.chunks)
> >>>>     res <- do.call(c,
> >>>>                    mclapply(si, function(i) FUN(v[i], ...),
> >>>>                             mc.set.seed = mc.set.seed,
> >>>>                             mc.silent = mc.silent,
> >>>>                             mc.cores = mc.cores,
> >>>>                             mc.cleanup = mc.cleanup,
> >>>>                             mc.preschedule = mc.preschedule))
> >>>>     if (length(res) != n)
> >>>>         warning("some results may be missing, folded or caused an error")
> >>>>     res
> >>>> }
> >>>>
> >>>> pvectorize <- function(FUN) {
> >>>>     function(...) pvec3(FUN = FUN, ...)
> >>>> }
> >>>
> >>> would be great to have these as 'pull' requests in github; pvec3 as
> >>> a replacement for pvec, if it's implementing the same concept but
> >>> better.
> >>>
> >>> Unit tests would be good (yes, being a little hypocritical):
> >>> inst/unitTests, using RUnit; examples in
> >>>
> >>> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/IRanges/inst/unitTests
> >>>
> >>> with username / password readonly
> >>>
> >>> Martin
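Stepping out of the quoted thread for a moment: a quick, untested sketch
of how I would expect Ryan's pvec3()/pvectorize() above to be used,
assuming those definitions and the parallel package on a Unix-alike (the
vector size, core count, and chunk numbers are made up):

    library(parallel)  # for mclapply, splitIndices, mc.reset.stream

    x <- runif(1e6)

    ## default: one chunk per core, same behaviour as pvec
    y1 <- pvec3(x, sqrt, mc.cores = 4L)

    ## cap per-worker memory with a fixed chunk size: ceiling(1e6/1e5)
    ## = 10 chunks are fed to the 4 cores as each one finishes, since
    ## mc.preschedule defaults to FALSE
    y2 <- pvec3(x, sqrt, mc.cores = 4L, chunk.size = 1e5)

    ## or request a number of chunks directly
    y3 <- pvec3(x, sqrt, mc.cores = 4L, num.chunks = 16)

    stopifnot(all.equal(y1, sqrt(x)), all.equal(y2, y3))

    ## pvectorize() gives a drop-in parallel version of a function
    sqrt.parallel <- pvectorize(sqrt)
    all.equal(sqrt.parallel(1:1000), sqrt(1:1000))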
> >>>> On Wed 14 Nov 2012 02:23:30 PM PST, Michael Lawrence wrote:
> >>>>> On Wed, Nov 14, 2012 at 12:23 PM, Martin Morgan
> >>>>> <mtmor...@fhcrc.org> wrote:
> >>>>>
> >>>>>> Interested developers -- I added the start of a BiocParallel
> >>>>>> package to the Bioconductor subversion repository and build
> >>>>>> system.
> >>>>>>
> >>>>>> The package is mirrored on github to allow for social coding; I
> >>>>>> encourage people to contribute via that mechanism.
> >>>>>>
> >>>>>> https://github.com/Bioconductor/BiocParallel
> >>>>>>
> >>>>>> The purpose is to help focus our efforts at developing
> >>>>>> appropriate parallel paradigms. Currently the package Imports:
> >>>>>> parallel and implements pvec and mclapply in a way that allows
> >>>>>> for operation on any vector or list supporting length(), [, and
> >>>>>> [[ (the latter for mclapply). pvec in particular seems to be
> >>>>>> appropriate for GRanges-like objects, where we don't necessarily
> >>>>>> want to extract many thousands of S4 instances of individual
> >>>>>> ranges with [[.
> >>>>>
> >>>>> Makes sense. Besides, [[ does not even work on GRanges. One
> >>>>> limitation of pvec is that it does not support a chunk size; it
> >>>>> just uses length(x) / ncores. It would be nice to be able to
> >>>>> restrict that, which would then require multiple jobs per core.
> >>>>> Unless I'm missing something.
> >>>>>
> >>>>>> Hopefully the ideas in the package can be folded back into
> >>>>>> parallel as they mature.
> >>>>>>
> >>>>>> Martin
> >>>>>> --
> >>>>>> Dr. Martin Morgan, PhD
> >>>>>> Fred Hutchinson Cancer Research Center
> >>>>>> 1100 Fairview Ave. N.
> >>>>>> PO Box 19024 Seattle, WA 98109

--
"A model is a lie that helps you see the truth."
  - Howard Skipper
    <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel