Re: [Bioc-devel] BiocParallel -- update

Ryan C. Thompson Tue, 04 Dec 2012 12:48:31 -0800

One issue I see is that for some kinds of parallel backends, there may be no way for "bpworkers" to return something meaningful. For example, a backend that submits jobs to a large cluster may not know exactly how many nodes are in the cluster, and even if it did, returning the total number of nodes may not be appropriate, since those nodes are shared with other cluster users. This matters mainly for the pvec function, which uses the result of bpworkers to decide how many chunks to split the input into.

I guess one solution is to require that any backend which cannot natively determine the number of available workers take the number of workers as an argument when the param object for that backend is created, e.g.:


param <- IndeterminateSizedClusterParam(workers=50)
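To make that concrete, here is a rough sketch of what such a param class might look like with plain S4 classes. IndeterminateSizedClusterParam is only the placeholder name from above, and the bpworkers generic defined here is a local stand-in for illustration, not BiocParallel's actual definition. The point is simply that the constructor has no default for workers, so the user must supply it.

```r
library(methods)

# Hypothetical param class for a backend (e.g. a shared batch cluster) that
# cannot discover how many workers it may use, so the user has to say so.
setClass("IndeterminateSizedClusterParam",
         slots = c(workers = "integer"),
         validity = function(object) {
           if (length(object@workers) != 1L || is.na(object@workers) ||
               object@workers < 1L)
             "'workers' must be a single positive integer"
           else TRUE
         })

# Constructor: 'workers' has no default, so omitting it is an error.
IndeterminateSizedClusterParam <- function(workers) {
  if (missing(workers))
    stop("this backend cannot detect its size; please supply 'workers'")
  new("IndeterminateSizedClusterParam", workers = as.integer(workers))
}

# Local stand-in for the bpworkers() generic discussed in this thread
# (hypothetical): it just reports the value the user supplied.
setGeneric("bpworkers", function(x, ...) standardGeneric("bpworkers"))
setMethod("bpworkers", "IndeterminateSizedClusterParam",
          function(x, ...) x@workers)

param <- IndeterminateSizedClusterParam(workers = 50)
bpworkers(param)  # [1] 50
```

pvec could then keep calling bpworkers unchanged; only backends that cannot answer the question push it back onto the user.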

Additionally, as discussed previously, it makes sense to be able to explicitly choose a chunk size or number of chunks for pvec, rather than always splitting into exactly as many chunks as there are parallel workers. I implemented this in the non-generic, multicore-only version of pvec, but I still need to port it to the generic version that works with any param. Do people think the chunk options should be included in the MulticoreParam class, or specified when pvec is called?
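For the call-time option, a minimal sketch might look like the following. pvec_sketch, n.chunks and chunk.size are hypothetical names, not the actual pvec interface, and lapply() stands in for whatever the backend would really use to dispatch chunks, so this runs sequentially anywhere. With neither option given it falls back to one chunk per worker, which is the current behaviour.

```r
# Hypothetical sketch: split X into contiguous chunks, apply FUN to each,
# and concatenate the results in the original order.
pvec_sketch <- function(X, FUN, ..., workers = 2L,
                        n.chunks = NULL, chunk.size = NULL) {
  n <- length(X)
  if (!is.null(n.chunks) && !is.null(chunk.size))
    stop("specify at most one of 'n.chunks' and 'chunk.size'")
  # chunk.size acts as an upper bound on the length of each chunk
  if (!is.null(chunk.size))
    n.chunks <- ceiling(n / chunk.size)
  # Default: one chunk per worker
  if (is.null(n.chunks))
    n.chunks <- workers
  n.chunks <- max(1L, min(n.chunks, n))
  # Element i goes to chunk ceiling(i * n.chunks / n): contiguous chunks
  # whose lengths differ by at most one.
  chunk_id <- ceiling(seq_len(n) * n.chunks / n)
  chunks <- split(seq_len(n), chunk_id)
  # A real backend would send each chunk to a worker; lapply() is a
  # sequential placeholder.
  results <- lapply(chunks, function(i) FUN(X[i], ...))
  do.call(c, unname(results))
}

pvec_sketch(1:10, sqrt, n.chunks = 3)
pvec_sketch(1:5, length, n.chunks = 2)  # [1] 2 3
```

Taking the chunk options as arguments lets one param object serve inputs of very different sizes; storing them in MulticoreParam would instead fix them per backend instance.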

I have also written a non-generic, multicore-only version of pvectorize that allows multiple vectorized arguments instead of just one, and furthermore gives the parallelized function a signature identical to that of the original function. Again, this needs to be ported to the generic bpvectorize.
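One way to get both properties, sketched here under my own assumptions rather than as the implementation described above: copy the original function's formals onto the wrapper, recover the supplied arguments with match.call(), and split every argument named in vectorize.args by the same chunk indices. pvectorize_sketch and its arguments are hypothetical, lapply() again stands in for parallel dispatch, and the sketch assumes FUN has no formals named FUN, vectorize.args or n.chunks.

```r
# Hypothetical sketch: wrap FUN so that several arguments are chunked
# together while the wrapper keeps FUN's exact formal arguments.
pvectorize_sketch <- function(FUN, vectorize.args, n.chunks = 2L) {
  force(FUN); force(vectorize.args); force(n.chunks)
  wrapper <- function() {
    caller <- parent.frame()
    # match.call() uses the wrapper's (copied) formals, so supplied
    # arguments come back under their full names; evaluate them in the
    # caller's frame.
    args <- lapply(as.list(match.call())[-1L], eval, envir = caller)
    n <- length(args[[vectorize.args[1L]]])
    if (any(lengths(args[vectorize.args]) != n))
      stop("vectorized arguments must have equal length")
    k <- max(1L, min(n.chunks, n))
    chunks <- split(seq_len(n), ceiling(seq_len(n) * k / n))
    # Subset every vectorized argument by the same indices; other
    # arguments are passed through unchanged, and omitted ones fall back
    # to FUN's own defaults.
    results <- lapply(chunks, function(i) {
      chunk_args <- args
      for (a in vectorize.args) chunk_args[[a]] <- args[[a]][i]
      do.call(FUN, chunk_args)
    })
    do.call(c, unname(results))
  }
  formals(wrapper) <- formals(FUN)
  wrapper
}

f <- function(x, y, scale = 1) (x + y) * scale
pf <- pvectorize_sketch(f, c("x", "y"), n.chunks = 3)
args(pf)                  # same arguments as f
pf(1:4, 5:8, scale = 2)   # [1] 12 16 20 24
```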


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel