One issue I see is that for some kinds of parallel backends, bpworkers may have no way to return something meaningful. For example, a backend that submits jobs to a large cluster may not know exactly how many nodes the cluster has, and even if it does, returning the total number of nodes may not be appropriate, since those nodes are shared with other cluster users. This matters mainly for the pvec function, which uses the result of bpworkers to decide how many chunks to split the input into.

I guess one solution is to require the number of workers as an argument when creating the param object for any backend that cannot natively determine how many workers are available, e.g.:

param <- IndeterminateSizedClusterParam(workers=50)
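
To make the idea concrete, here is a minimal sketch of such a constructor in plain R. IndeterminateSizedClusterParam is only the placeholder name from the example above, and n_workers is a hypothetical stand-in for whatever bpworkers method the class would get; neither is existing API.

```r
# Hypothetical constructor: 'workers' has no default, so the caller must
# supply it, because this kind of backend cannot detect it on its own.
IndeterminateSizedClusterParam <- function(workers) {
  if (missing(workers))
    stop("'workers' must be supplied: this backend cannot determine it")
  stopifnot(is.numeric(workers), length(workers) == 1L,
            !is.na(workers), workers >= 1)
  structure(list(workers = as.integer(workers)),
            class = "IndeterminateSizedClusterParam")
}

# Hypothetical stand-in for a bpworkers() method on this class: it simply
# reports the number the user declared.
n_workers <- function(param) param$workers

param <- IndeterminateSizedClusterParam(workers = 50)
n_workers(param)
```

With this, pvec could keep asking the param for a worker count without caring whether the number was detected or declared by the user.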

Additionally, as discussed previously, it makes sense to be able to explicitly choose a chunk size or number of chunks for pvec, rather than splitting into exactly as many chunks as there are parallel workers. I implemented this in the non-generic multicore-only version of pvec, but I still need to port it to the generic version that works for any param. Do people think that the chunk options should be included in the MulticoreParam class, or specified when pvec is called?
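
For reference, the chunking choice I have in mind looks roughly like the sketch below. It is base R only and is not the actual pvec code; split_into_chunks and its arguments n_chunks and chunk_size are hypothetical names, and lapply stands in for the parallel apply step.

```r
# Hypothetical helper: split 'x' either into a given number of chunks or
# into chunks of a given size. Exactly one of the two must be supplied.
split_into_chunks <- function(x, n_chunks = NULL, chunk_size = NULL) {
  if (is.null(n_chunks) == is.null(chunk_size))
    stop("specify exactly one of 'n_chunks' or 'chunk_size'")
  n <- length(x)
  if (n == 0L) return(list())
  if (!is.null(chunk_size)) {
    # Fixed chunk size: the last chunk may be shorter.
    group <- ceiling(seq_len(n) / chunk_size)
  } else {
    # Fixed number of chunks (never more chunks than elements),
    # with sizes as even as possible.
    k <- min(n_chunks, n)
    group <- ceiling(seq_len(n) * k / n)
  }
  unname(split(x, group))
}

# pvec-like use: apply a vectorized function to each chunk and concatenate.
# lapply() is a serial stand-in for the backend's parallel apply.
chunks <- split_into_chunks(1:10, n_chunks = 3)
result <- unlist(lapply(chunks, sqrt))
```

Whether n_chunks and chunk_size live in the param object or in the pvec call, the splitting step itself would be the same.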

I have also written a non-generic multicore-only version of pvectorize that allows for multiple vectorized arguments instead of just one, and furthermore gives the parallelized function an identical signature to the original function. Again, this needs to be ported to the generic bpvectorize.
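
To illustrate what I mean by multiple vectorized arguments and an identical signature, here is a simplified serial sketch. It is not my actual pvectorize code: vectorize_chunked and n_chunks are hypothetical names, lapply stands in for the parallel step, and it assumes the vectorized arguments all have the same length and that the result can be combined with unlist.

```r
# Hypothetical sketch: wrap FUN so that the arguments named in
# 'vectorize.args' are split into chunks, FUN is called once per chunk,
# and the results are concatenated. The wrapper gets FUN's formals.
vectorize_chunked <- function(FUN, vectorize.args, n_chunks = 2L) {
  stopifnot(all(vectorize.args %in% names(formals(FUN))))
  wrapper <- function() {
    caller <- parent.frame()
    mc <- match.call()
    # Evaluate only the arguments the caller actually supplied; any
    # defaults are left for FUN itself to fill in.
    args <- lapply(as.list(mc)[-1L], eval, envir = caller)
    vec <- intersect(vectorize.args, names(args))
    if (length(vec) == 0L) return(do.call(FUN, args))
    n <- length(args[[vec[1L]]])
    # Assumption: all vectorized arguments have the same length.
    stopifnot(all(lengths(args[vec]) == n))
    if (n == 0L) return(do.call(FUN, args))
    k <- min(n_chunks, n)
    index_chunks <- split(seq_len(n), ceiling(seq_len(n) * k / n))
    results <- lapply(index_chunks, function(i) {
      chunk_args <- args
      chunk_args[vec] <- lapply(args[vec], `[`, i)
      do.call(FUN, chunk_args)
    })
    unlist(results, use.names = FALSE)
  }
  # Give the wrapper the same signature as the original function.
  formals(wrapper) <- formals(FUN)
  wrapper
}

f <- function(x, y, scale = 1) (x + y) * scale
pf <- vectorize_chunked(f, c("x", "y"), n_chunks = 3L)
pf(1:10, 10:1, scale = 2)
```

Because the wrapper copies the formals of the original function, argument matching by name or position works the same way for pf as for f.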

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel