The difference is that in the parallel package you use mclapply for
multicore parallelism and parLapply for multi-machine parallelism, so
switching from one to the other means changing every call from one
function to the other. If you use llply(..., .parallel=TRUE), then all
you have to do is register a different backend (one line of code to
load the new backend and a second to register it), and the rest of
your code stays the same.
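For example, something like this (an untested sketch; doMC and
doParallel are just one possible pair of backends, and the host names
are made up):

    library(plyr)

    ## Multicore on one machine (doMC backend):
    # library(doMC); registerDoMC(cores = 4)

    ## Multiple machines via a snow-style cluster (doParallel backend):
    # library(doParallel)
    # cl <- parallel::makeCluster(c("node1", "node2"))  # hypothetical hosts
    # registerDoParallel(cl)

    ## This call stays the same no matter which backend is registered
    res <- llply(1:10, function(i) i^2, .parallel = TRUE)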
On Fri 16 Nov 2012 03:24:56 PM PST, Michael Lawrence wrote:
On Fri, Nov 16, 2012 at 11:44 AM, Ryan C. Thompson
<r...@thompsonclan.org> wrote:
You don't have to use foreach directly. I use foreach almost
exclusively through the plyr package, which uses foreach
internally to implement parallelism. Like you, I'm not
particularly fond of the foreach syntax (though it has some nice
features that come in handy sometimes).
The appeal of foreach is that it supports pluggable parallelizing
backends, so you can (in theory) write the same code and
parallelize it across multiple cores, or across an entire cluster,
just by plugging in different backends.
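For instance, the loop below is identical whether doParallel is
pointed at local cores or at a list of remote hosts (untested sketch):

    library(foreach)
    library(doParallel)

    cl <- parallel::makeCluster(2)  # or a vector of remote host names
    registerDoParallel(cl)

    res <- foreach(i = 1:10, .combine = c) %dopar% sqrt(i)

    parallel::stopCluster(cl)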
But isn't this also possible with the parallel package? It was
inherited from snow. I'd be more in favor of extending the parallel
package, simply because it's part of base R.
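E.g., roughly (untested sketch; the host names are made up):

    library(parallel)

    ## snow-style cluster, works across machines
    cl <- makePSOCKcluster(c("node1", "node2"))
    res <- parLapply(cl, 1:10, sqrt)
    stopCluster(cl)

    ## multicore equivalent on a single machine, but a different function
    # res <- mclapply(1:10, sqrt, mc.cores = 2)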
On Fri 16 Nov 2012 10:17:24 AM PST, Michael Lawrence wrote:
I'm not sure I understand the appeal of foreach. Why not do this
within the functional paradigm, i.e., parLapply?
Michael
On Fri, Nov 16, 2012 at 9:41 AM, Ryan C. Thompson
<r...@thompsonclan.org> wrote:
You could write a %dopar% backend for the foreach package, which
would allow any code using foreach (or plyr, which uses foreach) to
parallelize using your code.

On a related note, it might be nice to add Bioconductor-compatible
versions of foreach and the plyr functions to BiocParallel if they're
not already compatible.
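A rough skeleton of what such a backend looks like (untested sketch;
the names are made up, and this toy version just evaluates every task
in the calling process, where a real backend would submit each task to
the cluster):

    library(foreach)
    library(iterators)

    doToy <- function(obj, expr, envir, data) {
      it <- iter(obj)        # iterate over the loop's argument sets
      acc <- makeAccum(it)   # accumulator that honours .combine etc.
      i <- 1L
      repeat {
        args <- try(nextElem(it), silent = TRUE)
        if (inherits(args, "try-error")) break
        ## evaluate the loop body with the iteration variables in scope;
        ## a real backend would ship expr and args to a worker instead
        res <- eval(expr, envir = list2env(args, parent = envir))
        acc(list(res), i)
        i <- i + 1L
      }
      getResult(it)
    }

    ## register it so %dopar% (and hence plyr's .parallel=TRUE) uses it
    setDoPar(doToy, data = NULL,
             info = function(data, item)
               switch(item, workers = 1L, name = "doToy",
                      version = "0.1", NULL))

    foreach(x = 1:3, .combine = c) %dopar% x^2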
On 11/16/2012 12:18 AM, Hahne, Florian wrote:
I've hacked up some code that uses BatchJobs but makes it look like a
normal parLapply operation. Currently the main R process checks the
state of the queue at regular intervals and fetches results once a job
has finished. Seems to work quite nicely, although there certainly are
more elaborate ways to deal with the synchronous/asynchronous issue.
Is that something that could be interesting for the broader audience?
I could add the code to BiocParallel for folks to try it out.

The whole thing may be a dumb idea, but I find it kind of useful to be
able to start parallel jobs directly from R on our huge SGE cluster,
have the calling script wait for all jobs to finish and then continue
with some downstream computations, rather than having to manually
check the job status and start another script once the results are
there.

Florian
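Roughly the shape of what Florian describes, as an untested sketch
using the public BatchJobs API (not his actual code, which isn't shown
here; which scheduler is used, e.g. SGE via makeClusterFunctionsSGE(),
comes from the BatchJobs configuration):

    library(BatchJobs)

    bjLapply <- function(X, FUN, ...) {
      reg <- makeRegistry(id = "bjLapply", file.dir = tempfile("bj-"))
      batchMap(reg, FUN, X, more.args = list(...))  # one job per element
      submitJobs(reg)       # hand the jobs to the configured scheduler
      waitForJobs(reg)      # poll the queue until every job has finished
      loadResults(reg, simplify = FALSE)  # fetch results in job order
    }

    ## e.g. res <- bjLapply(1:100, sqrt)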
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel