I have been booked up, so no chance to deploy, but I do have access to SGE and LSF, so I will try both and report ASAP.
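For concreteness, what I have in mind is just a plain BatchJobs round trip on each queue, switching backends via the cluster functions configured in .BatchJobs.R (makeClusterFunctionsSGE() vs. makeClusterFunctionsLSF(), each with a matching template file). A minimal sketch; the registry id and toy workload are made up:

    library("BatchJobs")

    ## All job state lives in the registry directory on the shared file
    ## system; which queue is used is decided by the cluster functions
    ## configured in .BatchJobs.R, not by this code.
    reg <- makeRegistry(id = "smoketest", file.dir = "smoketest-files")

    ## One job per element: a trivially data-parallel toy workload.
    batchMap(reg, function(i) { Sys.sleep(1); i^2 }, i = 1:10)

    submitJobs(reg)
    waitForJobs(reg)
    reduceResultsVector(reg)   # should come back as (1:10)^2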
On Tue, Dec 4, 2012 at 4:08 AM, Hahne, Florian <florian.ha...@novartis.com> wrote:
> Hi Henrik,
> I have now come up with a relatively generic version of this SGEcluster
> approach. It does indeed use BatchJobs under the hood and should thus
> support all available cluster queues, assuming that the necessary
> BatchJobs routines are available. I could only test this on our SGE
> cluster, but Vince wanted to try other queuing systems. Not sure how
> far he got. For now the code is wrapped in a little package called
> Qcluster with some documentation. If you want, I can send you a version
> in a separate mail. It would be good to test this on other systems, and
> I am sure there remain some bugs that need to be ironed out. In
> particular, the fault tolerance you mentioned needs to be addressed
> properly. Currently the code may leave unwanted garbage if things fail
> in the wrong places, because all the communication is file-based.
> Martin, I'll send you my updated version in case you want to include
> this in BiocParallel for others to contribute.
> Florian
>
> On 12/4/12 5:46 AM, "Henrik Bengtsson" <h...@biostat.ucsf.edu> wrote:
>
> > Picking up this thread for lack of other places (= where should
> > BiocParallel be discussed?)
> >
> > I saw Martin's updates on BiocParallel - great. Florian's SGE
> > scheduler was also mentioned; is that one built on top of BatchJobs?
> > If so, I'd be interested in looking into that / generalizing it to
> > work with any BatchJobs scheduler.
> >
> > I believe there is going to be a new release of BatchJobs rather
> > soon, so it's probably worth waiting until that is available.
> >
> > The main use case I'm interested in is to launch batch jobs on a
> > PBS/Torque cluster, and then use multicore processing on each
> > compute node. It would be nice to be able to do this using the
> > BiocParallel model, but maybe it is too optimistic to get everything
> > to work under the same model. Also, as Vince hinted, fault tolerance
> > etc. needs to be addressed, and addressed differently in the
> > different setups.
> >
> > /Henrik
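As an aside, here is a minimal sketch of what Henrik's hybrid use case might look like with BatchJobs alone, assuming makeClusterFunctionsTorque() is configured in .BatchJobs.R; the registry id, chunking, and toy workload are invented for illustration:

    library("BatchJobs")

    ## One scheduler job per chunk of inputs; within each job,
    ## parallel::mclapply() fans out over the cores of whichever
    ## compute node the scheduler assigned to it.
    reg <- makeRegistry(id = "hybrid", file.dir = "hybrid-files")

    chunks <- split(1:1000, rep(1:10, length.out = 1000))
    batchMap(reg, function(chunk) {
        parallel::mclapply(chunk, function(i) sqrt(i),
                           mc.cores = parallel::detectCores())
    }, chunk = chunks)

    submitJobs(reg)   # a resources list (walltime, cores) can go here,
                      # to be filled into the Torque template
    waitForJobs(reg)
    res <- loadResults(reg)   # list of per-chunk result lists

The open question for BiocParallel is then whether both layers, scheduler and multicore, can be driven through one model, including the fault-tolerance story.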
> > On Tue, Nov 20, 2012 at 6:59 AM, Ramon Diaz-Uriarte <rdia...@gmail.com> wrote:
> >>
> >> On Sat, 17 Nov 2012 13:05:29 -0800, "Ryan C. Thompson" <r...@thompsonclan.org> wrote:
> >>
> >>> On 11/17/2012 02:39 AM, Ramon Diaz-Uriarte wrote:
> >>> > In addition to Steve's comment, is it really a good thing that
> >>> > "all code stays the same"? I mean, multiple machines vs. multiple
> >>> > cores are often _very_ different things: for instance, shared vs.
> >>> > distributed memory, communication overhead differences, whether
> >>> > or not you can assume packages and objects to be automagically
> >>> > present in the slaves/child processes, etc. So, given that they
> >>> > are different situations, I think it sometimes makes sense to
> >>> > write different code for each situation (I often do); not to
> >>> > mention Steve's hybrid cases ;-).
> >>> >
> >>> > Since BiocParallel seems to be a major undertaking, maybe it
> >>> > would be appropriate to provide a flexible approach, instead of
> >>> > hard-wiring the foreach approach.
> >>> Of course there are cases where the same code simply can't work for
> >>> both multicore and multi-machine situations, but those generally
> >>> don't fall into the category of things that can be done using
> >>> lapply. lapply and all of its parallelized buddies, like mclapply,
> >>> parLapply, and foreach, are designed for data-parallel operations
> >>> with no interdependence between results, and these kinds of
> >>> operations generally parallelize as well across machines as across
> >>> cores, unless your network is not fast enough (in which case you
> >>> would choose not to use multi-machine parallelism). If you want a
> >>> parallel algorithm for something like the disjoin method of
> >>> GRanges, you might need to write some special-purpose code, and
> >>> that code might be very different for multicore vs. multi-machine.
> >>>
> >>> So yes, sometimes there is a fundamental reason that you have to
> >>> change the code to make it run on multiple machines, and neither
> >>> foreach nor any other parallelization framework will save you from
> >>> having to rewrite your code. But often there is no fundamental
> >>> reason for the code to change, and you end up changing it anyway
> >>> because of limitations in your parallelization framework. That is
> >>> the case that foreach saves you from.
> >>
> >> Hummm... I guess you are right, and we are talking about "often" or
> >> "most of the time", which is where all this would fit. Point taken.
> >>
> >> Best,
> >>
> >> R.
> >>
> >> --
> >> Ramon Diaz-Uriarte
> >> Department of Biochemistry, Lab B-25
> >> Facultad de Medicina
> >> Universidad Autónoma de Madrid
> >> Arzobispo Morcillo, 4
> >> 28029 Madrid
> >> Spain
> >>
> >> Phone: +34-91-497-2412
> >>
> >> Email: rdia...@gmail.com
> >>        ramon.d...@iib.uam.es
> >>
> >> http://ligarto.org/rdiaz
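To make Ryan's point above concrete: for a trivially data-parallel computation, only the backend setup differs between the lapply-style variants; the per-element code stays the same. A minimal sketch (the core counts and toy function are arbitrary, and the PSOCK cluster here is local, though the same call accepts hostnames of other machines):

    library("parallel")
    library("foreach")
    library("doParallel")

    x <- 1:100
    f <- function(i) i^2   # no interdependence between results

    r1 <- lapply(x, f)                    # serial
    r2 <- mclapply(x, f, mc.cores = 4)    # forked workers, shared memory

    ## A PSOCK cluster works the same whether the workers are local
    ## processes or R sessions on other machines.
    cl <- makePSOCKcluster(4)
    r3 <- parLapply(cl, x, f)

    registerDoParallel(cl)                # swap registerDo*() to change backend
    r4 <- foreach(i = x) %dopar% f(i)
    stopCluster(cl)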