On Thu, Jun 6, 2013 at 1:56 PM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
> Hi, I'd like to pick up the discussion on a BatchJobs backend for
> BiocParallel where it was left back in Dec 2012 (Bioc-devel thread
> 'BiocParallel'
> [https://stat.ethz.ch/pipermail/bioc-devel/2012-December/003918.html]).
>
> Florian, would you mind sharing your BatchJobs backend code? Is it
> independent of BiocParallel and/or have you tried it with the most
> recent BiocParallel implementation
> [https://github.com/Bioconductor/BiocParallel/]?
>
You should be aware that there is a Google Summer of Code project in
progress to address this.

http://www.bioconductor.org/developers/gsoc2013/ (towards the bottom)

Dan

> /Henrik
>
> On Tue, Dec 4, 2012 at 12:38 PM, Henrik Bengtsson <h...@biostat.ucsf.edu>
> wrote:
>> Thanks.
>>
>> On Tue, Dec 4, 2012 at 3:47 AM, Vincent Carey
>> <st...@channing.harvard.edu> wrote:
>>> I have been booked up so no chance to deploy but I do have access to SGE and
>>> LSF so will try and will report ASAP.
>>
>> ...and I'll try it out on PBS (... but I most likely won't have time
>> to do this until the end of the year).
>>
>> Henrik
>>
>>>
>>> On Tue, Dec 4, 2012 at 4:08 AM, Hahne, Florian <florian.ha...@novartis.com>
>>> wrote:
>>>>
>>>> Hi Henrik,
>>>> I have now come up with a relatively generic version of this
>>>> SGEcluster approach. It does indeed use BatchJobs under the hood and
>>>> should thus support all available cluster queues, assuming that the
>>>> necessary BatchJobs routines are available. I could only test this on our
>>>> SGE cluster, but Vince wanted to try other queuing systems. Not sure how
>>>> far he got. For now the code is wrapped in a little package called
>>>> Qcluster with some documentation. If you want, I can send you a version
>>>> in a separate mail. Would be good to test this on other systems, and I am
>>>> sure there remain some bugs that need to be ironed out. In particular the
>>>> fault tolerance you mentioned needs to be addressed properly. Currently
>>>> the code may leave unwanted garbage if things fail in the wrong places
>>>> because all the communication is file-based.
>>>> Martin, I'll send you my updated version in case you want to include this
>>>> in BiocParallel for others to contribute.
>>>> Florian
>>>> --
>>>>
>>>> On 12/4/12 5:46 AM, "Henrik Bengtsson" <h...@biostat.ucsf.edu> wrote:
>>>>
>>>> >Picking up this thread in lack of other places (= where should
>>>> >BiocParallel be discussed?)
>>>> >
>>>> >I saw Martin's updates on the BiocParallel - great. Florian's SGE
>>>> >scheduler was also mentioned; is that one built on top of BatchJobs?
>>>> >If so I'd be interested in looking into that/generalizing that to work
>>>> >with any BatchJobs scheduler.
>>>> >
>>>> >I believe there is going to be a new release of BatchJobs rather soon,
>>>> >so it's probably worth waiting until that is available.
>>>> >
>>>> >The main use case I'm interested in is to launch batch jobs on a
>>>> >PBS/Torque cluster, and then use multicore processing on each compute
>>>> >node. It would be nice to be able to do this using the BiocParallel
>>>> >model, but maybe it is too optimistic to get everything to work under
>>>> >same model. Also, as Vince hinted, fault tolerance etc needs to be
>>>> >addressed and needs to be addressed differently in the different
>>>> >setups.
>>>> >
>>>> >/Henrik
>>>> >
>>>> >On Tue, Nov 20, 2012 at 6:59 AM, Ramon Diaz-Uriarte <rdia...@gmail.com>
>>>> >wrote:
>>>> >>
>>>> >> On Sat, 17 Nov 2012 13:05:29 -0800, "Ryan C. Thompson"
>>>> >> <r...@thompsonclan.org> wrote:
>>>> >>
>>>> >>> On 11/17/2012 02:39 AM, Ramon Diaz-Uriarte wrote:
>>>> >>> > In addition to Steve's comment, is it really a good thing that
>>>> >>> > "all code stays the same."? I mean, multiple machines vs. multiple
>>>> >>> > cores are, often, _very_ different things: for instance, shared
>>>> >>> > vs. distributed memory, communication overhead differences,
>>>> >>> > whether or not you can assume packages and objects to be
>>>> >>> > automagically present in the slaves/child process, etc. So, given
>>>> >>> > they are different situations, I think it sometimes makes sense to
>>>> >>> > want to write different code for each situation (I often do); not
>>>> >>> > to mention Steve's hybrid cases ;-).
>>>> >>> >
>>>> >>> > Since BiocParallel seems to be a major undertaking, maybe it would
>>>> >>> > be appropriate to provide a flexible approach, instead of hard
>>>> >>> > wiring the foreach approach.
>>>> >>> Of course there are cases where the same code simply can't work for
>>>> >>> both multicore and multi-machine situations, but those generally
>>>> >>> don't fall into the category of things that can be done using
>>>> >>> lapply. Lapply and all of its parallelized buddies like mclapply,
>>>> >>> parLapply, and foreach are designed for data-parallel operations
>>>> >>> with no interdependence between results, and these kinds of
>>>> >>> operations generally parallelize as well across machines as across
>>>> >>> cores, unless your network is not fast enough (in which case you
>>>> >>> would choose not to use multi-machine parallelism). If you want a
>>>> >>> parallel algorithm for something like the disjoin method of GRanges,
>>>> >>> you might need to write some special purpose code, and that code
>>>> >>> might be very different for multicore vs multi-machine.
>>>> >>
>>>> >>> So yes, sometimes there is a fundamental reason that you have to
>>>> >>> change the code to make it run on multiple machines, and neither
>>>> >>> foreach nor any other parallelization framework will save you from
>>>> >>> having to rewrite your code. But often there is no fundamental
>>>> >>> reason that the code has to change, but you end up changing it
>>>> >>> anyway because of limitations in your parallelization framework.
>>>> >>> This is the case that foreach saves you from.
>>>> >>
>>>> >> Hummm... I guess you are right, and we are talking about "often" or
>>>> >> "most of the time", which is where all this would fit. Point taken.
>>>> >>
>>>> >> Best,
>>>> >>
>>>> >> R.
>>>> >>
>>>> >> --
>>>> >> Ramon Diaz-Uriarte
>>>> >> Department of Biochemistry, Lab B-25
>>>> >> Facultad de Medicina
>>>> >> Universidad Autónoma de Madrid
>>>> >> Arzobispo Morcillo, 4
>>>> >> 28029 Madrid
>>>> >> Spain
>>>> >>
>>>> >> Phone: +34-91-497-2412
>>>> >>
>>>> >> Email: rdia...@gmail.com
>>>> >>        ramon.d...@iib.uam.es
>>>> >>
>>>> >> http://ligarto.org/rdiaz
>>>> >>
>>>> >> _______________________________________________
>>>> >> Bioc-devel@r-project.org mailing list
>>>> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
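
For readers skimming the thread: Ryan's point about lapply-style operations (independent per-element work, so the calling code need not change when the backend does) can be illustrated with a small sketch. This is only an analogy written in Python, not BiocParallel or BatchJobs code. The helper name `backend_lapply` and the example function `square` are hypothetical and do not come from the thread.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def backend_lapply(fn: Callable[[T], R], items: Iterable[T],
                   executor: Optional[Executor] = None) -> List[R]:
    """Apply fn to each item independently; results keep input order.

    Hypothetical helper: with executor=None the work runs serially,
    otherwise it is handed to whatever backend the caller supplies.
    """
    if executor is None:
        return [fn(x) for x in items]
    # Executor.map returns results in the order of the inputs.
    return list(executor.map(fn, items))


def square(x: int) -> int:
    # No shared state: each call is independent of the others,
    # which is what makes the operation data-parallel.
    return x * x


# Same call, two backends; only the executor argument differs.
serial_result = backend_lapply(square, range(5))
with ThreadPoolExecutor(max_workers=2) as pool:
    pooled_result = backend_lapply(square, range(5), executor=pool)
```

The design choice mirrors the argument in the thread: the caller writes the map once and picks the backend separately. Ramon's caveats still apply when moving from a thread pool to separate processes or machines, because the function and its data must then be shipped to the workers.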