And here is the ongoing development of the backend: https://github.com/mllg/BiocParallel/tree/batchjobs
Not sure how well it's been tested. Kudos to Michel Lang for making so much progress so quickly.

Michael

On Thu, Jun 6, 2013 at 1:59 PM, Dan Tenenbaum <dtene...@fhcrc.org> wrote:
> On Thu, Jun 6, 2013 at 1:56 PM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
> > Hi, I'd like to pick up the discussion on a BatchJobs backend for BiocParallel where it was left back in Dec 2012 (Bioc-devel thread 'BiocParallel' [https://stat.ethz.ch/pipermail/bioc-devel/2012-December/003918.html]).
> >
> > Florian, would you mind sharing your BatchJobs backend code? Is it independent of BiocParallel, and/or have you tried it with the most recent BiocParallel implementation [https://github.com/Bioconductor/BiocParallel/]?
> >
> You should be aware that there is a Google Summer of Code project in progress to address this.
>
> http://www.bioconductor.org/developers/gsoc2013/ (towards the bottom)
>
> Dan
>
> > /Henrik
> >
> > On Tue, Dec 4, 2012 at 12:38 PM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
> >> Thanks.
> >>
> >> On Tue, Dec 4, 2012 at 3:47 AM, Vincent Carey <st...@channing.harvard.edu> wrote:
> >>> I have been booked up so no chance to deploy, but I do have access to SGE and LSF, so I will try and will report ASAP.
> >>
> >> ...and I'll try it out on PBS (...but I most likely won't have time to do this until the end of the year).
> >>
> >> Henrik
> >>
> >>> On Tue, Dec 4, 2012 at 4:08 AM, Hahne, Florian <florian.ha...@novartis.com> wrote:
> >>>> Hi Henrik,
> >>>> I have now come up with a relatively generic version of this SGEcluster approach. It does indeed use BatchJobs under the hood and should thus support all available cluster queues, assuming that the necessary BatchJobs routines are available. I could only test this on our SGE cluster, but Vince wanted to try other queuing systems. Not sure how far he got. For now the code is wrapped in a little package called Qcluster with some documentation. If you want, I can send you a version in a separate mail. It would be good to test this on other systems, and I am sure there remain some bugs that need to be ironed out. In particular, the fault tolerance you mentioned needs to be addressed properly. Currently the code may leave unwanted garbage if things fail in the wrong places, because all the communication is file-based.
> >>>> Martin, I'll send you my updated version in case you want to include this in BiocParallel for others to contribute.
> >>>> Florian
> >>>> --
> >>>>
> >>>> On 12/4/12 5:46 AM, "Henrik Bengtsson" <h...@biostat.ucsf.edu> wrote:
> >>>>
> >>>> >Picking up this thread for lack of other places (= where should BiocParallel be discussed?)
> >>>> >
> >>>> >I saw Martin's updates on BiocParallel - great. Florian's SGE scheduler was also mentioned; is that one built on top of BatchJobs? If so, I'd be interested in looking into that/generalizing that to work with any BatchJobs scheduler.
> >>>> >
> >>>> >I believe there is going to be a new release of BatchJobs rather soon, so it's probably worth waiting until that is available.
> >>>> >
> >>>> >The main use case I'm interested in is to launch batch jobs on a PBS/Torque cluster, and then use multicore processing on each compute node.
> >>>> >It would be nice to be able to do this using the BiocParallel model, but maybe it is too optimistic to get everything to work under the same model. Also, as Vince hinted, fault tolerance etc. needs to be addressed, and needs to be addressed differently in the different setups.
> >>>> >
> >>>> >/Henrik
> >>>> >
> >>>> >On Tue, Nov 20, 2012 at 6:59 AM, Ramon Diaz-Uriarte <rdia...@gmail.com> wrote:
> >>>> >>
> >>>> >> On Sat, 17 Nov 2012 13:05:29 -0800, "Ryan C. Thompson" <r...@thompsonclan.org> wrote:
> >>>> >>
> >>>> >>> On 11/17/2012 02:39 AM, Ramon Diaz-Uriarte wrote:
> >>>> >>> > In addition to Steve's comment, is it really a good thing that "all code stays the same"? I mean, multiple machines vs. multiple cores are, often, _very_ different things: for instance, shared vs. distributed memory, communication overhead differences, whether or not you can assume packages and objects to be automagically present in the slaves/child process, etc. So, given they are different situations, I think it sometimes makes sense to want to write different code for each situation (I often do); not to mention Steve's hybrid cases ;-).
> >>>> >>> >
> >>>> >>> > Since BiocParallel seems to be a major undertaking, maybe it would be appropriate to provide a flexible approach, instead of hard-wiring the foreach approach.
> >>>> >>> Of course there are cases where the same code simply can't work for both multicore and multi-machine situations, but those generally don't fall into the category of things that can be done using lapply. lapply and all of its parallelized buddies like mclapply, parLapply, and foreach are designed for data-parallel operations with no interdependence between results, and these kinds of operations generally parallelize as well across machines as across cores, unless your network is not fast enough (in which case you would choose not to use multi-machine parallelism). If you want a parallel algorithm for something like the disjoin method of GRanges, you might need to write some special-purpose code, and that code might be very different for multicore vs. multi-machine.
> >>>> >>
> >>>> >>> So yes, sometimes there is a fundamental reason that you have to change the code to make it run on multiple machines, and neither foreach nor any other parallelization framework will save you from having to rewrite your code. But often there is no fundamental reason that the code has to change, but you end up changing it anyway because of limitations in your parallelization framework. This is the case that foreach saves you from.
> >>>> >>
> >>>> >> Hummm... I guess you are right, and we are talking about "often" or "most of the time", which is where all this would fit. Point taken.
> >>>> >>
> >>>> >> Best,
> >>>> >>
> >>>> >> R.
> >>>> >>
> >>>> >> --
> >>>> >> Ramon Diaz-Uriarte
> >>>> >> Department of Biochemistry, Lab B-25
> >>>> >> Facultad de Medicina
> >>>> >> Universidad Autónoma de Madrid
> >>>> >> Arzobispo Morcillo, 4
> >>>> >> 28029 Madrid
> >>>> >> Spain
> >>>> >>
> >>>> >> Phone: +34-91-497-2412
> >>>> >>
> >>>> >> Email: rdia...@gmail.com
> >>>> >>        ramon.d...@iib.uam.es
> >>>> >>
> >>>> >> http://ligarto.org/rdiaz
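As a concrete illustration of the use case Henrik describes above (batch jobs submitted to a PBS/Torque queue, with multicore processing inside each job), here is a minimal sketch built directly on BatchJobs rather than on the backend linked at the top, which is still under development. The template file name ("torque.tmpl"), the resource names, and the per-node core count are assumptions to be adapted to the local site; BatchJobs configuration normally lives in a .BatchJobs.R file, and setConfig() is used here only for brevity.

```r
## Minimal sketch, assuming a Torque cluster with a site-specific BatchJobs
## job template ("torque.tmpl" is hypothetical).
library(BatchJobs)

## Point BatchJobs at Torque (normally done in .BatchJobs.R).
setConfig(cluster.functions = makeClusterFunctionsTorque("torque.tmpl"))

## One registry keeps all job meta data (file-based, as Florian notes above).
reg <- makeRegistry(id = "biocparallel_demo")

## Work done inside each batch job: multicore over one chunk of the input.
processChunk <- function(chunk, cores = 8) {
  parallel::mclapply(chunk, function(x) sqrt(x), mc.cores = cores)
}

## Split the input into 10 chunks, i.e. 10 batch jobs.
chunks <- split(1:1000, rep(1:10, length.out = 1000))
batchMap(reg, processChunk, chunks)

## Ask the scheduler for one node with 8 cores per job; the resource names
## are template/site-specific and assumed here.
submitJobs(reg, resources = list(nodes = 1, ppn = 8, walltime = 3600))
waitForJobs(reg)
res <- loadResults(reg)
```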
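And a minimal sketch of the "same code, different backend" point Ryan makes above, using BiocParallel's bplapply(): the loop body stays the same and only the BPPARAM object changes. Worker counts are arbitrary, and the BatchJobs-backed parameter class that the branch above is working towards is not shown, since its interface was still in flux at the time of this thread.

```r
## Minimal sketch: identical lapply-style code run on two different backends.
library(BiocParallel)

## Stand-in for a real per-element computation.
heavy <- function(x) sum(sqrt(seq_len(x)))

X <- 1:20

## Multicore (forked processes) on one machine:
res.mc <- bplapply(X, heavy, BPPARAM = MulticoreParam(workers = 4))

## The same call against a SNOW-style socket cluster; the workers could just
## as well live on remote hosts:
res.snow <- bplapply(X, heavy, BPPARAM = SnowParam(workers = 4))

identical(res.mc, res.snow)  # TRUE
```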
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel