Hi Henrik,

Sorry for the late response. Suggestions and feedback are always
welcome. I just forgot to enable the issue tracker (now enabled).
For prototyping I usually use Interactive/Multicore, but I regularly
test on our local clusters, which use Torque and Slurm, respectively.

Michel

2013/6/7 Henrik Bengtsson <h...@biostat.ucsf.edu>:
> Great - this looks promising already.
>
> What are your test systems, beyond standard SSH and multicore
> clusters? I'm on a Torque/PBS system.
>
> I'm happy to test, give feedback, etc. I don't see an 'Issues' tab on
> the GitHub page. Michel, how do you prefer to get feedback?
>
> /Henrik
>
> On Thu, Jun 6, 2013 at 5:21 PM, Michael Lawrence
> <lawrence.mich...@gene.com> wrote:
>> And here is the ongoing development of the backend:
>> https://github.com/mllg/BiocParallel/tree/batchjobs
>>
>> Not sure how well it's been tested.
>>
>> Kudos to Michel Lang for making so much progress so quickly.
>>
>> Michael
>>
>> On Thu, Jun 6, 2013 at 1:59 PM, Dan Tenenbaum <dtene...@fhcrc.org> wrote:
>>>
>>> On Thu, Jun 6, 2013 at 1:56 PM, Henrik Bengtsson <h...@biostat.ucsf.edu>
>>> wrote:
>>> > Hi, I'd like to pick up the discussion on a BatchJobs backend for
>>> > BiocParallel where it was left back in Dec 2012 (Bioc-devel thread
>>> > 'BiocParallel'
>>> > [https://stat.ethz.ch/pipermail/bioc-devel/2012-December/003918.html]).
>>> >
>>> > Florian, would you mind sharing your BatchJobs backend code? Is it
>>> > independent of BiocParallel, and have you tried it with the most
>>> > recent BiocParallel implementation
>>> > [https://github.com/Bioconductor/BiocParallel/]?
>>>
>>> You should be aware that there is a Google Summer of Code project in
>>> progress to address this:
>>>
>>> http://www.bioconductor.org/developers/gsoc2013/ (towards the bottom)
>>>
>>> Dan
>>>
>>> > /Henrik
>>> >
>>> > On Tue, Dec 4, 2012 at 12:38 PM, Henrik Bengtsson <h...@biostat.ucsf.edu>
>>> > wrote:
>>> >> Thanks.
>>> >>
>>> >> On Tue, Dec 4, 2012 at 3:47 AM, Vincent Carey
>>> >> <st...@channing.harvard.edu> wrote:
>>> >>> I have been booked up, so no chance to deploy yet, but I do have
>>> >>> access to SGE and LSF, so I will try it and report back ASAP.
>>> >>
>>> >> ...and I'll try it out on PBS (though I most likely won't have time
>>> >> to do this until the end of the year).
>>> >>
>>> >> Henrik
>>> >>
>>> >>> On Tue, Dec 4, 2012 at 4:08 AM, Hahne, Florian
>>> >>> <florian.ha...@novartis.com> wrote:
>>> >>>>
>>> >>>> Hi Henrik,
>>> >>>> I have now come up with a relatively generic version of this
>>> >>>> SGEcluster approach. It does indeed use BatchJobs under the hood
>>> >>>> and should thus support all available cluster queues, assuming
>>> >>>> that the necessary BatchJobs routines are available. I could only
>>> >>>> test this on our SGE cluster, but Vince wanted to try other
>>> >>>> queuing systems; not sure how far he got. For now the code is
>>> >>>> wrapped in a little package called Qcluster with some
>>> >>>> documentation. If you want, I can send you a version in a
>>> >>>> separate mail. It would be good to test this on other systems,
>>> >>>> and I am sure there remain some bugs that need to be ironed out.
>>> >>>> In particular, the fault tolerance you mentioned needs to be
>>> >>>> addressed properly. Currently the code may leave unwanted garbage
>>> >>>> if things fail in the wrong places, because all the communication
>>> >>>> is file-based.
>>> >>>> Martin, I'll send you my updated version in case you want to
>>> >>>> include this in BiocParallel for others to contribute.
>>> >>>> Florian
>>> >>>> --
>>> >>>>
>>> >>>> On 12/4/12 5:46 AM, "Henrik Bengtsson" <h...@biostat.ucsf.edu> wrote:
>>> >>>>
>>> >>>> >Picking up this thread for lack of other places (= where should
>>> >>>> >BiocParallel be discussed?)
>>> >>>> >
>>> >>>> >I saw Martin's updates on BiocParallel - great. Florian's SGE
>>> >>>> >scheduler was also mentioned; is that one built on top of
>>> >>>> >BatchJobs? If so, I'd be interested in looking into that /
>>> >>>> >generalizing it to work with any BatchJobs scheduler.
>>> >>>> >
>>> >>>> >I believe there is going to be a new release of BatchJobs rather
>>> >>>> >soon, so it's probably worth waiting until that is available.
>>> >>>> >
>>> >>>> >The main use case I'm interested in is to launch batch jobs on a
>>> >>>> >PBS/Torque cluster, and then use multicore processing on each
>>> >>>> >compute node. It would be nice to be able to do this using the
>>> >>>> >BiocParallel model, but maybe it is too optimistic to get
>>> >>>> >everything to work under the same model. Also, as Vince hinted,
>>> >>>> >fault tolerance needs to be addressed, and addressed differently
>>> >>>> >in the different setups.
>>> >>>> >
>>> >>>> >/Henrik
>>> >>>> >
>>> >>>> >On Tue, Nov 20, 2012 at 6:59 AM, Ramon Diaz-Uriarte
>>> >>>> ><rdia...@gmail.com> wrote:
>>> >>>> >>
>>> >>>> >> On Sat, 17 Nov 2012 13:05:29 -0800, "Ryan C. Thompson"
>>> >>>> >> <r...@thompsonclan.org> wrote:
>>> >>>> >>
>>> >>>> >>> On 11/17/2012 02:39 AM, Ramon Diaz-Uriarte wrote:
>>> >>>> >>> > In addition to Steve's comment, is it really a good thing
>>> >>>> >>> > that "all code stays the same"? I mean, multiple machines
>>> >>>> >>> > vs. multiple cores are, often, _very_ different things: for
>>> >>>> >>> > instance, shared vs. distributed memory, communication
>>> >>>> >>> > overhead differences, whether or not you can assume
>>> >>>> >>> > packages and objects to be automagically present in the
>>> >>>> >>> > slaves/child process, etc.
>>> >>>> >>> > So, given they are different situations, I think it
>>> >>>> >>> > sometimes makes sense to want to write different code for
>>> >>>> >>> > each situation (I often do); not to mention Steve's hybrid
>>> >>>> >>> > cases ;-).
>>> >>>> >>> >
>>> >>>> >>> > Since BiocParallel seems to be a major undertaking, maybe
>>> >>>> >>> > it would be appropriate to provide a flexible approach,
>>> >>>> >>> > instead of hard-wiring the foreach approach.
>>> >>>> >>> Of course there are cases where the same code simply can't
>>> >>>> >>> work for both multicore and multi-machine situations, but
>>> >>>> >>> those generally don't fall into the category of things that
>>> >>>> >>> can be done using lapply. lapply and all of its parallelized
>>> >>>> >>> buddies, like mclapply, parLapply, and foreach, are designed
>>> >>>> >>> for data-parallel operations with no interdependence between
>>> >>>> >>> results, and these kinds of operations generally parallelize
>>> >>>> >>> as well across machines as across cores, unless your network
>>> >>>> >>> is not fast enough (in which case you would choose not to use
>>> >>>> >>> multi-machine parallelism). If you want a parallel algorithm
>>> >>>> >>> for something like the disjoin method of GRanges, you might
>>> >>>> >>> need to write some special-purpose code, and that code might
>>> >>>> >>> be very different for multicore vs. multi-machine.
>>> >>>> >>
>>> >>>> >>> So yes, sometimes there is a fundamental reason that you have
>>> >>>> >>> to change the code to make it run on multiple machines, and
>>> >>>> >>> neither foreach nor any other parallelization framework will
>>> >>>> >>> save you from having to rewrite your code.
>>> >>>> >>> But often there is no fundamental reason that the code has
>>> >>>> >>> to change, and you end up changing it anyway because of
>>> >>>> >>> limitations in your parallelization framework. This is the
>>> >>>> >>> case that foreach saves you from.
>>> >>>> >>
>>> >>>> >> Hmmm... I guess you are right, and we are talking about
>>> >>>> >> "often" or "most of the time", which is where all this would
>>> >>>> >> fit. Point taken.
>>> >>>> >>
>>> >>>> >> Best,
>>> >>>> >>
>>> >>>> >> R.
>>> >>>> >>
>>> >>>> >> --
>>> >>>> >> Ramon Diaz-Uriarte
>>> >>>> >> Department of Biochemistry, Lab B-25
>>> >>>> >> Facultad de Medicina
>>> >>>> >> Universidad Autónoma de Madrid
>>> >>>> >> Arzobispo Morcillo, 4
>>> >>>> >> 28029 Madrid
>>> >>>> >> Spain
>>> >>>> >>
>>> >>>> >> Phone: +34-91-497-2412
>>> >>>> >>
>>> >>>> >> Email: rdia...@gmail.com
>>> >>>> >> ramon.d...@iib.uam.es
>>> >>>> >>
>>> >>>> >> http://ligarto.org/rdiaz

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
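[Editor's note] For readers arriving at this archived thread later: the backend under discussion is the one developed in the GitHub branch Michael links above. A minimal sketch of how such a BatchJobs-backed parameter might be used, assuming a `BatchJobsParam()`-style constructor (the argument names here are illustrative and may not match the released API; consult the current BiocParallel documentation):

```r
## Sketch only: assumes a BatchJobsParam()-style constructor as in the
## linked development branch; not guaranteed to match the final API.
library(BiocParallel)

## BatchJobs selects the scheduler (Interactive, Multicore, Torque,
## Slurm, SGE, ...) through its own configuration/template files, so
## the R code below stays the same across backends.
param <- BatchJobsParam(workers = 4)

## The same bplapply() call then runs on whichever backend is
## configured:
res <- bplapply(1:10, function(i) sqrt(i), BPPARAM = param)
```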
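[Editor's note] Henrik's use case above (batch jobs on a PBS/Torque cluster, multicore within each compute node) can be approximated today with plain BatchJobs plus `parallel::mclapply()`. The function names below are real BatchJobs/parallel calls, but the wiring is a sketch, not a tested recipe:

```r
## Nested parallelism sketch: one cluster job per chunk, multicore
## fan-out within each job. Assumes a configured PBS/Torque BatchJobs
## template; chunk sizes are illustrative.
library(BatchJobs)   # submits one job per chunk to the scheduler
library(parallel)    # mclapply() for multicore within a node

reg <- makeRegistry(id = "nested")

## 10 cluster jobs; each uses 8 cores on its compute node:
batchMap(reg, function(chunk) {
  parallel::mclapply(chunk, sqrt, mc.cores = 8)
}, chunk = split(1:80, rep(1:10, each = 8)))

submitJobs(reg)
```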
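[Editor's note] Ryan's argument — that for data-parallel lapply-style work the worker code need not change between multicore and multi-machine setups — can be made concrete with foreach. The code below uses the real foreach/doParallel API; only the registered backend differs between the two runs:

```r
## Same foreach code, two backends: multicore on one machine vs. a
## socket cluster that could just as well span multiple machines.
library(foreach)
library(doParallel)

square_all <- function(xs) {
  foreach(x = xs, .combine = c) %dopar% x^2
}

registerDoParallel(cores = 2)         # fork-based, single machine
res_mc <- square_all(1:4)

cl <- makePSOCKcluster(2)             # socket cluster; hosts could be remote
registerDoParallel(cl)
res_cl <- square_all(1:4)
stopCluster(cl)

identical(res_mc, res_cl)             # same results from identical code
```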