Thanks.

On Tue, Dec 4, 2012 at 3:47 AM, Vincent Carey
<st...@channing.harvard.edu> wrote:
> I have been booked up so no chance to deploy, but I do have access to SGE and
> LSF so will try and will report ASAP.
...and I'll try it out on PBS (... but I most likely won't have time to
do this until the end of the year).

Henrik

>
> On Tue, Dec 4, 2012 at 4:08 AM, Hahne, Florian <florian.ha...@novartis.com>
> wrote:
>>
>> Hi Henrik,
>> I have now come up with a relatively generic version of this
>> SGEcluster approach. It does indeed use BatchJobs under the hood and
>> should thus support all available cluster queues, assuming that the
>> necessary BatchJobs routines are available. I could only test this on our
>> SGE cluster, but Vince wanted to try other queuing systems. Not sure how
>> far he got. For now the code is wrapped in a little package called
>> Qcluster with some documentation. If you want, I can send you a version
>> in a separate mail. It would be good to test this on other systems, and
>> I am sure there remain some bugs that need to be ironed out. In
>> particular, the fault tolerance you mentioned needs to be addressed
>> properly. Currently the code may leave unwanted garbage if things fail
>> in the wrong places, because all the communication is file-based.
>> Martin, I'll send you my updated version in case you want to include this
>> in BiocParallel for others to contribute.
>> Florian
>> --
>>
>> On 12/4/12 5:46 AM, "Henrik Bengtsson" <h...@biostat.ucsf.edu> wrote:
>>
>> >Picking up this thread for lack of other places (= where should
>> >BiocParallel be discussed?)
>> >
>> >I saw Martin's updates on BiocParallel - great. Florian's SGE
>> >scheduler was also mentioned; is that one built on top of BatchJobs?
>> >If so, I'd be interested in looking into that/generalizing that to work
>> >with any BatchJobs scheduler.
>> >
>> >I believe there is going to be a new release of BatchJobs rather soon,
>> >so it's probably worth waiting until that is available.
>> >
>> >The main use case I'm interested in is to launch batch jobs on a
>> >PBS/Torque cluster, and then use multicore processing on each compute
>> >node. It would be nice to be able to do this using the BiocParallel
>> >model, but maybe it is too optimistic to get everything to work under
>> >the same model. Also, as Vince hinted, fault tolerance etc. needs to be
>> >addressed, and addressed differently in the different setups.
>> >
>> >/Henrik
>> >
>> >On Tue, Nov 20, 2012 at 6:59 AM, Ramon Diaz-Uriarte <rdia...@gmail.com>
>> >wrote:
>> >>
>> >> On Sat, 17 Nov 2012 13:05:29 -0800, "Ryan C. Thompson"
>> >> <r...@thompsonclan.org> wrote:
>> >>
>> >>> On 11/17/2012 02:39 AM, Ramon Diaz-Uriarte wrote:
>> >>> > In addition to Steve's comment, is it really a good thing that "all
>> >>> > code stays the same"? I mean, multiple machines vs. multiple cores
>> >>> > are, often, _very_ different things: for instance, shared vs.
>> >>> > distributed memory, communication overhead differences, whether or
>> >>> > not you can assume packages and objects to be automagically present
>> >>> > in the slaves/child processes, etc. So, given that they are different
>> >>> > situations, I think it sometimes makes sense to want to write
>> >>> > different code for each situation (I often do); not to mention
>> >>> > Steve's hybrid cases ;-).
>> >>> >
>> >>> > Since BiocParallel seems to be a major undertaking, maybe it would be
>> >>> > appropriate to provide a flexible approach, instead of hard-wiring
>> >>> > the foreach approach.
>> >>> Of course there are cases where the same code simply can't work for
>> >>> both multicore and multi-machine situations, but those generally don't
>> >>> fall into the category of things that can be done using lapply. lapply
>> >>> and all of its parallelized buddies like mclapply, parLapply, and
>> >>> foreach are designed for data-parallel operations with no
>> >>> interdependence between results, and these kinds of operations
>> >>> generally parallelize as well across machines as across cores, unless
>> >>> your network is not fast enough (in which case you would choose not to
>> >>> use multi-machine parallelism). If you want a parallel algorithm for
>> >>> something like the disjoin method of GRanges, you might need to write
>> >>> some special-purpose code, and that code might be very different for
>> >>> multicore vs. multi-machine.
>> >>>
>> >>> So yes, sometimes there is a fundamental reason that you have to
>> >>> change the code to make it run on multiple machines, and neither
>> >>> foreach nor any other parallelization framework will save you from
>> >>> having to rewrite your code. But often there is no fundamental reason
>> >>> that the code has to change, and you end up changing it anyway because
>> >>> of limitations in your parallelization framework. That is the case
>> >>> that foreach saves you from.
>> >>
>> >> Hmmm... I guess you are right, and we are talking about "often" or
>> >> "most of the time", which is where all this would fit. Point taken.
>> >>
>> >> Best,
>> >>
>> >> R.
>> >>
>> >> --
>> >> Ramon Diaz-Uriarte
>> >> Department of Biochemistry, Lab B-25
>> >> Facultad de Medicina
>> >> Universidad Autónoma de Madrid
>> >> Arzobispo Morcillo, 4
>> >> 28029 Madrid
>> >> Spain
>> >>
>> >> Phone: +34-91-497-2412
>> >>
>> >> Email: rdia...@gmail.com
>> >>        ramon.d...@iib.uam.es
>> >>
>> >> http://ligarto.org/rdiaz

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
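
For concreteness, here is a minimal, untested sketch of the BatchJobs
workflow that a backend like Florian's Qcluster wraps. The template file
names ("sge.tmpl", "torque.tmpl") and the registry id are placeholders;
the cluster.functions line goes in a per-project .BatchJobs.R
configuration file, which BatchJobs reads when loaded:

    ## In .BatchJobs.R: pick the cluster functions matching the queueing
    ## system, pointing at a site-specific submit-script template:
    ##   cluster.functions <- makeClusterFunctionsSGE("sge.tmpl")
    ##   cluster.functions <- makeClusterFunctionsTorque("torque.tmpl")

    library(BatchJobs)

    slowSquare <- function(x) { Sys.sleep(1); x^2 }

    reg <- makeRegistry(id = "demo")   # file-based registry on shared storage
    batchMap(reg, slowSquare, 1:100)   # define one job per element
    submitJobs(reg)                    # submit via the scheduler template
    waitForJobs(reg)                   # block until the queue drains
    res <- loadResults(reg)            # read results back from registry files

The registry directory is also where stale files can accumulate when jobs
die mid-run, which is the file-based "garbage" problem Florian mentions.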
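
Henrik's hybrid use case (scheduler-level jobs on PBS/Torque, multicore
within each node) can be sketched with the same machinery. The chunking,
core count, and resource names below are invented for illustration; which
resource names actually work depends entirely on the site's template file:

    library(BatchJobs)  # assumes makeClusterFunctionsTorque(...) as above

    ## Each batch job processes one chunk, fanning out over its node's cores.
    processChunk <- function(chunk) {
      parallel::mclapply(chunk, function(x) { Sys.sleep(1); x^2 },
                         mc.cores = 8)
    }

    chunks <- split(1:800, rep(1:10, each = 80))  # 10 jobs x 80 tasks each
    reg <- makeRegistry(id = "hybrid")
    batchMap(reg, processChunk, chunks)
    submitJobs(reg, resources = list(nodes = 1, ppn = 8, walltime = 3600))
    waitForJobs(reg)
    res <- do.call(c, loadResults(reg))           # flatten back to one list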
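
Ryan's point - that lapply-style, data-parallel code ports across backends
with only the plumbing changing - shows up clearly in a self-contained toy
example: the worker function is identical in all four calls, and the socket
cluster could just as well list remote host names:

    library(parallel)
    library(foreach)
    library(doParallel)   # one of several available foreach backends

    slowSquare <- function(x) { Sys.sleep(0.1); x^2 }
    xs <- 1:8

    r1 <- lapply(xs, slowSquare)                  # serial reference
    r2 <- mclapply(xs, slowSquare, mc.cores = 4)  # forked cores (not Windows)

    cl <- makeCluster(4)    # socket cluster; makeCluster(c("node1", "node2"))
                            # would span machines instead (hypothetical hosts)
    r3 <- parLapply(cl, xs, slowSquare)

    registerDoParallel(cl)  # reuse the same cluster behind %dopar%
    r4 <- foreach(x = xs) %dopar% slowSquare(x)
    stopCluster(cl)

    stopifnot(identical(r1, r2), identical(r1, r3), identical(r1, r4))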