Probably way off-topic but just a question: should a generic interface target something like DRMAA?
http://en.wikipedia.org/wiki/DRMAA

That would work across most clusters as it’s a single unified API. (There is a DRMAA module, Schedule::DRMAAc, but I believe it’s XS-based and way out of date; at least I could never get it to install.)

chris

On Sep 5, 2014, at 9:12 AM, John Macdonald <john.macdon...@oicr.on.ca> wrote:

> Dana, I may be wrong here, but I think that Hadoop is one form of compute
> cluster management software, just as SGE is. I'm aiming to provide a generic
> interface layer that you can use for writing code to be distributed across a
> cluster. By changing one parameter, cluster=>'Hadoop' instead of
> cluster=>'SGE', your same code would run on a different type of cluster.
> There would be limitations if you used cluster-specific capabilities, just as
> there are the same limitations converting a database connection that uses DBI
> to replace the underlying database platform, but *most* of the code would be
> unaffected. (Assuming that I get a good enough generic interface definition
> that captures and balances the requirements and capabilities of different
> clusters well enough in a single consistent form. :-)
>
> John Macdonald
> Software Engineer
>
> Ontario Institute for Cancer Research
> MaRS Centre
> 661 University Avenue
> Suite 510
> Toronto, Ontario
> Canada M5G 0A3
>
> Tel:
> Email: john.macdon...@oicr.on.ca
> Toll-free: 1-866-678-6427
> Twitter: @OICR_news
> www.oicr.on.ca
>
> This message and any attachments may contain confidential and/or privileged
> information for the sole use of the intended recipient. Any review or
> distribution by anyone other than the person for whom it was originally
> intended is strictly prohibited. If you have received this message in error,
> please contact the sender and delete all copies. Opinions, conclusions or
> other information contained in this message may not be that of the
> organization.
>
> ________________________________________
> From: Dana Hudes [dhu...@hudes.org]
> Sent: September 5, 2014 10:03 AM
> To: John Macdonald
> Cc: module-authors@perl.org
> Subject: Re: Top level name proposal - ComputeCluster
>
> So you intend to develop a new pure Perl compute cluster? Because if you just
> need to get the job done why would you not use Hadoop whether private cluster
> or AWS? It has a Perl API and it will cheerfully run Perl jobs.
> Hadoop is an Apache project, open source free software with a large installed
> base.
>
> -----Original Message-----
> From: John Macdonald <john.macdon...@oicr.on.ca>
> Date: Fri, 5 Sep 2014 13:57:47
> To: Fields, Christopher J<cjfie...@illinois.edu>
> Cc: James E Keenan<jk...@verizon.net>;
> module-authors@perl.org<module-authors@perl.org>
> Subject: RE: Top level name proposal - ComputeCluster
>
> I'm intending that ComputeCluster (or whatever the final name turns out to
> be) will be domain-agnostic at the top level interface at least. However, my
> lab will be using it for genome analysis pipelines, and I suspect a
> significant proportion of the potential other users will also be in this
> field (as shown by the responses on this discussion already) so there could
> be domain-specific submodules - either within this namespace or in other
> namespaces simply using this module set.
>
> Chris, Alex, and anyone else who is interested as a potential future
> user/contributor, feel free to email me outside of this module-authors
> discussion about how the actual module will develop.
>
> John Macdonald
>
> ________________________________________
> From: Fields, Christopher J [cjfie...@illinois.edu]
> Sent: September 5, 2014 9:47 AM
> To: John Macdonald
> Cc: James E Keenan; module-authors@perl.org
> Subject: Re: Top level name proposal - ComputeCluster
>
> Yup, I agree. I think Cluster is too generic and can mean a lot of things (I
> think of cluster analysis myself). Maybe something more distinctive? Is it
> application- or domain-specific (bioinformatics, etc)?
>
> There are a few tools with similar functionality that come to mind. Most of
> them have catchy names; one written in Perl is Clusterflow (not on CPAN but
> here: https://github.com/ewels/clusterflow/). Another is the (completely
> unmaintained, likely broken, but possibly useful for something) biopipe
> project: https://github.com/bioperl/bioperl-pipeline. I have thought about
> retooling the latter to be less reliant on bioperl and more a stand-alone
> tool.
>
> There are a couple Java tools also: bpipe (https://code.google.com/p/bpipe/)
> and nextflow (https://github.com/nextflow-io/nextflow).
>
> And I agree with Alex; as you might guess based on my comment on biopipe, our
> group would be very interested in helping out on this, even if it’s at simply
> the testing phase (we run PBS/Torque locally).
>
> chris
>
> On Sep 5, 2014, at 8:00 AM, John Macdonald <john.macdon...@oicr.on.ca> wrote:
>
>> Cluster was my first thought for a name, but when I did a search to see what
>> modules already existed (both in case someone had already written a generic
>> cluster module saving me the bother of starting a new one, and to see what
>> types of cluster had cluster-specific modules written for them) the word
>> cluster came up in a large number of contexts. Any tightly connected group
>> of "things" is a cluster (e.g. nodes in a graph) - so I didn't think that
>> the simple name would be clear enough. The name Cluster leaves the reader
>> with the immediate question "Cluster of what?".
>>
>> John Macdonald
>>
>> ________________________________________
>> From: James E Keenan [jk...@verizon.net]
>> Sent: September 5, 2014 7:25 AM
>> To: module-authors@perl.org
>> Subject: Re: Top level name proposal - ComputeCluster
>>
>> On 09/04/2014 10:23 AM, John Macdonald wrote:
>>> Hi,
>>>
>>> I wanted to get general comment/consensus about a top level name that I
>>> am proposing.
>>>
>>> I'm starting to organize a set of modules for managing jobs on a
>>> computer cluster.
>>> I intend it to work much like DBI - with a top level abstract interface
>>> that programs can use, actually implemented by drivers that translate the
>>> common interface into the interface used by the particular type of compute
>>> cluster that is being accessed. Initially, I will provide a driver for SGE,
>>> since that is what we have and use in our lab (but after I have that
>>> running, my PI can get me access to a couple of other types of compute
>>> cluster to add some more).
>>>
>>> For naming, I am planning to use:
>>>
>>> ComputeCluster - top level name
>>>     - will provide switching functions to create a class of object
>>>       for a particular cluster type
>>>
>>
>> Could that be shortened to simply: Cluster ?
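
For readers following the thread, the DBI-style design in the original proposal (a top level ComputeCluster name whose switching function hands back an object for a particular cluster type) might look roughly like the sketch below. Nothing here is the module's actual interface: the `connect` method, the `ComputeCluster::Driver::*` class names, and the `submit` method are all assumptions made for illustration, and the drivers only return a description string instead of talking to a real scheduler.

```perl
use strict;
use warnings;

# Hypothetical driver classes. A real driver would translate the common
# interface into calls to the scheduler; these stubs only describe the job.
package ComputeCluster::Driver::SGE {
    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    sub submit {
        my ($self, %job) = @_;
        return "SGE: would submit '$job{name}' running $job{script}";
    }
}

package ComputeCluster::Driver::Hadoop {
    sub new { my ($class, %args) = @_; return bless {%args}, $class }

    sub submit {
        my ($self, %job) = @_;
        return "Hadoop: would submit '$job{name}' running $job{script}";
    }
}

package ComputeCluster {
    # Switching function (assumed name): choose the driver class from the
    # 'cluster' parameter, in the spirit of DBI choosing a DBD driver.
    sub connect {
        my ($class, %args) = @_;
        my $type = delete $args{cluster}
            or die "cluster type is required\n";
        my $driver = "ComputeCluster::Driver::$type";
        die "no driver for cluster type '$type'\n"
            unless $driver->can('new');
        return $driver->new(%args);
    }
}

package main;

# The calling code is identical for both cluster types; only the
# 'cluster' parameter changes.
for my $type (qw(SGE Hadoop)) {
    my $cluster = ComputeCluster->connect(cluster => $type);
    print $cluster->submit(name => 'align', script => 'align.sh'), "\n";
}
```

In this sketch the caller never names a driver class directly, so switching cluster=>'SGE' to cluster=>'Hadoop' leaves the rest of the program untouched, which is the property the proposal is after. Cluster-specific capabilities would still need driver-specific handling, as noted earlier in the thread.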