RE: Top level name proposal - ComputeCluster

John Macdonald Fri, 05 Sep 2014 07:38:33 -0700

I looked at that a while ago.

As you say, it seems way out of date.  There were two specs - the old one was 
quite limited, the newer one looked very promising.  However, the available 
code was only for the older spec and there didn't seem to be any progress after 
that.  The spec looked like a committee design aimed at being implemented by 
teams from the companies for each of the target platforms, which made it a big 
bite to take for a single unaligned developer.


However, I should probably re-read it and see if I can get my design to fit 
their spec to the extent possible.  Stealing (er, research) is always good.

John Macdonald
Software Engineer

Ontario Institute for Cancer Research
MaRS Centre

661 University Avenue

Suite 510
Toronto, Ontario

Canada M5G 0A3


Tel:

Email: john.macdon...@oicr.on.ca

Toll-free: 1-866-678-6427
Twitter: @OICR_news


www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged 
information for the sole use of the intended recipient. Any review or 
distribution by anyone other than the person for whom it was originally 
intended is strictly prohibited. If you have received this message in error, 
please contact the sender and delete all copies. Opinions, conclusions or other 
information contained in this message may not be that of the organization.

________________________________________
From: Fields, Christopher J [cjfie...@illinois.edu]
Sent: September 5, 2014 10:22 AM
To: John Macdonald
Cc: dhu...@hudes.org; module-authors@perl.org
Subject: Re: Top level name proposal - ComputeCluster

Probably way off-topic but just a question: should a generic interface target 
something like DRMAA?

    http://en.wikipedia.org/wiki/DRMAA

That would work across most clusters as it’s a single unified API.

(there is a DRMAA module, Schedule::DRMAAc, but I believe it’s XS-based and way 
out of date; at least I could never get it to install)

chris

On Sep 5, 2014, at 9:12 AM, John Macdonald <john.macdon...@oicr.on.ca> wrote:

> Dana, I may be wrong here, but I think that Hadoop is one form of compute 
> cluster management software, just as SGE is.  I'm aiming to provide a generic 
> interface layer that you can use for writing code to be distributed across a 
> cluster.  By changing one parameter, cluster=>'Hadoop' instead of 
> cluster=>'SGE' your same code would run on a different type of cluster.  
> There would be limitations if you used cluster-specific capabilities, just as 
> there are the same limitations converting a database connection that uses DBI 
> to replace the underlying database platform, but *most* of the code would be 
> unaffected.  (Assuming that I get a good enough generic interface definition 
> that balances the requirements and capabilities of different 
> clusters well enough in a single consistent form. :-)
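
A minimal sketch of the DBI-style switching described above, not the module's actual design. The driver package names, the `submit` method, and the returned strings are all hypothetical; the only detail taken from the message is the `cluster=>'SGE'` / `cluster=>'Hadoop'` parameter. Real drivers would talk to the scheduler instead of returning a string.

```perl
use strict;
use warnings;

# Hypothetical drivers: each one translates the common interface into
# whatever its cluster type needs. Here they only describe what they would do.
package ComputeCluster::Driver::SGE;
sub new    { my ($class) = @_; return bless {}, $class }
sub submit { my ($self, $cmd) = @_; return "SGE would run: $cmd" }

package ComputeCluster::Driver::Hadoop;
sub new    { my ($class) = @_; return bless {}, $class }
sub submit { my ($self, $cmd) = @_; return "Hadoop would run: $cmd" }

# Hypothetical top-level class: picks a driver from the 'cluster' parameter.
package ComputeCluster;

sub new {
    my ($class, %args) = @_;
    my $type = $args{cluster} // die "cluster parameter is required\n";
    die "invalid cluster type '$type'\n" unless $type =~ /\A\w+\z/;
    my $driver_class = "ComputeCluster::Driver::$type";
    die "no driver for cluster type '$type'\n"
        unless $driver_class->can('new');
    return bless { driver => $driver_class->new }, $class;
}

# The caller's code stays the same whichever driver was selected.
sub submit { my ($self, $cmd) = @_; return $self->{driver}->submit($cmd) }

package main;

for my $type (qw(SGE Hadoop)) {
    my $cluster = ComputeCluster->new(cluster => $type);
    print $cluster->submit('align_reads.pl sample1'), "\n";
}
```

Only the `cluster` argument changes between the two runs. Anything cluster-specific would have to live inside a driver, which is where the limitations mentioned above would show up.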
>
> ________________________________________
> From: Dana Hudes [dhu...@hudes.org]
> Sent: September 5, 2014 10:03 AM
> To: John Macdonald
> Cc: module-authors@perl.org
> Subject: Re: Top level name proposal - ComputeCluster
>
> So you intend to develop a new pure Perl compute cluster? Because if you just 
> need to get the job done why would you not use Hadoop whether private cluster 
> or AWS? It has a Perl API and it will cheerfully run Perl jobs.
> Hadoop is an Apache project, open source free software with a large installed 
> base.
>
> -----Original Message-----
> From: John Macdonald <john.macdon...@oicr.on.ca>
> Date: Fri, 5 Sep 2014 13:57:47
> To: Fields, Christopher J<cjfie...@illinois.edu>
> Cc: James E Keenan<jk...@verizon.net>; 
> module-authors@perl.org<module-authors@perl.org>
> Subject: RE: Top level name proposal - ComputeCluster
>
> I'm intending that ComputeCluster (or whatever the final name turns out to 
> be) will be domain-agnostic at the top level interface at least.  However, my 
> lab will be using it for genome analysis pipelines, and I suspect a 
> significant proportion of the potential other users will also be in this 
> field (as shown by the responses on this discussion already) so there could 
> be domain-specific submodules - either within this namespace or in other 
> namespaces simply using this module set.
>
> Chris, Alex, and anyone else who is interested as a potential future 
> user/contributor, feel free to email me outside of this module-authors 
> discussion about how the actual module will develop.
>
> ________________________________________
> From: Fields, Christopher J [cjfie...@illinois.edu]
> Sent: September 5, 2014 9:47 AM
> To: John Macdonald
> Cc: James E Keenan; module-authors@perl.org
> Subject: Re: Top level name proposal - ComputeCluster
>
> Yup, I agree.  I think Cluster is too generic and can mean a lot of things (I 
> think of cluster analysis myself).  Maybe something more distinctive?  Is it 
> application- or domain-specific (bioinformatics, etc)?
>
> There are a few tools with similar functionality that come to mind.  Most of 
> them have catchy names; one written in Perl is Clusterflow (not on CPAN but 
> here: https://github.com/ewels/clusterflow/).  Another is the (completely 
> unmaintained, likely broken, but possibly useful for something) biopipe 
> project: https://github.com/bioperl/bioperl-pipeline.  I have thought about 
> retooling the latter to be less reliant on bioperl and more a stand-alone 
> tool.
>
> There are a couple Java tools also: bpipe (https://code.google.com/p/bpipe/) 
> and nextflow (https://github.com/nextflow-io/nextflow).
>
> And I agree with Alex; as you might guess based on my comment on biopipe, our 
> group would be very interested in helping out on this, even if it’s at simply 
> the testing phase (we run PBS/Torque locally).
>
> chris
>
> On Sep 5, 2014, at 8:00 AM, John Macdonald <john.macdon...@oicr.on.ca> wrote:
>
>> Cluster was my first thought for a name, but when I did a search to see what 
>> modules already existed (both in case someone had already written a generic 
>> cluster module saving me the bother of starting a new one, and to see what 
>> types of cluster had cluster-specific modules written for them) the word 
>> cluster came up in a large number of contexts.  A tightly connected group 
>> of "things" is a cluster (e.g. nodes in a graph) - so I didn't think that 
>> the simple name would be clear enough.  The name Cluster leaves the reader 
>> with the immediate question "Cluster of what?".
>>
>> ________________________________________
>> From: James E Keenan [jk...@verizon.net]
>> Sent: September 5, 2014 7:25 AM
>> To: module-authors@perl.org
>> Subject: Re: Top level name proposal - ComputeCluster
>>
>> On 09/04/2014 10:23 AM, John Macdonald wrote:
>>> Hi,
>>>
>>> I wanted to get general comment/consensus about a top level name that I
>>> am proposing.
>>>
>>> I'm starting to organize a set of modules for managing jobs on a
>>> computer cluster.  I intend it to work much like DBI - with a top level
>>> abstract interface that programs can use, actually implemented by
>>> drivers that translate the common interface into the interface used by
>>> the particular type of compute cluster that is being accessed.
>>> Initially, I will provide a driver for SGE, since that is what we have
>>> and use in our lab (but after I have that running, my PI can get me
>>> access to a couple of other types of compute cluster to add some more).
>>>
>>> For naming, I am planning to use:
>>>
>>>    ComputeCluster - top level name
>>>      - will provide switching functions to create a class of object
>>> for a particular cluster type
>>>
>>
>> Could that be shortened to simply:  Cluster ?
>
