Re: Top level name proposal - ComputeCluster

Mark Hedges Sun, 07 Sep 2014 11:36:06 -0700

Doesn't Hadoop have to restart the perl interpreter for every execution
step, i.e. run a script that performs the map or reduce operation?


It seems like a single perl interpreter could listen on TCP for
authenticated subroutines to run in threads, passing them on to idle
neighbors if busy.  No need for a scheduler? Scale by adding nodes; they
glom together by broadcast registration, no single point of failure. They
all UDP-broadcast their load averages to each other and keep track of each
other from a detached `nice` child process, which also acts as a control
channel. Load average is the only criterion used for work assignment.
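
A rough sketch of the load-average bookkeeping, in Python rather than Perl and with no real sockets. The JSON payload format, the `PeerTable` class, and the `STALE_AFTER` timeout are all made up for illustration; a real node would feed `record()` from a UDP broadcast listener.

```python
import json
import time

STALE_AFTER = 10.0  # assumed: seconds of silence before a peer is ignored


def encode_load(host, load):
    """Build the datagram payload a node would UDP-broadcast (assumed format)."""
    return json.dumps({"host": host, "load": load}).encode("utf-8")


class PeerTable:
    """Latest load average heard from each neighbor."""

    def __init__(self):
        self.peers = {}  # host -> (load, time last heard)

    def record(self, payload, now=None):
        """Store one received broadcast."""
        msg = json.loads(payload.decode("utf-8"))
        seen = time.time() if now is None else now
        self.peers[msg["host"]] = (float(msg["load"]), seen)

    def pick(self, local_host, local_load, now=None):
        """Return the host that should run the next subroutine."""
        now = time.time() if now is None else now
        candidates = {local_host: local_load}
        for host, (load, seen) in self.peers.items():
            # Skip our own echoed broadcast and peers that have gone quiet.
            if host != local_host and now - seen <= STALE_AFTER:
                candidates[host] = load
        # Load average is the only criterion; ties go to the name sorting first.
        return min(sorted(candidates), key=candidates.get)
```

A busy node would call `pick()` and forward the subroutine whenever the answer is not itself. Since peers that stop broadcasting simply age out of the table, nothing central has to notice a dead node.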

Object API for the cluster lets you give it chains of dependent
subroutines. A subroutine can be defined to open a listener, broadcast the
UUIDs of the subroutines it is waiting for, and listen until it gets all
expected results. If that broadcast gets missed, the finished subroutine's
"return" function broadcasts to ask which listener expects its results.
Implement map-reduce this way? And whatever...
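
To make the listener idea concrete, here is an in-process sketch, again in Python and with the network left out. `Listener` and `return_result` are hypothetical names; the "ask who expects this" broadcast is simulated by looping over the known listeners.

```python
import uuid


class Listener:
    """A subroutine waiting on the results of the subroutines it depends on."""

    def __init__(self, name, expected):
        self.name = name
        self.expected = set(expected)  # UUIDs announced when the listener opens
        self.results = {}

    def expects(self, task_id):
        """True if this listener still needs the result for task_id."""
        return task_id in self.expected and task_id not in self.results

    def deliver(self, task_id, value):
        """Accept a result if it is still wanted; report whether it was."""
        if not self.expects(task_id):
            return False
        self.results[task_id] = value
        return True

    def done(self):
        return set(self.results) == self.expected


def return_result(task_id, value, listeners):
    """Fallback 'return': ask every listener who expects task_id, then deliver."""
    takers = [listener for listener in listeners if listener.expects(task_id)]
    for listener in takers:
        listener.deliver(task_id, value)
    return [listener.name for listener in takers]


# Map-reduce shaped usage: three "map" tasks feed one "reduce" listener.
map_ids = [uuid.uuid4() for _ in range(3)]
reducer = Listener("reduce", map_ids)
for task_id, chunk in zip(map_ids, (["a", "b"], ["c"], ["d", "e", "f"])):
    return_result(task_id, len(chunk), [reducer])
total = sum(reducer.results.values()) if reducer.done() else None
```

The reduce step here only sums the chunk sizes once every expected UUID has reported. A result whose UUID nobody expects is dropped, which is one place a real design would need a retry or timeout policy.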

Mark
On Sep 7, 2014 11:01 AM, "Dana Hudes" <dhu...@hudes.org> wrote:

> There exists a Perl interface to Hadoop. I can't look it up right now, but I
> think that was under Apache:: . AWS also offers Hadoop as a service with
> Perl and PHP interfaces at least. Under AWS::Hadoop IIRC.
>
> -----Original Message-----
> From: "Fields, Christopher J" <cjfie...@illinois.edu>
> Date: Fri, 5 Sep 2014 13:47:26
> To: John Macdonald<john.macdon...@oicr.on.ca>
> Cc: James E Keenan<jk...@verizon.net>; module-authors@perl.org<
> module-authors@perl.org>
> Subject: Re: Top level name proposal - ComputeCluster
>
> Yup, I agree.  I think Cluster is too generic and can mean a lot of things
> (I think of cluster analysis myself).  Maybe something more distinctive?
> Is it application- or domain-specific (bioinformatics, etc)?
>
> There are a few tools with similar functionality that come to mind.  Most
> of them have catchy names; one written in Perl is Clusterflow (not on CPAN
> but here: https://github.com/ewels/clusterflow/).  Another is the
> (completely unmaintained, likely broken, but possibly useful for something)
> biopipe project: https://github.com/bioperl/bioperl-pipeline.  I have
> thought about retooling the latter to be less reliant on bioperl and more a
> stand-alone tool.
>
> There are a couple Java tools also: bpipe (
> https://code.google.com/p/bpipe/) and nextflow (
> https://github.com/nextflow-io/nextflow).
>
> And I agree with Alex; as you might guess based on my comment on biopipe,
> our group would be very interested in helping out on this, even if it's at
> simply the testing phase (we run PBS/Torque locally).
>
> chris
>
> On Sep 5, 2014, at 8:00 AM, John Macdonald <john.macdon...@oicr.on.ca>
> wrote:
>
> > Cluster was my first thought for a name, but when I did a search to see
> > what modules already existed (both in case someone had already written a
> > generic cluster module, saving me the bother of starting a new one, and to
> > see what types of cluster had cluster-specific modules written for them)
> > the word cluster came up in a large number of contexts.  A tightly
> > connected group of "things" is a cluster (e.g. nodes in a graph) - so I
> > didn't think that the simple name would be clear enough.  The name Cluster
> > leaves the reader with the immediate question "Cluster of what?".
> >
> > John Macdonald
> > Software Engineer
> >
> > Ontario Institute for Cancer Research
> > MaRS Centre
> >
> > 661 University Avenue
> >
> > Suite 510
> > Toronto, Ontario
> >
> > Canada M5G 0A3
> >
> >
> > Email: john.macdon...@oicr.on.ca
> >
> > Toll-free: 1-866-678-6427
> > Twitter: @OICR_news
> >
> >
> > www.oicr.on.ca
> >
> >
> > ________________________________________
> > From: James E Keenan [jk...@verizon.net]
> > Sent: September 5, 2014 7:25 AM
> > To: module-authors@perl.org
> > Subject: Re: Top level name proposal - ComputeCluster
> >
> > On 09/04/2014 10:23 AM, John Macdonald wrote:
> >> Hi,
> >>
> >> I wanted to get general comment/consensus about a top level name that I
> >> am proposing.
> >>
> >> I'm starting to organize a set of modules for managing jobs on a
> >> computer cluster.  I intend it to work much like DBI - with a top level
> >> abstract interface that programs can use, actually implemented by
> >> drivers that translate the common interface into the interface used by
> >> the particular type of compute cluster that is being accessed.
> >> Initially, I will provide a driver for SGE, since that is what we have
> >> and use in our lab (but after I have that running, my PI can get me
> >> access to a couple of other types of compute cluster to add some more).
> >>
> >> For naming, I am planning to use:
> >>
> >>     ComputeCluster - top level name
> >>       - will provide switching functions to create a class of object
> >> for a particular cluster type
> >>
> >
> > Could that be shortened to simply:  Cluster ?
>
>
