On Sat, Apr 12, 2014 at 3:27 AM, Steve Loughran <ste...@hortonworks.com> wrote:

> On 10 April 2014 16:28, Andrew Purtell <apurt...@apache.org> wrote:
>
> > Hi Steve,
> >
> > Does Slider target the deployment and management of components/projects
> in
> > the Hadoop project itself? Not just the ecosystem examples mentioned in
> the
> > proposal? I don't see this mentioned in the proposal.
> >
>
> no.
>
>
It seems that what you propose for other Hadoop ecosystem components with
Slider could also apply to some parts of core.



> That said, some of the stuff I'm prototyping on a service registry should
> be usable for existing code -there's no reason why a couple of zookeeper
> arguments shouldn't be enough to look up the bindings for HDFS, Yarn, etc.
>
> I've not done much there -currently seeing how well curator service
> discovery works- so assistance would be welcome.
>
>
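A minimal sketch of what such a ZK-based binding lookup might look like, with a plain dict standing in for the ZooKeeper tree (the paths and record fields here are hypothetical illustrations, not an actual Slider or Curator format):

```python
import json

# Hypothetical ZK layout: a dict stands in for the znode tree. In a real
# cluster this would be read through a ZooKeeper client (on the Java side,
# e.g. Curator's service discovery recipes).
ZK_TREE = {
    "/services/hdfs/namenode": json.dumps(
        {"host": "nn1.example.com", "ipc_port": 8020, "http_port": 50070}
    ),
    "/services/yarn/resourcemanager": json.dumps(
        {"host": "rm1.example.com", "ipc_port": 8032, "http_port": 8088}
    ),
}

def lookup_binding(zk_path):
    """Resolve a service binding record from the (simulated) registry."""
    data = ZK_TREE.get(zk_path)
    if data is None:
        raise KeyError("no binding published at %s" % zk_path)
    return json.loads(data)

nn = lookup_binding("/services/hdfs/namenode")
print("%s:%d" % (nn["host"], nn["ipc_port"]))  # nn1.example.com:8020
```

With something like this, "a couple of zookeeper arguments" (quorum plus a base path) really would be all a client needs to find HDFS or YARN endpoints.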
> >
> > The reason I ask is I'm wondering how Slider differentiates from projects
> > like Apache Twill or Apache Bigtop that are already existing vehicles for
> > achieving the aims discussed in the Slider proposal.
>
>
> Twill: handles all the AM logic for running new code packaged as a JAR with
> an executor method
> Bigtop: stack testing


As a Bigtop committer, I disagree with this narrow interpretation of the
scope of the project, though this is my personal opinion and I am not on
the PMC...

For example, we package both Hadoop core and ecosystem services for
deployment, we have Puppet-based deployment automation (which can be used
more generally than merely for setting up test clusters), and I have been
considering filing JIRAs to tie in cgroups at the whole-stack level here.
What is missing, of course, is a hierarchical model for resource
management, and tools within the components for differentiated service
levels, but that is another discussion.
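For illustration only, a hierarchical stack-level cgroup layout might look something like this libcgroup cgconfig.conf fragment (the group names and weights are hypothetical, not anything Bigtop ships today):

```
group hadoop {
  cpu { cpu.shares = 1024; }
}
group hadoop/hdfs-datanode {
  blkio { blkio.weight = 800; }
}
group hadoop/hbase-regionserver {
  cpu { cpu.shares = 512; }
  blkio { blkio.weight = 400; }
}
```

Note that this only constrains local IO; HDFS reads and writes also land on remote DataNode processes, which is exactly the gap discussed elsewhere in this thread.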



> > Tackling
> > cross-component resource management issues could certainly be that, but
> > only if core Hadoop services are also brought into the deployment and
> > management model, because IO pathways extend over multiple layers and
> > components. You mention HBase and Accumulo as examples. Both are HDFS
> > clients. Would it be insufficient to reserve or restrict resources for
> e.g.
> > the HBase RegionServer without also considering the HDFS DataNode?
>
>
> IO quotas is a tricky one -you can't cgroup-throttle a container for HDFS
> IO as it takes place on local and remote DN processes. Without doing some
> priority queuing in the DNs we can hope for some labelling of nodes in the
> YARN cluster so you can at least isolate the high-SLA apps from IO
> intensive but lower priority code.


Yes. Do you see this as something Slider could motivate and drive?



> > Do the
> > HDFS DataNode and HBase RegionServer have exactly the same kind of
> > deployment, recovery/restart, and dynamic scaling concerns?
>
>
> DN's react to loss of the NN by spinning on the cached IP address, or, in
> HA, to the defined failover address. Now, if we did support ZK lookup of NN
> IPC and Web ports we could consider an alternate failure mode where the DNs
> do intermittently poll the ZK bindings during the spin cycle
>

Yes. But to my original question, I see a high degree of similarity in
terms of management and operational considerations, even if the mechanism
isn't quite there or would need tweaking. Again, do you see this as
something Slider could motivate and drive, perhaps?
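Sketched in miniature, that alternate failure mode amounts to re-resolving the binding between reconnect attempts. A dict stands in for ZooKeeper below, and all names and paths are hypothetical:

```python
import json

# Simulated registry; a real DN would intermittently poll ZooKeeper
# instead of reading a dict.
REGISTRY = {"/services/hdfs/namenode": json.dumps({"host": "nn1", "port": 8020})}
LIVE_HOSTS = set()  # hosts currently accepting connections

def resolve_nn():
    """Re-read the NN binding from the (simulated) registry."""
    return json.loads(REGISTRY["/services/hdfs/namenode"])

def try_connect(host, port):
    """Stand-in for an IPC connection attempt."""
    return host in LIVE_HOSTS

def reconnect_loop(max_attempts=10):
    """Spin on reconnect, re-resolving the binding each cycle so a
    failed-over NN that republished its address is eventually found,
    instead of spinning forever on a cached IP."""
    for attempt in range(max_attempts):
        nn = resolve_nn()
        if try_connect(nn["host"], nn["port"]):
            return nn["host"]
        if attempt == 0:
            # Simulate failover happening while we spin: the new NN comes
            # up and republishes its binding under the same path.
            REGISTRY["/services/hdfs/namenode"] = json.dumps(
                {"host": "nn2", "port": 8020})
            LIVE_HOSTS.add("nn2")
    raise RuntimeError("NN unreachable after %d attempts" % max_attempts)

print(reconnect_loop())  # nn2
```

The point of the sketch: the cached-IP spin and the registry poll are the same loop; only the resolution step changes.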

> HBase and Accumulo do have their own ZK binding mechanism, so don't really
> need their own registry. But to work with their data you do need the
> relevant client apps. I would like to have some standard for at least
> publishing the core binding information in a way that could be parsed by
> any client app (CLI, web UI, other in-cluster apps)


+1 to such a standard.
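As a strawman, a published binding record that any client app could parse might look like this JSON (every field name here is invented for illustration; no such format exists yet):

```
{
  "service": "hbase",
  "instance": "default",
  "bindings": {
    "ipc": {"host": "master1.example.com", "port": 60000},
    "web": {"url": "http://master1.example.com:60010"},
    "zookeeper": {"quorum": "zk1,zk2,zk3", "parent": "/hbase"}
  },
  "version": "0.98.0"
}
```

The value is less in the exact fields than in agreeing on one machine-readable shape that CLIs, web UIs, and in-cluster apps can all consume.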



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
