Hi, I'm a committer/PMC member for Apache Helix, and we've already done some of the work integrating with systems like YARN (pretty much ready to go) and Mesos (about 1/3 of the way there). For some background, Helix separates cluster management logic from application logic by modeling an application's lifecycle as a state machine. It uses ZooKeeper for coordination, though ZK is behind a layer of abstraction as well.
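To make the state-machine point concrete, here is a minimal sketch of what a Helix participant's state model can look like, using Helix's StateModel/@Transition callback API. The class name, model, and callback bodies are illustrative only, not taken from this thread.

    import org.apache.helix.NotificationContext;
    import org.apache.helix.model.Message;
    import org.apache.helix.participant.statemachine.StateModel;
    import org.apache.helix.participant.statemachine.StateModelInfo;
    import org.apache.helix.participant.statemachine.Transition;

    // Each partition of a resource moves through these states; the Helix
    // controller decides which transition to fire on which node.
    @StateModelInfo(initialState = "OFFLINE", states = {"MASTER", "SLAVE", "OFFLINE"})
    public class MyMasterSlaveModel extends StateModel {

      @Transition(from = "OFFLINE", to = "SLAVE")
      public void onBecomeSlaveFromOffline(Message msg, NotificationContext ctx) {
        // open resources and start replicating msg.getPartitionName()
      }

      @Transition(from = "SLAVE", to = "MASTER")
      public void onBecomeMasterFromSlave(Message msg, NotificationContext ctx) {
        // start serving writes for this partition
      }

      @Transition(from = "MASTER", to = "SLAVE")
      public void onBecomeSlaveFromMaster(Message msg, NotificationContext ctx) {
        // stop serving writes, fall back to replication
      }

      @Transition(from = "SLAVE", to = "OFFLINE")
      public void onBecomeOfflineFromSlave(Message msg, NotificationContext ctx) {
        // release resources for this partition
      }
    }

A participant registers a factory for this model through HelixManager's state machine engine (registerStateModelFactory), and the controller drives the transitions based on the cluster's current state.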
In essence, what Helix has done well up until now is solve this problem: given something you want to distribute, some constraints on how it should be distributed, and a set of live nodes, come up with a mapping of tasks to nodes and the state each task should be in (e.g. master, slave, online, offline, error, leader, standby). What we were missing was a way to actually control the presence or absence of the nodes we assign to. So we came up with a general interface through which Helix can tell YARN, Mesos, or anything else to start up a new container so that Helix can assign tasks to it. We did this in a general way, and we have a working implementation for YARN.

Coming from the other side, Helix allows us to be much more fine-grained in how we use containers. Helix can dynamically assign and unassign tasks to containers based on application requirements, and it does this in a general way. This allows for potential container reuse, hiding of container restart overhead (because we can transition something else to leader/master in the meantime), and potentially better container utilization.

Here are the slides for the talk we gave at ApacheCon describing the high-level architecture of the integration: http://www.slideshare.net/KanakBiscuitwala/finegrained-scheduling-with-helix-apachecon-na-2014

The source code for the integration is on the helix-provisioning branch (see the source links here: http://helix.apache.org/sources.html). The helix-provisioning module contains all of the code that makes the integration work, including the app master code. The helloworld-provisioning-yarn module in recipes is a no-op service. Here is an email with steps on how to make this work end-to-end: http://markmail.org/message/cddcdy4iphleyueb

The key classes to look at are:
- AppLauncher (the client that submits the YAML file describing the service) -- this is what you would actually submit to the YARN RM
- AppMasterLauncher (the deployed app master with the Helix controller and the integration with the YARN APIs)
- ParticipantLauncher (code that is invoked when a container for the app starts; it runs an instance of the service that uses Helix)

We'd be happy to collaborate to see if there is a way to make Helix, Slider, Twill, and others better.

Thanks,
Kanak

On Sat, Apr 12, 2014 at 3:38 PM, Roman Shaposhnik <rv...@apache.org> wrote:

On Sat, Apr 12, 2014 at 11:58 AM, Andrew Purtell <apur...@apache.org> wrote:

The reason I ask is I'm wondering how Slider differentiates from projects like Apache Twill or Apache Bigtop that are already existing vehicles for achieving the aims discussed in the Slider proposal.

Twill: handles all the AM logic for running new code packaged as a JAR with an executor method
Bigtop: stack testing

As a Bigtop committer, I disagree with this narrow interpretation of the scope of the project, but this is my personal opinion and I am not PMC...

A strong +1 here! Bigtop attacks the problem of packaging and deploying Hadoop stacks from a classical UNIX packaging background. We are also slowly moving into container/VM/OSv packaging territory, which could be an extremely exciting way of side-stepping the general installer issues (something that Ambari struggles mightily with).

Something like Slider tries to leapfrog and side-step UNIX packaging and deployment altogether. This is an interesting take on the problem, but ultimately the jury is still out on what the level of adoption of "everything is now YARN" will be. At the end of the day, we will need both for a really long time.
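Returning to the container-provisioning interface Kanak describes above: the real code is on the helix-provisioning branch linked earlier, but as a rough, hypothetical sketch of the idea (the interface and method names below are invented for illustration and are not necessarily the branch's actual classes), the contract between Helix and a resource manager might look like this:

    // Hypothetical sketch of a pluggable provisioner contract: Helix decides
    // how many containers a service needs, and a YARN- or Mesos-specific
    // implementation acquires or releases them. Names are illustrative only.
    import java.util.List;
    import java.util.concurrent.Future;

    interface ContainerSpec {
      int memoryMb();
      int vcores();
    }

    interface RunningContainer {
      String id();          // e.g. the YARN container id
      String hostName();    // where Helix will assign tasks/partitions
    }

    interface TargetProvider {
      // How many containers should exist right now, given application
      // requirements and the set of currently live containers.
      int targetContainerCount(String serviceName, List<RunningContainer> live);
    }

    interface ContainerProvisioner {
      // Ask the underlying resource manager (YARN, Mesos, ...) for a container.
      Future<RunningContainer> allocate(ContainerSpec spec);

      // Give a container back; Helix first moves its tasks elsewhere
      // (e.g. promotes a standby to leader) so the release is not user-visible.
      Future<Void> release(RunningContainer container);
    }

A YARN-backed implementation of such a provisioner would live in the app master (AppMasterLauncher above) and translate allocate/release into AMRMClient requests, while the controller uses the target count to grow or shrink the service.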
For example, we package Hadoop core and ecosystem services both for deployment, have Puppet-based deployment automation (which can be used more generally than merely for setting up test clusters), and I have been considering filing JIRAs to tie in cgroups at the whole-stack level here. What is missing, of course, is a hierarchical model for resource management, and tools within the components for differentiated service levels, but that is another discussion.

On that note, I find YARN's attitude towards cgroups to be, how shall I put it, optimistic. If you look carefully you can see that the Linux community has completely given up on pretending that one can use naked cgroup trees for reliable resource partitioning: http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups/

It is now clear to me that the path Linux distros are endorsing is via brokers such as systemd: http://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/

As such, I'd see Bigtop providing quite a bit of value out of the box via tight integration with systemd. YARN, at the moment, is in a trickier situation.

HBase and Accumulo do have their own ZK binding mechanisms, so they don't really need their own registry. But to work with their data you do need the relevant client apps. I would like to have some standard for at least publishing the core binding information in a way that could be parsed by any client app (CLI, web UI, other in-cluster apps).

+1 to such a standard. I've been known to advocate Apache Helix as a standard API to build exactly that type of distributed application architecture. In fact, if only I had more spare time on my hands, I'd totally prototype Helix-based YARN APIs.

Thanks,
Roman.
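To illustrate the kind of binding-information standard being asked for above, here is a minimal, hypothetical sketch of a service instance publishing a small JSON binding record to a well-known ZooKeeper path using the plain ZooKeeper client API. The path layout and the JSON fields are invented for illustration; they are not an agreed-upon standard.

    import java.nio.charset.StandardCharsets;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class BindingPublisher {
      public static void main(String[] args) throws Exception {
        // Connect to the same ZK ensemble the service already uses for coordination.
        ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, event -> { });

        // A small, self-describing JSON record that any client (CLI, web UI,
        // in-cluster app) could parse. Path layout and fields are illustrative.
        String binding = "{"
            + "\"service\":\"hbase\","
            + "\"endpoints\":[{\"name\":\"master\",\"host\":\"host1\",\"port\":16000}],"
            + "\"configRef\":\"hdfs:///apps/hbase/conf\""
            + "}";

        // Assumes the parent path /services/hbase/instances already exists.
        String path = "/services/hbase/instances/instance-1";
        try {
          zk.create(path, binding.getBytes(StandardCharsets.UTF_8),
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        } catch (KeeperException.NodeExistsException e) {
          zk.setData(path, binding.getBytes(StandardCharsets.UTF_8), -1);
        }
      }
    }

Because the node is ephemeral, any client that can read ZooKeeper can both discover the endpoints and notice when the instance goes away with its session.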