Steve - yes, I have been monitoring YARN advances in this area, particularly YARN-1051, which seems to have most of what long-running services with hard node-locality requirements need and is based on MS's Rayon framework (https://issues.apache.org/jira/secure/attachment/12628143/curino_MSR-TR-2013-108.pdf). There are lots of JIRAs related to long-running services, like YARN-1039, that try to tackle this problem piecemeal (i.e. adding a long-running flag to the app and/or container) and result in lots of discussion - but YARN-1051 seems to be the most coherent approach I've seen to date.

On Wednesday, July 23, 2014 5:15 PM, Steve Morin <st...@stevemorin.com> wrote:

Kam,

Give it some time; I think it's getting better as a real possibility for Kafka on Yarn. There are new capabilities coming out in Yarn/HDFS to allow for node groups/labels that can work with locality, and secondarily new functionality in HDFS that, depending on the use case, can be very interesting with in-memory files.

-Steve

On Wed, Jul 23, 2014 at 4:44 PM, Kam Kasravi <kamkasr...@yahoo.com.invalid> wrote:

Thanks Joe for the input related to Mesos as well as acknowledging the need for YARN to support this type of cluster allocation - long running services with node locality priority.
>
>Thanks Jay - That's an interesting fact that I wasn't aware of - though I
>imagine there could possibly be a long latency for the replica data to be
>transferred to the new broker (depending on #/size of partitions). It does
>open up some possibilities to restart brokers on app master restart using
>different containers (as well as some complications if an old container with
>old data were reallocated on restart). I had used zookeeper to store broker
>locations so the app master on restart would look for this information and
>attempt to reallocate containers on these nodes. All this said, would this be
>part of kafka or some other framework? I can see kafka benefitting from this;
>at the same time, kafka's appeal IMO is its simplicity. Spark has chosen to
>include YARN within its distribution, not sure what the kafka team thinks.
>
>
>On Wednesday, July 23, 2014 4:19 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>Hey Kam,
>
>It would be nice to have a way to get a failed node back with its
>original data, but this isn't strictly necessary, it is just a good
>optimization. As long as you run with replication you can restart a
>broker elsewhere with no data, and it will restore its state off the
>other replicas.
>
>-Jay
>
>
>On Wed, Jul 23, 2014 at 3:47 PM, Kam Kasravi
><kamkasr...@yahoo.com.invalid> wrote:
>> Hi
>>
>> Kafka-on-yarn requires YARN to consistently allocate a kafka broker at a
>> particular resource since the broker needs to always use its local data.
>> YARN doesn't do this well, unless you provide (override) the default
>> scheduler (CapacityScheduler or FairScheduler). SequenceIO did something
>> along these lines for a different use case. Unfortunately replacing the
>> scheduler is a global operation which would affect all App masters.
>> Additionally one could argue that the broker should be run as an OS service
>> and auto restarted on failure if necessary. Slider (incubating) did some of
>> this groundwork but YARN still has lots of limitations in providing
>> guarantees to consistently allocate a container on a particular node,
>> especially on appmaster restart (e.g. ResourceManager dies). That said, it
>> might be worthwhile to enumerate all of this here with some possible
>> solutions. If there is interest I could certainly list the relevant JIRAs
>> along with some additional JIRAs
>> required IMO.
>>
>> Thanks
>> Kam
>>
>>
>> On Wednesday, July 23, 2014 2:37 PM, "hsy...@gmail.com" <hsy...@gmail.com>
>> wrote:
>>
>>
>> Hi guys,
>>
>> Kafka is getting more and more popular and in most cases people run kafka
>> as a long-term service in the cluster. Is there a discussion of running kafka
>> on a yarn cluster, where we can utilize the convenient configuration/resource
>> management and HA? I think there is a big potential and requirement for
>> that.
>> I found a project https://github.com/kkasravi/kafka-yarn. But is there an
>> official roadmap/plan for this?
>>
>> Thank you very much!
>>
>> Best,
>> Siyuan