Jay - good points on rolling broker upgrades/config changes and the challenges of having an app master coordinate this type of thing. Not really specific to Kafka, but something you would hope an app master managing these types of services would take care of. I also thought an app master should provide a 'suggested' set of containers to have the brokers installed/run on, based on container info (cores, memory). The app master would communicate with the YARN client and indicate "these are the containers you want your Kafka brokers on - OK to install?" I think all of the above is well beyond state-of-the-art app masters today - though I may be wrong.

Interesting point on using Kafka's last known broker location. The reason I went with separate ZooKeeper entries was the failure scenario of the app master dying prior to the brokers starting... but it ends up being state management regardless - the app master must know which cluster nodes have Kafka brokers installed and their running state. I agree that disk cleanup on container failure or reassignment is needed in general - currently there is no such hook in YARN, so it is implicitly delegated to the app master realm, though in reality this is a resource manager responsibility IMO.
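For what it's worth, the state management described above could be as simple as one persistent znode per broker id. The sketch below is purely illustrative (the paths, the host|state encoding, and the class name are invented, not taken from kafka-yarn), but it shows the shape of the bookkeeping an app master would need in order to recover its view of broker placement after a restart. A companion sketch of the node-pinned container request Jay describes follows the quoted thread below.

import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

/**
 * Hypothetical bookkeeping an app master could keep so that, after it dies and
 * restarts, it still knows which nodes have brokers installed and their last
 * known state. Assumes the parent znodes under /kafka-yarn already exist.
 */
public class BrokerPlacementRegistry {

  private static final String ROOT = "/kafka-yarn/brokers";
  private final ZooKeeper zk;

  public BrokerPlacementRegistry(ZooKeeper zk) {
    this.zk = zk;
  }

  /** Record (or update) the host a broker id was placed on and its state. */
  public void recordPlacement(int brokerId, String host, String state)
      throws KeeperException, InterruptedException {
    String path = ROOT + "/" + brokerId;
    byte[] data = (host + "|" + state).getBytes(StandardCharsets.UTF_8);
    if (zk.exists(path, false) == null) {
      // Persistent znode: unlike an ephemeral node, it survives app master death.
      zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } else {
      zk.setData(path, data, -1);  // -1 = overwrite regardless of version
    }
  }

  /** On app master restart, read back the host a broker was last placed on. */
  public String lastKnownHost(int brokerId)
      throws KeeperException, InterruptedException {
    byte[] data = zk.getData(ROOT + "/" + brokerId, false, null);
    return new String(data, StandardCharsets.UTF_8).split("\\|")[0];
  }
}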
On Wednesday, July 23, 2014 5:17 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

Yeah, restoring data is definitely expensive. If you have 5TB/machine then you
will need to restore 5TB of data. Running this way, there is no particular
functionality you need out of the app master other than setting the right node
id. Obviously you do need an HA RM to make this work. I think you also need a
way to push new broker code one node at a time to upgrade Kafka itself or to
change a config.

The feature that YARN needs in order to support running this kind of stateful
service more happily is a backoff on reassigning a container and on cleaning up
data on disk when the process fails. Kafka itself actually tracks the last known
location of a given node even if it is down, so a Kafka app master could request
the same machine it was previously on and reuse its data if it is still there.

-Jay

On Wed, Jul 23, 2014 at 4:44 PM, Kam Kasravi <kamkasr...@yahoo.com.invalid> wrote:
> Thanks Joe for the input related to Mesos, as well as acknowledging the need
> for YARN to support this type of cluster allocation - long running services
> with node locality priority.
>
> Thanks Jay - that's an interesting fact that I wasn't aware of - though I
> imagine there could be a long latency for the replica data to be transferred
> to the new broker (depending on the number/size of partitions). It does open
> up some possibilities to restart brokers on app master restart using
> different containers (as well as some complications if an old container with
> old data were reallocated on restart). I had used ZooKeeper to store broker
> locations, so the app master on restart would look for this information and
> attempt to reallocate containers on those nodes. All that said, would this be
> part of Kafka or some other framework? I can see Kafka benefitting from this;
> at the same time, Kafka's appeal IMO is its simplicity. Spark has chosen to
> include YARN within its distribution; not sure what the Kafka team thinks.
>
>
> On Wednesday, July 23, 2014 4:19 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> Hey Kam,
>
> It would be nice to have a way to get a failed node back with its original
> data, but this isn't strictly necessary; it is just a good optimization. As
> long as you run with replication you can restart a broker elsewhere with no
> data, and it will restore its state off the other replicas.
>
> -Jay
>
>
> On Wed, Jul 23, 2014 at 3:47 PM, Kam Kasravi
> <kamkasr...@yahoo.com.invalid> wrote:
>> Hi
>>
>> Kafka-on-YARN requires YARN to consistently allocate a Kafka broker at a
>> particular resource, since the broker needs to always use its local data.
>> YARN doesn't do this well unless you provide (override) the default
>> scheduler (CapacityScheduler or FairScheduler). SequenceIO did something
>> along these lines for a different use case. Unfortunately, replacing the
>> scheduler is a global operation which would affect all app masters.
>> Additionally, one could argue that the broker should be run as an OS
>> service and auto-restarted on failure if necessary. Slider (incubating)
>> did some of this groundwork, but YARN still has lots of limitations in
>> providing guarantees to consistently allocate a container on a particular
>> node, especially on app master restart (e.g. if the ResourceManager dies).
>> That said, it might be worthwhile to enumerate all of this here with some
>> possible solutions. If there is interest I could certainly list the
>> relevant JIRAs along with some additional JIRAs required IMO.
>>
>> Thanks
>> Kam
>>
>>
>> On Wednesday, July 23, 2014 2:37 PM, "hsy...@gmail.com" <hsy...@gmail.com>
>> wrote:
>>
>> Hi guys,
>>
>> Kafka is getting more and more popular, and in most cases people run Kafka
>> as a long-term service in the cluster. Is there a discussion of running
>> Kafka on a YARN cluster, so that we can utilize its convenient
>> configuration/resource management and HA? I think there is big potential
>> and a real need for that.
>> I found the project https://github.com/kkasravi/kafka-yarn. But is there an
>> official roadmap/plan for this?
>>
>> Thank you very much!
>>
>> Best,
>> Siyuan
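For reference, the "request the same machine it was previously on" idea discussed above maps onto the Hadoop 2.x AMRMClient API roughly as follows. This is a minimal sketch, not working app-master code: the host name, resource sizes, and class name are placeholders, registerApplicationMaster and the allocate() heartbeat loop are omitted, and whether the CapacityScheduler/FairScheduler actually honors a strict node-level request is exactly the limitation being discussed in this thread.

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class BrokerContainerRequester {

  public static void main(String[] args) throws Exception {
    // Placeholder values: a real app master would read these from the broker
    // placement state it persisted (e.g. the ZooKeeper entries sketched earlier).
    String lastKnownHost = "worker-node-07";
    int memoryMb = 8192;
    int vcores = 4;

    AMRMClient<ContainerRequest> amClient = AMRMClient.createAMRMClient();
    amClient.init(new YarnConfiguration());
    amClient.start();
    // registerApplicationMaster(...) omitted for brevity.

    Resource capability = Resource.newInstance(memoryMb, vcores);
    Priority priority = Priority.newInstance(0);

    // relaxLocality = false asks the scheduler for that exact node and nothing else;
    // how well the default schedulers honor this is the open question above.
    ContainerRequest request = new ContainerRequest(
        capability,
        new String[] { lastKnownHost },  // node-level locality
        null,                            // no rack preference
        priority,
        false);                          // do not relax to rack- or cluster-level

    amClient.addContainerRequest(request);
    // Allocated containers would then show up in the allocate() heartbeat responses.
  }
}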