Re: Kafka on yarn

2014-07-24 Thread Kam Kasravi
Hi Gwen, your recommendations in the field to partition off non-cluster nodes and reserve them for Kafka brokers totally make sense given current YARN limitations. I'm familiar with the Llama hacks - effectively reserving containers with dummy processes that just sit there and then running the '

Re: Kafka on yarn

2014-07-24 Thread Kam Kasravi
Jay - good points on rolling broker upgrades/config changes and the challenges of having an app master coordinate this type of thing. Not really specific to Kafka, but something you would hope an app master managing these types of services would take care of. I also thought an app master should

Re: Kafka on yarn

2014-07-24 Thread Kam Kasravi
Steve - yes, I have been monitoring YARN advances in this area, particularly YARN-1051, which seems to have most of what long-running services with hard node locality requirements need and is based on MS's Rayon framework (https://issues.apache.org/jira/secure/attachment/12628143/curino_MSR-TR-2013

Re: Kafka on yarn

2014-07-23 Thread Gwen Shapira
Hi, can we discuss for a moment the use case of Kafka-on-YARN? I (as a Cloudera field engineer) typically advise my customers to install Kafka on its own nodes, to allow Kafka uninterrupted access to disks. Hadoop processes tend to be a bit IO heavy. Also, I can't see any benefit from co-locating

Re: Kafka on yarn

2014-07-23 Thread hsy...@gmail.com
Thanks guys for your knowledge. Is there any other concern on the producer/consumer side? My understanding is that the high-level consumer and producer would refresh the cluster metadata and detect a leadership change or node failure. I guess there shouldn't be anything to worry about if I delete 1 broker and a

Re: Kafka on yarn

2014-07-23 Thread Jay Kreps
Yeah, restoring data is definitely expensive. If you have 5TB/machine then you will need to restore 5TB of data. Running this way, there is no particular functionality you need out of the app master other than and setting the right node id. Obviously you do need HA RM to make this work. I think

Re: Kafka on yarn

2014-07-23 Thread Steve Morin
Kam, give it some time; I think it's getting better as a real possibility for Kafka on YARN. There are new capabilities coming out in YARN/HDFS to allow for node groups/labels that can work with locality and, secondarily, new functionality in HDFS that, depending on the use case, can be very interes

Re: Kafka on yarn

2014-07-23 Thread Kam Kasravi
Thanks Joe for the input related to Mesos as well as acknowledging the need for YARN to support this type of cluster allocation - long-running services with node locality priority. Thanks Jay - that's an interesting fact that I wasn't aware of - though I imagine there could possibly be a long

Re: Kafka on yarn

2014-07-23 Thread Jay Kreps
Hey Kam, it would be nice to have a way to get a failed node back with its original data, but this isn't strictly necessary; it is just a good optimization. As long as you run with replication you can restart a broker elsewhere with no data, and it will restore its state off the other replicas.

Re: Kafka on yarn

2014-07-23 Thread Joe Stein
There are folks that run Kafka brokers on Apache Mesos. I don't know of anyone running Kafka brokers on YARN, but if there were I would hope they chime in. Without getting into a long debate about Mesos vs YARN, I do agree with cluster resource allocation being an important direction for the indust

Re: Kafka on yarn

2014-07-23 Thread Kam Kasravi
Hi, Kafka-on-YARN requires YARN to consistently allocate a Kafka broker at a particular resource, since the broker always needs to use its local data. YARN doesn't do this well, unless you provide (override) the default scheduler (CapacityScheduler or FairScheduler). SequenceIO did something alo