Hi 

Kafka-on-YARN requires YARN to consistently allocate a Kafka broker on a
particular node, since the broker always needs to use its local data. YARN
doesn't do this well unless you override the default scheduler
(CapacityScheduler or FairScheduler). SequenceIO did something along these
lines for a different use case. Unfortunately, replacing the scheduler is a
global operation that would affect all ApplicationMasters. Additionally, one
could argue that the broker should be run as an OS service and automatically
restarted on failure if necessary. Slider (incubating) did some of this
groundwork, but YARN still has many limitations in guaranteeing that a
container is consistently allocated on a particular node, especially on
ApplicationMaster restart (e.g., when the ResourceManager dies). That said, it
might be worthwhile to enumerate all of this here, along with some possible
solutions. If there is interest, I could certainly list the relevant JIRAs,
along with some additional JIRAs that I think are required.

Thanks
Kam


On Wednesday, July 23, 2014 2:37 PM, "hsy...@gmail.com" <hsy...@gmail.com> 
wrote:
 


Hi guys,

Kafka is getting more and more popular, and in most cases people run Kafka
as a long-running service in the cluster. Has there been any discussion of
running Kafka on a YARN cluster, so that we can take advantage of YARN's
convenient configuration/resource management and HA? I think there is big
potential and demand for that.
I found a project, https://github.com/kkasravi/kafka-yarn, but is there an
official roadmap/plan for this?

Thank you very much!

Best,
Siyuan
