Hi,

Can we discuss for a moment the use-case of Kafka-on-YARN?

I (as a Cloudera field engineer) typically advise my customers to
install Kafka on its own nodes, so the brokers get uninterrupted access
to their disks. Hadoop processes tend to be fairly IO-heavy. Also, I
can't see any benefit from co-locating Kafka and HDFS.
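
For example (just a sketch, with made-up paths), on dedicated nodes I'd
give the brokers their own spindles and keep those disks completely out
of the DataNode data dirs:

  # server.properties on a dedicated Kafka node (paths are placeholders)
  log.dirs=/data/kafka1,/data/kafka2,/data/kafka3
  # while dfs.datanode.data.dir on the HDFS nodes points at a separate
  # set of disks entirely, e.g. /data/hdfs1,/data/hdfs2,...

That disk-level isolation is exactly what you give up when Kafka shares
nodes with HDFS.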

Since YARN does not manage IO yet, running Kafka on a Hadoop cluster
under YARN won't solve this problem in the near future.

The other problem is that we typically want brokers to be long-running,
and YARN is poorly suited to long-running processes (see the hacks we
needed for Llama as an example).

And yet another problem: for resource management to work, we need to
be able to add and take away resources from a process. AFAIK, the only
way YARN can re-allocate memory away from a Java process is to kill it
(since there's no good way to force the JVM to give memory back to the
OS). I doubt we want to do that to Kafka brokers.
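
To make that concrete (a rough sketch; exact flags vary by Kafka
version): the broker's heap is fixed when the JVM starts, e.g.

  # kafka-server-start.sh picks up KAFKA_HEAP_OPTS at launch
  export KAFKA_HEAP_OPTS="-Xmx4G -Xms4G"
  bin/kafka-server-start.sh config/server.properties

so "shrinking" a broker later means stopping and restarting it, which
is effectively what YARN's kill-the-container approach amounts to.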

I'd love to hear from those interested in Kafka+YARN what they expect
to gain from the combination.

Gwen


On Wed, Jul 23, 2014 at 2:37 PM, hsy...@gmail.com <hsy...@gmail.com> wrote:
> Hi guys,
>
> Kafka is getting more and more popular, and in most cases people run
> Kafka as a long-running service in the cluster. Is there any discussion
> of running Kafka on a YARN cluster, so we can take advantage of YARN's
> convenient configuration/resource management and HA? I think there is
> big potential and demand for that.
> I found a project, https://github.com/kkasravi/kafka-yarn. But is there
> an official roadmap/plan for this?
>
> Thank you very much!
>
> Best,
> Siyuan
