Hi Gwen

Your field recommendation to set aside nodes outside the Hadoop cluster and
reserve them for Kafka brokers makes total sense given current YARN
limitations. I'm familiar with the Llama hacks: reserving containers with
dummy processes that just sit there, and then running the 'real' processes,
is without doubt a hack fest. YARN's coupling of the container lifecycle to
the process lifecycle was an early, basic design decision that is hard to
change at this stage. On the other hand, I do think HDFS colocation is
required if the app master provides an installation option. Per Jay's point,
you may want to distribute config changes and/or version upgrades to brokers
via HDFS. Regarding YARN IO, YARN-1711 at least is headed this way by virtue
of quotas. I do hope YARN can eventually manage long-running services
effectively. I think it's no coincidence that, as YARN evolves, the
difference between YARN and a cluster manager like Ambari shrinks.


On Wednesday, July 23, 2014 6:41 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
 


Hi,

Can we discuss for a moment the use-case of Kafka-on-YARN?

I (as a Cloudera field engineer) typically advise my customers to
install Kafka on their own nodes, to allow Kafka uninterrupted access
to disks. Hadoop processes tend to be a bit IO heavy. Also, I can't
see any benefit from co-locating Kafka and HDFS.

Since YARN does not manage IO yet, running Kafka on a Hadoop cluster
with YARN won't solve this problem in the near future.

The other problem is that we typically want brokers to be long-running,
and YARN is poorly designed for that (see our hacks for Llama
as an example).

And yet another problem: for resource management to work, we need to
be able to add and take away resources from a process. AFAIK, the only
way YARN can re-allocate memory from Java processes is to kill them
(since there's no good way to force Java to give back memory to the OS).
I doubt we want to do that for Kafka.

I'd love to hear from those interested in Kafka+YARN what they
expect to gain from the combination.

Gwen



On Wed, Jul 23, 2014 at 2:37 PM, hsy...@gmail.com <hsy...@gmail.com> wrote:
> Hi guys,
>
> Kafka is getting more and more popular, and in most cases people run Kafka
> as a long-running service in the cluster. Has there been any discussion of
> running Kafka on a YARN cluster, so that we can take advantage of YARN's
> convenient configuration/resource management and HA? I think there is big
> potential and demand for that.
> I found a project, https://github.com/kkasravi/kafka-yarn, but is there an
> official roadmap/plan for this?
>
> Thank you very much!
>
> Best,
> Siyuan
