Hi Gwen
Your recommendations in the field to partition off non-cluster nodes and
reserve them for Kafka brokers totally make sense given current YARN
limitations. I'm familiar with the Llama hacks - effectively reserving
containers with dummy processes that just sit there and then running the '
Jay - good points on rolling broker upgrades/config changes and the challenges
of having an app master coordinate this type of thing. Not really specific to
Kafka, but something you would hope an app master managing these types of
services would take care of. I also thought an app master should
Steve - yes, I have been monitoring YARN advances in this area, particularly
YARN-1051, which seems to have most of what long-running services with hard
node-locality requirements need and is based on MS's Rayon framework
(https://issues.apache.org/jira/secure/attachment/12628143/curino_MSR-TR-2013
Hi,
Can we discuss for a moment the use-case of Kafka-on-YARN?
I (as a Cloudera field engineer) typically advise my customers to
install Kafka on their own nodes, to allow Kafka uninterrupted access
to disks. Hadoop processes tend to be a bit IO-heavy. Also, I can't
see any benefit from co-locating
Thanks, guys, for sharing your knowledge. Is there any other concern on the
producer/consumer side? My understanding is that the high-level consumer and
producer would refresh the cluster metadata and detect the leadership
change or node failure. I guess there shouldn't be anything to worry about if I
delete 1 broker and a
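(For context, on the newer Java producer this failover behavior maps to a couple of settings; the names below assume the 0.8.2+ producer configs and the values are just a sketch:)

```properties
# Sketch, assuming the 0.8.2+ Java producer config names.
# On a leader change the producer refreshes cluster metadata and retries.
retries=5
retry.backoff.ms=100
# Upper bound on how stale cached metadata can get before a forced refresh.
metadata.max.age.ms=300000
```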
Yeah, restoring data is definitely expensive. If you have 5TB/machine,
then you will need to restore 5TB of data. Running this way, there
is no particular functionality you need out of the app master other
than setting the right node id.
Obviously you do need an HA RM to make this work. I think
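(Concretely, "setting the right node id" amounts to the app master rendering a server.properties with a stable broker.id per container - a minimal sketch, assuming a vanilla broker config; the paths and id are made up:)

```properties
# Sketch: the only identity the app master must pin is the broker id.
# A broker that comes back with the same broker.id rejoins as the same replica.
broker.id=3
log.dirs=/data/kafka-logs
```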
Kam,
Give it some time; I think it's getting better as a real possibility for
Kafka on YARN. There are new capabilities coming out in YARN/HDFS to allow
for node groups/labels that can work with locality, and secondarily new
functionality in HDFS that, depending on the use-case, can be very
interes
Thanks Joe for the input related to Mesos as well as acknowledging the need for
YARN to support this type of cluster allocation - long running services with
node locality priority.
Thanks Jay - That's an interesting fact that I wasn't aware of - though I
imagine there could possibly be a long
Hey Kam,
It would be nice to have a way to get a failed node back with its
original data, but this isn't strictly necessary; it is just a good
optimization. As long as you run with replication, you can restart a
broker elsewhere with no data, and it will restore its state off the
other replicas.
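(As a sketch of what "run with replication" means on the broker side - this is a real broker setting, though the value here is just an example; replication can also be set per topic at creation time:)

```properties
# Sketch: with a replication factor > 1, an empty replacement broker
# rebuilds its partition logs from the surviving replicas.
default.replication.factor=3
```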
There are folks that run Kafka brokers on Apache Mesos. I don't know of
anyone running Kafka brokers on YARN, but if there were, I would hope they
would chime in.
Without getting into a long debate about Mesos vs. YARN, I do agree with
cluster resource allocation being an important direction for the indust
Hi
Kafka-on-YARN requires YARN to consistently allocate a Kafka broker at a
particular resource, since the broker needs to always use its local data. YARN
doesn't do this well unless you provide (override) the default scheduler
(CapacityScheduler or FairScheduler). SequenceIO did something alo
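(For illustration only - an untested sketch of what a hard node-locality request looks like against the YARN 2.x AMRMClient API. relaxLocality=false is what asks the scheduler to honor the node list rather than fall back to other hosts; the host name, priority, and resource sizes here are made up, and this assumes hadoop-yarn-client on the classpath:)

```java
// Sketch: request a container pinned to the host holding the broker's logs.
Resource capability = Resource.newInstance(8192, 4);            // 8 GB, 4 vcores
Priority priority = Priority.newInstance(1);
String[] nodes = new String[] { "broker-host-1.example.com" };  // hypothetical host
ContainerRequest request =
    new ContainerRequest(capability, nodes, /* racks */ null, priority,
                         /* relaxLocality */ false);
amrmClient.addContainerRequest(request);
```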