Thanks a bunch for the detailed response and tips!! Looks like I have a
couple of knobs, one of which should work. I will be doing some runs to
figure out what works best for my use case.
Thanks again.
On Thu, Feb 26, 2015 at 9:03 AM, Jeff Wartes wrote:
A note on throughput with an at-least-once guarantee using the high-level
consumer:
The core unit of concurrency in Kafka is the partition, because you can't
have more active consumers in a group than partitions. Although you can ask
for two messages from a given client instance and process those in parallel, the c
I don't have good numbers, but I've noticed that I usually scale the number
of partitions by the consumer rate, not by the producer rate.
Writing to HDFS can be a bit slow (30MB/s is pretty typical, IIRC), so if I
need to write 5GB a second, I need at least 167 consumers (5000 / 30, rounded
up), which means at least 167 partitions
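
The sizing rule above (partitions driven by what one consumer can sustain) can be sketched as a small calculation. This is only an illustration: the function name `min_partitions` and its parameters are hypothetical, and the per-consumer rate is something you would have to measure for your own sink.

```python
import math


def min_partitions(target_mb_per_s: float, per_consumer_mb_per_s: float) -> int:
    """Smallest consumer count (and so partition count) that keeps up.

    Assumes one consumer per partition and that every consumer sustains
    the same write rate, which is a simplification.
    """
    if target_mb_per_s < 0 or per_consumer_mb_per_s <= 0:
        raise ValueError("rates must be positive")
    # Round up: a fractional consumer still needs a whole partition.
    return math.ceil(target_mb_per_s / per_consumer_mb_per_s)


# Example: a 450 MB/s target with consumers that each sustain 30 MB/s.
print(min_partitions(450, 30))  # 15
```

In practice you would probably add headroom on top of this minimum, since the number only covers the steady-state rate.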
Sweet! That means I would not depend on ZK for consumption anymore. Thanks
for the response, Gwen. I will take a look at the link you have provided.
From what I have read so far, for my scenario to work correctly I would
have multiple partitions and a consumer per partition, is that correct? So
for me
* ZK was not built for a 5K writes/s type of load.
* Kafka 0.8.2.0 allows you to commit offsets to Kafka rather than ZK. I
believe this is recommended.
* You can also commit in batches (e.g. commit every 100 messages).
This will reduce the writes and give you at least once while controlling
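
To make the batched-commit idea concrete, here is a minimal sketch in plain Python. It does not use the Kafka client API: `consume_with_batched_commits`, `process`, and `commit` are hypothetical names, and the message source is just an iterable of (offset, payload) pairs. The one property it tries to show is that committing only after processing gives at-least-once behaviour, at the cost of reprocessing up to one batch after a crash.

```python
from typing import Callable, Iterable, Optional, Tuple


def consume_with_batched_commits(
    messages: Iterable[Tuple[int, str]],
    process: Callable[[str], None],
    commit: Callable[[int], None],
    batch_size: int = 100,
) -> None:
    """Process messages in order and commit once per batch_size messages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    uncommitted = 0
    last_offset: Optional[int] = None
    for offset, payload in messages:
        process(payload)  # process first, commit later: at-least-once
        last_offset = offset
        uncommitted += 1
        if uncommitted >= batch_size:
            # Assumption of this sketch: commit the position to resume from.
            commit(last_offset + 1)
            uncommitted = 0
    if uncommitted and last_offset is not None:
        commit(last_offset + 1)  # flush the final partial batch


# Usage with in-memory stand-ins for the consumer and the offset store.
processed = []
commits = []
consume_with_batched_commits(
    ((i, f"msg-{i}") for i in range(250)),
    process=processed.append,
    commit=commits.append,
    batch_size=100,
)
print(commits)  # [100, 200, 250]
```

With 250 messages and a batch size of 100 there are three commit writes instead of 250, which is the reduction in write load being described.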