Re: "at least once" consumer recommendations for a load of 5 K messages/second

2015-02-26 Thread Anand Somani
Thanks a bunch for the detailed response and tips!! Looks like I have a couple of knobs, one of which should work. I will be doing some runs to figure out what works best for my use case. Thanks again.

On Thu, Feb 26, 2015 at 9:03 AM, Jeff Wartes wrote:
> A note on throughput with an at-least-

Re: "at least once" consumer recommendations for a load of 5 K messages/second

2015-02-26 Thread Jeff Wartes
A note on throughput with an at-least-once guarantee using the high-level consumer: The core unit of concurrency in Kafka is the partition, because you can't have more clients than partitions. Although you can ask for two messages from a given client instance and process those in parallel, the c

Re: "at least once" consumer recommendations for a load of 5 K messages/second

2015-02-25 Thread Gwen Shapira
I don't have good numbers, but I noticed that I usually scale the number of partitions by the consumer rate and not by the producer rate. Writing to HDFS can be a bit slow (30MB/s is pretty typical, IIRC), so if I need to write 5G a second, I need at least 15 consumers, which means at least 15 partitions
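The sizing rule in this message (partition count driven by per-consumer throughput rather than producer throughput) can be sketched as a small calculation. This is only an illustration: the function name `min_partitions` and its parameters are hypothetical, and the per-consumer rate has to be measured for the actual sink. Note that plugging in the figures exactly as quoted (5 GB/s at 30 MB/s per consumer) gives a larger count than the 15 mentioned, so the quoted numbers are probably rounded from memory; the rule itself is the point.

```python
import math


def min_partitions(target_mb_per_s, per_consumer_mb_per_s):
    """Lower bound on partitions (hypothetical helper, not a Kafka API).

    Each partition is read by at most one consumer in a group, so the
    partition count must be at least the number of consumers needed to
    keep up with the target rate.
    """
    if per_consumer_mb_per_s <= 0:
        raise ValueError("per-consumer throughput must be positive")
    return math.ceil(target_mb_per_s / per_consumer_mb_per_s)


# 450 MB/s at 30 MB/s per consumer needs 15 consumers, hence 15 partitions.
print(min_partitions(450, 30))
# 5 GB/s (taken as 5000 MB/s) at the same per-consumer rate.
print(min_partitions(5000, 30))
```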

Re: "at least once" consumer recommendations for a load of 5 K messages/second

2015-02-25 Thread Anand Somani
Sweet! So I would not depend on ZK for consumption anymore. Thanks for the response Gwen, I will take a look at the link you have provided.

From what I have read so far, for my scenario to work correctly I would have multiple partitions and a consumer per partition, is that correct? So for me

Re: "at least once" consumer recommendations for a load of 5 K messages/second

2015-02-24 Thread Gwen Shapira
* ZK was not built for 5K/s writes type of load
* Kafka 0.8.2.0 allows you to commit messages to Kafka rather than ZK. I believe this is recommended.
* You can also commit batches of messages (i.e. commit every 100 messages). This will reduce the writes and give you at least once while controlling
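The batched-commit suggestion in the last bullet can be sketched without any Kafka client at all. The code below is a minimal simulation, not the Kafka consumer API: `BatchCommitter`, `consume`, and `commit_fn` are hypothetical names, and `commit_fn` stands in for whatever call actually stores the offset (in Kafka or ZK). The key property is that an offset is committed only after the message has been processed, so a crash replays at most one batch instead of losing messages.

```python
class BatchCommitter:
    """Commits the next offset to read once every `batch_size` messages."""

    def __init__(self, commit_fn, batch_size=100):
        self.commit_fn = commit_fn      # stand-in for the real offset commit
        self.batch_size = batch_size
        self.pending = 0                # processed but not yet committed
        self.last_offset = None

    def processed(self, offset):
        """Record that the message at `offset` was fully processed."""
        self.last_offset = offset
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        """Commit any outstanding progress (e.g. on shutdown)."""
        if self.pending:
            # Commit the offset of the NEXT message to read.
            self.commit_fn(self.last_offset + 1)
            self.pending = 0


def consume(messages, handle, committer):
    """Process first, then mark as processed: at-least-once ordering."""
    for offset, payload in messages:
        handle(payload)
        committer.processed(offset)
    committer.flush()


# Usage: 250 messages with a batch size of 100 yields three commits.
commits = []
committer = BatchCommitter(commits.append, batch_size=100)
consume(((i, "msg-%d" % i) for i in range(250)), lambda p: None, committer)
print(commits)
```

With this ordering, a failure between commits means up to `batch_size` messages are delivered again after restart, so the handler should tolerate duplicates.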