Sweet! That means I would not need to depend on ZK for consumption anymore.
Thanks for the response, Gwen; I will take a look at the link you provided.

From what I have read so far, for my scenario to work correctly I would
have multiple partitions and a consumer per partition, is that correct? So
to improve throughput on the consumer side, I will need to play with the
number of partitions. Is there any recommendation on the ratio of
partitions per topic, or can that be scaled up/out with more powerful or
additional hardware?
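Just to check my understanding of the batch-commit idea from your reply
below (commit the offset only after every N messages have been safely
written to the database), here is a rough sketch of the bookkeeping I have
in mind. The class name, `commit_interval`, and the threshold of 100 are my
own illustrative assumptions, not part of any particular Kafka client API:

```python
# Sketch of "commit offsets every N messages" bookkeeping. This is not any
# specific Kafka client API -- just the counting logic a consumer loop would
# wrap around its real offset-commit call.

class BatchCommitTracker:
    """Tracks processed messages and signals when an offset commit is due."""

    def __init__(self, commit_interval=100):
        self.commit_interval = commit_interval      # commit every N messages
        self.processed_since_commit = 0
        self.last_committed_offset = None

    def record(self, offset):
        """Call after a message has been durably written to the database."""
        self.processed_since_commit += 1
        if self.processed_since_commit >= self.commit_interval:
            # In a real consumer, the actual offset-commit call (e.g. the
            # high level consumer's commit) would happen right here.
            self.last_committed_offset = offset
            self.processed_since_commit = 0
            return True   # a commit was issued
        return False

tracker = BatchCommitTracker(commit_interval=100)
commits = sum(tracker.record(off) for off in range(250))
# 250 messages with interval 100 -> commits after offsets 99 and 199;
# the last 50 messages stay uncommitted until the next batch boundary.
```

The point is only that commits happen once per batch rather than once per
message, so a crash replays at most one batch's worth of duplicates.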

Thanks
Anand

On Tue, Feb 24, 2015 at 8:11 PM, Gwen Shapira <gshap...@cloudera.com> wrote:

> * ZK was not built for a 5K-writes/second type of load.
> * Kafka 0.8.2.0 allows you to commit offsets to Kafka rather than ZK. I
> believe this is recommended.
> * You can also commit in batches (i.e. commit every 100 messages). This
> will reduce the writes and give you at-least-once delivery while
> controlling the number of duplicates in case of failure.
> * Yes, this can be done in the high level consumer. I give a few tips here:
>
> http://ingest.tips/2014/10/12/kafka-high-level-consumer-frequently-missing-pieces/
>
> Gwen
>
> On Tue, Feb 24, 2015 at 1:57 PM, Anand Somani <meatfor...@gmail.com>
> wrote:
>
> > Hi,
> >
> > This is a little long, since I wanted to explain the use case before
> > asking my questions, so thanks for your attention.
> >
> > Use case:
> >
> > We have a use case where everything in the queue has to be consumed at
> > least once. So the consumer has to have "consumed" (saved to some
> > destination database) the message before confirming consumption to Kafka
> > (or ZK). From what I have read so far, we will have consumer groups and
> > partitions. Here are some facts/numbers for our case:
> >
> > * We will potentially have messages at peaks of 5k/second.
> > * We can play with the message size if that makes any difference (keep
> > it < 100 bytes with just a link, or put in the entire message at an avg
> > size of 2-5K bytes).
> > * We do not need replication, but might have a Kafka cluster to handle
> > the load.
> > * Also, consuming a message will take anywhere from 300-500ms; generally
> > we would like the consumer to be no more than 1-2 minutes behind. So if
> > a message shows up in the queue, it should show up in the database
> > within 2 minutes.
> >
> > The questions I have are:
> >   * If this has been covered before, please point me to it. Thanks.
> >   * Is a "controlled commit per consumed message" possible/recommended
> > at this load (I have read about some concerns regarding ZK issues)?
> >   * Are there any recommendations on configuration in terms of
> > partitions relative to the number of messages or consumers? Maybe more
> > queues/topics?
> >   * Is there anything else that we might need to watch out for?
> >   * As for the client, I should be able to control when the offset
> > commit happens with the high level consumer, I suppose?
> >
> >
> > Thanks
> > Anand
> >
>
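P.S. A back-of-the-envelope check on the numbers quoted above (peaks of
5k messages/second, 300-500ms of work per message): with one in-flight
message per consumer, the required parallelism is roughly arrival rate
times per-message service time, which suggests consumer/partition counts
in the low thousands unless each consumer processes messages concurrently.
All figures come from the thread above; nothing here is Kafka-specific:

```python
# Back-of-the-envelope sizing using the numbers from the thread above:
# 5,000 messages/second at peak, 300-500 ms of work per message.
peak_rate = 5000          # messages per second at peak
work_per_msg = 0.4        # seconds per message, midpoint of 300-500 ms

# Little's law: concurrent in-flight work = arrival rate * service time.
# With one message in flight per consumer, this is the number of
# consumers (and hence partitions) needed to keep up at peak.
required_parallelism = peak_rate * work_per_msg
```

If each consumer instead handed work to an internal thread pool, fewer
partitions would be needed, at the cost of more careful offset handling.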
