Thanks Guozhang,

I was looking for actual real-world workflows. I realize you can commit
after each message, but if you're using ZK for offsets, for instance, you
will put too much write load on the nodes and crush your throughput. So I
was interested in batching strategies people have used to balance high
throughput with fully committed events.
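One common strategy is to commit only once every N messages (and once on shutdown), so ZK sees a small fraction of the writes while the worst case after a crash is redelivery of at most N messages. A minimal sketch, with made-up names (`BatchedCommitter`, `commit_fn`), not a real Kafka client API:

```python
# Illustrative sketch of batched offset commits; BatchedCommitter and
# commit_fn are made-up names, not a real Kafka client API. Committing
# once per N messages cuts ZooKeeper writes by a factor of N, at the
# cost of redelivering at most N messages after a crash.

COMMIT_INTERVAL = 100

class BatchedCommitter:
    def __init__(self, commit_fn, interval=COMMIT_INTERVAL):
        self.commit_fn = commit_fn          # e.g. writes the offset to ZK
        self.interval = interval
        self.pending = 0                    # processed but uncommitted
        self.last_offset = None

    def record(self, offset):
        """Call after each successfully processed message."""
        self.last_offset = offset
        self.pending += 1
        if self.pending >= self.interval:
            self.flush()

    def flush(self):
        """Commit the latest processed offset; also call on shutdown."""
        if self.pending > 0:
            self.commit_fn(self.last_offset)
            self.pending = 0

commits = []
committer = BatchedCommitter(commits.append, interval=100)
for offset in range(1000):
    committer.record(offset)
committer.flush()
# 1000 messages -> 10 ZK writes (offsets 99, 199, ..., 999)
```

Tuning `interval` trades ZK write load against the size of the redelivery window on restart.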


On Thu, Jul 31, 2014 at 8:16 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Hi Jim,
>
> Whether to use the high level or simple consumer depends on your use case.
> If you need to manually manage partition assignments among your consumers,
> or commit your offsets somewhere other than ZK, or you do not want
> automatic rebalancing of consumers upon failures, etc., you should use the
> simple consumer; otherwise, use the high level consumer.
>
> From your description of pulling a batch of messages, it seems you are
> currently using the simple consumer. Suppose you were using the high level
> consumer; to achieve at-least-once you can basically do something like:
>
> message = consumer.iter.next()
> process(message)
> consumer.commit()
>
> which is effectively the same as option 2 with a simple consumer. Of
> course, doing so carries the heavy overhead of one commit per message; you
> can also do option 1, at the cost of duplicates, which is tolerable for
> at-least-once.
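The process-then-commit loop above can be simulated with an in-memory stub to show why it is at-least-once rather than exactly-once: a crash between process() and commit() means the same message is delivered again on restart. `StubConsumer` below is purely illustrative, not the real high-level consumer API.

```python
# In-memory stub showing why process-then-commit is at-least-once:
# a crash after processing but before committing causes redelivery.
# StubConsumer is illustrative only, not the real Kafka consumer API.

class StubConsumer:
    def __init__(self, log, committed=0):
        self.log = log              # the partition's messages
        self.committed = committed  # next offset to deliver

    def poll(self):
        """Deliver the next message after the committed offset."""
        if self.committed < len(self.log):
            return self.committed, self.log[self.committed]
        return None

    def commit(self, offset):
        self.committed = offset + 1

log = ["m0", "m1", "m2"]
consumer = StubConsumer(log)
processed = []

offset, msg = consumer.poll()
processed.append(msg)            # process(m0)
consumer.commit(offset)          # commit after processing
offset, msg = consumer.poll()
processed.append(msg)            # process(m1), then "crash" before commit

# Restart from the last committed offset: m1 is delivered a second time.
consumer = StubConsumer(log, committed=consumer.committed)
offset, msg = consumer.poll()
processed.append(msg)            # process(m1) again -> a duplicate
consumer.commit(offset)
# processed == ["m0", "m1", "m1"]: a duplicate, but nothing lost
```

Committing before processing would flip the guarantee to at-most-once: the crash would lose m1 instead of duplicating it.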
>
> Guozhang
>
>
> On Wed, Jul 30, 2014 at 8:25 PM, Jim <jimi...@gmail.com> wrote:
>
> > Curious on a couple questions...
> >
> > Are most people (are you?) using the simple consumer vs. the high level
> > consumer in production?
> >
> >
> > What is the common processing paradigm for maintaining a full pipeline
> > for Kafka consumers for at-least-once messaging? E.g., you pull a
> > batch of 1000 messages and:
> >
> > option 1.
> > You wait for the slowest worker to finish working on its message; when
> > you get back 1000 acks internally, you commit your offset and pull
> > another batch.
> >
> > option 2.
> > You feed your workers n messages at a time in sequence and move your
> > offset up as you work through the batch.
> >
> > option 3.
> > You ideally maintain a full stream of 1000 messages, and as acks come
> > back from your workers you see whether you can move your offset up in
> > the stream to pull n more messages and keep the pipeline full, so you
> > are not blocked by the slowest consumer (probabilistically).
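Option 3 needs a small piece of bookkeeping on the consumer side: acks arrive out of order, but the offset may only advance through a contiguous run of acked messages. A hedged sketch (`AckWindow` is an illustrative name, not part of any Kafka client):

```python
# Sketch of the bookkeeping option 3 needs: acks arrive out of order,
# but the committed offset may only advance through a contiguous run
# of acked messages. AckWindow is an illustrative name, not a real API.

class AckWindow:
    def __init__(self, start_offset=0):
        self.next_to_commit = start_offset  # lowest offset not yet acked
        self.acked = set()                  # acks received out of order

    def ack(self, offset):
        """Record an ack; return the new safely committable offset."""
        self.acked.add(offset)
        while self.next_to_commit in self.acked:
            self.acked.remove(self.next_to_commit)
            self.next_to_commit += 1
        return self.next_to_commit  # everything below this is acked

window = AckWindow()
window.ack(2)                 # worker for offset 0 is slow: stuck at 0
window.ack(1)                 # still stuck at 0
committable = window.ack(0)   # 0, 1, 2 now all acked
# committable == 3: commit through offset 2, refill the pipeline
```

Each time `ack` returns a higher value, the consumer can commit that offset and pull more messages, so one slow worker delays the commit but not the fetching.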
> >
> >
> > Any good docs or articles on the subject would be great. Thanks!
> >
>
>
>
> --
> -- Guozhang
>
