Hi Jim,

Whether to use high level or simple consumer depends on your use case. If
you need to manually manage partition assignments among your consumers, or
you need to commit your offsets elsewhere than ZK, or you do not want auto
rebalancing of consumers upon failures etc, you will use simple consumers;
otherwise you use high level consumer.

>From your description of pulling a batch of messages it seems you are
currently using the simple consumer. Suppose you are using the high level
consumer, to achieve at-lease-once basically you can do sth like:

message = consumer.iter.next()
process(message)
consumer.commit()

which is effectively the same as option 2 for using a simple consumer. Of
course, doing so has a heavy overhead of one-commit-per-message, you can
also do option 1, by the cost of duplicates, which is tolerable for
at-least-once.

Guozhang


On Wed, Jul 30, 2014 at 8:25 PM, Jim <jimi...@gmail.com> wrote:

> Curious on a couple questions...
>
> Are most people(are you?) using the simple consumer vs the high level
> consumer in production?
>
>
> What is the common processing paradigm for maintaining a full pipeline for
> kafka consumers for at-least-once messaging? E.g. you pull a batch of 1000
> messages and:
>
> option 1.
> you wait for the slowest worker to finish working on that message, when you
> get back 1000 acks internally you commit your offset and pull another batch
>
> option 2.
> you feed your workers n msgs at a time in sequence and move your offset up
> as you work through your batch
>
> option 3.
> you maintain a full stream of 1000 messages ideally and as you get acks
> back from your workers you see if you can move your offset up in the stream
> to pull n more messages to fill up your pipeline so you're not blocked by
> the slowest consumer (probability wise)
>
>
> any good docs or articles on the subject would be great, thanks!
>



-- 
-- Guozhang

Reply via email to