Neha, I believe option 2 is more like the high level consumer, no? The high level consumer doesn't really guarantee processing of the message. E.g., I take 1000 messages, feed them off to a bunch of threads, and "hope and pray" they get processed. Option 3 is something I haven't seen in the Kafka client world yet.
Option 3 would be: I pull message set [1,2,3,4,5] and fire those off to 5 workers. Those workers need to ensure successful processing of a message, and you can't move your offset up until you know all the offsets you're zooming past have been properly consumed. The high level consumer appears to be a hope & pray strategy.

On Wed, Aug 13, 2014 at 9:31 AM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> Option 1 would take a throughput hit as you are trying to commit one message
> at a time. Option 2 is pretty widely used at LinkedIn and I am pretty sure at
> several other places as well. Option 3 is essentially what the high level
> consumer does under the covers already. It prefetches data in batches from
> the server to provide high throughput.
>
> On Wed, Aug 13, 2014 at 2:20 AM, Anand Nalya <anand.na...@gmail.com> wrote:
>
> > Hi Jim,
> >
> > In one of the applications, we implemented option #1:
> >
> > messageList = getNext(1000)
> > process(messageList)
> > commit()
> >
> > In case of failure, this resulted in duplicate processing for at most 1000
> > records per partition.
> >
> > Regards,
> > Anand
> >
> > On 1 August 2014 20:35, Jim <jimi...@gmail.com> wrote:
> >
> > > Thanks Guozhang,
> > >
> > > I was looking for actual real world workflows. I realize you can commit
> > > after each message, but if you're using ZK for offsets, for instance,
> > > you'll put too much write load on the nodes and crush your throughput.
> > > So I was interested in batching strategies people have used that balance
> > > high/full throughput and fully committed events.
> > >
> > > On Thu, Jul 31, 2014 at 8:16 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > > > Hi Jim,
> > > >
> > > > Whether to use the high level or simple consumer depends on your use
> > > > case. If you need to manually manage partition assignments among your
> > > > consumers, or you need to commit your offsets elsewhere than ZK, or you
> > > > do not want auto rebalancing of consumers upon failures etc., you will
> > > > use simple consumers; otherwise you use the high level consumer.
> > > >
> > > > From your description of pulling a batch of messages, it seems you are
> > > > currently using the simple consumer. Suppose you are using the high
> > > > level consumer; to achieve at-least-once you can basically do something
> > > > like:
> > > >
> > > > message = consumer.iter.next()
> > > > process(message)
> > > > consumer.commit()
> > > >
> > > > which is effectively the same as option 2 for using a simple consumer.
> > > > Of course, doing so has the heavy overhead of one commit per message;
> > > > you can also do option 1, at the cost of duplicates, which is tolerable
> > > > for at-least-once.
> > > >
> > > > Guozhang
> > > >
> > > > On Wed, Jul 30, 2014 at 8:25 PM, Jim <jimi...@gmail.com> wrote:
> > > >
> > > > > Curious on a couple questions...
> > > > >
> > > > > Are most people (are you?) using the simple consumer vs the high
> > > > > level consumer in production?
> > > > >
> > > > > What is the common processing paradigm for maintaining a full
> > > > > pipeline for kafka consumers for at-least-once messaging? E.g. you
> > > > > pull a batch of 1000 messages and:
> > > > >
> > > > > option 1.
> > > > > you wait for the slowest worker to finish working on that message;
> > > > > when you get back 1000 acks internally, you commit your offset and
> > > > > pull another batch
> > > > >
> > > > > option 2.
> > > > > you feed your workers n msgs at a time in sequence and move your
> > > > > offset up as you work through your batch
> > > > >
> > > > > option 3.
> > > > > you maintain a full stream of 1000 messages ideally, and as you get
> > > > > acks back from your workers you see if you can move your offset up in
> > > > > the stream to pull n more messages to fill up your pipeline so you're
> > > > > not blocked by the slowest consumer (probability wise)
> > > > >
> > > > > any good docs or articles on the subject would be great, thanks!
> > > >
> > > > --
> > > > -- Guozhang
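For what it's worth, the bookkeeping option 3 needs (acks arriving out of order, offset only advancing past a fully-acked contiguous prefix) can be sketched roughly like this. This is a minimal illustration, not any real Kafka client API; the class and method names are made up for the example:

```python
class OffsetTracker:
    """Hypothetical option-3 bookkeeping: workers ack offsets in any
    order, but the committable offset is the low-watermark -- the
    smallest offset not yet acked, i.e. everything below it is done."""

    def __init__(self, start_offset):
        self.next_expected = start_offset  # lowest un-acked offset
        self.acked = set()                 # acks received out of order

    def ack(self, offset):
        """Record a worker's ack, then slide past any contiguous prefix."""
        self.acked.add(offset)
        while self.next_expected in self.acked:
            self.acked.remove(self.next_expected)
            self.next_expected += 1

    def committable_offset(self):
        """Safe commit position: all offsets below this have been acked."""
        return self.next_expected


tracker = OffsetTracker(start_offset=1)
for off in [3, 1, 2]:                 # acks arrive out of order
    tracker.ack(off)
print(tracker.committable_offset())   # 4: offsets 1-3 are all done
tracker.ack(5)                        # gap at 4 blocks further commits
print(tracker.committable_offset())   # still 4, despite 5 being acked
```

The consumer would commit `committable_offset()` periodically and top the pipeline back up to 1000 in-flight messages whenever the watermark advances, so one slow worker delays the commit but not the fetching.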