Hi Kyle, Thanks for the update. Wondering if you found answer to your N-1 commit question? If auto commit happens only at iterator.next () and onky for the N -1 message then client code can be much simpler and reliable as you mentioned. I'm also looking forward to any post in this regard.
Jagbir On June 18, 2014 3:17:25 PM PDT, Kyle Banker <kyleban...@gmail.com> wrote: >I think I've discovered the answer to my second question: according to >the >code in ZookeeperConsumerConnector.scala, a rebalance derives its >offsets >from what's already in Zookeeper. Therefore, uncommitted but consumed >messages from a given partition will be replayed when the partition is >reassigned. > > >On Fri, Jun 13, 2014 at 3:01 PM, Kyle Banker <kyleban...@gmail.com> >wrote: > >> I'm using Kafka 0.8.1.1. >> >> I have a simple goal: use the high-level consumer to consume a >message >> from Kafka, publish the message to a different system, and then >commit the >> message in Kafka. Based on my reading of the docs and the mailing >list, it >> seems like this isn't so easy to achieve. Here is my current >understanding: >> >> First, I have to disable auto-commit. If the consumer automatically >> commits, then I may lose messages if, for example, my process dies >after >> consuming but before publishing my message. >> >> Next, if my app is multi-threaded, I need to either >> >> a) use a separate consumer per thread (memory-intensive, hard on >> Zookeeper) or >> b) use a single consumer and assign a KafkaStream to each thread. >Then, >> when I want to commit, first synchronize all threads using a barrier. >> >> First question: is this correct so far? >> >> >> Still, it appears that rebalancing may be a problem. In particular, >this >> sequence of events: >> >> 1. I'm consuming from a stream tied to two partitions, A and B. >> 2. I consume a message, M, from partition A. >> 3. Partition A gets assigned to a different consumer. >> 4. I choose not to commit M or my process fails. >> >> Second question: When the partition is reassigned, will the message >that I >> consumed be automatically committed? If so, then there's no way to >get the >> reliability I want. >> >> >> Third question: How do the folks at LinkedIn handle this overall use >case? >> What about other users? >> >> It seems to me that a lot of the complexity here could be easily >addressed >> by changing the way in which a partition's message pointer is >advanced. >> That is, when I consume message M, advance the pointer to message (M >- 1) >> rather than to M. In other words, calling iterator.next() would imply >that >> the previously consumed message may be safely committed. If this were >the >> case, I could simply enable auto-commit and be happy. >> -- Sent from my Android phone with K-9 Mail. Please excuse my brevity.