Jun, Yes, sorry, I think that was the basis for my question. When auto commit is enabled, special care is taken to make sure things are auto-committed during a rebalance. This is needed because when a topic moves off of a consumer thread (since it is being rebalanced to another one), it's as if that topic is being shutdown on that connector, and any not-yet-committed messages need to be committed before letting go of the topic.
So, my question is around trying to understand if there's a way I can reproduce similar functionality using my own sync auto commit implementation (and I'm not sure there is). It seems that when there's a rebalance, all processed but not-yet-committed offsets will not be committed, and thus there will be no way to prevent pretty massive duplicate consumption on a rebalance. Is that about right? Or is there someway around this that I'm not seeing? The auto-commit functionality that's builtin is so close to being all that anyone would need, except it has a glaring weakness, in that it will cause messages to be lost from time to time, and so I don't know that it will meet the needs of trying to have reliable delivery (with duplicates ok). Jason On Tue, Oct 15, 2013 at 9:00 PM, Jun Rao <jun...@gmail.com> wrote: > If auto commit is disabled, the consumer connector won't call commitOffsets > during rebalancing. > > Thanks, > > Jun > > > On Tue, Oct 15, 2013 at 4:16 PM, Jason Rosenberg <j...@squareup.com> wrote: > > > I'm looking at implementing a synchronous auto offset commit solution. > > People have discussed the need for this in previous > > threads......Basically, in my consumer loop, I want to make sure a > message > > has been actually processed before allowing it's offset to be committed. > > But I don't want to commit on every message, since that would be too > > expensive. So, I want to use the 'auto.commit.interval.ms' to > > periodically > > call commitOffsets, but only after a message is processed, but not after > > the next message has been issued via a call to 'next()' on the > > ConsumerIterator. > > > > The builtin 'auto.commit.enable' feature unfortunately allows commits to > > happen on any message that has been returned via ConsumerIterator.next(). > > But if the consumer goes down before actually processing the message, or > > if it hangs indefinitely for some reason, then this message will get > > committed before it has actually been consumed successfully. > > > > I think there are issues with trying to implement this on top of the > > high-level consumer api. First, I need to worry about multiple threads > > consuming in the same connector (so for now I'm limiting this to support > > only 1 thread). > > > > Also, when shutting down the connector, I need to make sure any pending > > messages are committed before allowing the connector to shutdown. So, > that > > seems easy enough to handle. > > > > One thing I'm more concerned with, is what happens when there's a > consumer > > rebalance. Looking at the ZookeeperConsumerConnector code, it seems > there > > are explicit calls to commitOffsets during the rebalance. I'm not sure > how > > to handle that from the high-level api (and do I need to worry about > > that?). > > > > Thanks for any insight. > > > > Jason > > >