Thanks for all the responses. Unfortunately it seems there is currently no foolproof solution to this. It isn't a problem with the stored offsets, as it happens even if I do a commitSync after each record is processed; it's the unprocessed records in the batch that get processed twice.
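For context, this is roughly what I mean by committing after each record. It's only a sketch: it assumes a recent Java client where poll takes a Duration, an already-subscribed consumer, and a placeholder process() method standing in for the real work.

import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// consumer is an already-subscribed KafkaConsumer<String, String>
for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
    process(record); // placeholder for the real per-record work
    // Commit the position of the *next* record for this partition only.
    consumer.commitSync(Collections.singletonMap(
            new TopicPartition(record.topic(), record.partition()),
            new OffsetAndMetadata(record.offset() + 1)));
}
// Even with per-record commits, any still-unprocessed records from the same
// poll() are redelivered after a rebalance, which is the duplication above.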
I'm now taking the approach of limiting the chance of a rebalance as much as possible by reducing the amount of data returned by each poll. I'm also using the pause/poll/resume pattern so the consumer doesn't trigger a rebalance when the processing loop takes longer than session.timeout.ms (a sketch of that pattern follows the quoted message below).

Cheers,
Phil

On 21 Apr 2016 16:24, vinay sharma <vinsharma.t...@gmail.com> wrote:
>Hi,
>
>By design, Kafka ensures that the same record is not sent to multiple
>consumers in the same consumer group. The issue arises when a rebalance
>happens while processing is in progress and the records are not yet
>committed. In my view there are only two possible solutions:
>1) As mentioned in the documentation, store offsets outside of Kafka (
>https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html).
>This is a complete solution, but it definitely adds extra development work
>and extra processing for each message. The problem may also still occur if,
>at the time of a crash, the consumer was out of sync with both the external
>custom offset storage and the offsets stored in Kafka.
>2) As mentioned in the fix for KAFKA-919 (
>https://issues.apache.org/jira/browse/KAFKA-919), set autocommit to true.
>This makes Kafka commit fetched records before rebalancing. The only
>drawback is that some records may never be processed if the consumer
>crashes while working on records that were already marked committed
>because of the rebalance.
>
>Regards,
>Vinay Sharma
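For anyone interested, here's a rough sketch of the pause/poll/resume pattern I mentioned. Again this is only a sketch under assumptions: a recent Java client where poll takes a Duration and pause/resume take a Collection (the 0.9 client in the javadoc link above uses poll(long) and varargs instead), an already-subscribed consumer, and a placeholder process() method for the slow work.

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// consumer is an already-subscribed KafkaConsumer<String, String>
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
if (!records.isEmpty()) {
    // Pause all assigned partitions so the keep-alive polls below return no data.
    consumer.pause(consumer.assignment());
    for (ConsumerRecord<String, String> record : records) {
        process(record);              // long-running per-record work
        consumer.poll(Duration.ZERO); // keep-alive: keeps the consumer in the group without fetching
    }
    consumer.resume(consumer.assignment());
    consumer.commitSync();            // commit the whole batch once it has been processed
}

The idea is simply that the consumer keeps calling poll while paused, so the group coordinator still sees it as alive even though no new records are fetched until resume.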