Thanks for all the responses. Unfortunately it seems there is currently no 
foolproof solution to this. It isn't a problem with the stored offsets, as it 
happens even if I do a commitSync after each record is processed; it's the 
unprocessed records in the batch that get processed twice.

I'm now taking the approach of limiting the chance of a rebalance as much as 
possible by reducing the amount of data returned by each poll. I'm also using 
the pause, poll, resume pattern so the consumer doesn't trigger a rebalance 
when the processing loop takes longer than session.timeout.ms. A sketch of 
that pattern is below.

Cheers,
Phil



On 21 Apr 2016 16:24, vinay sharma <vinsharma.t...@gmail.com> wrote:
>Hi,
>
>By design, Kafka ensures that the same record is not sent to multiple
>consumers in the same consumer group. The issue arises when a rebalance
>happens while processing is in progress and the records are not yet
>committed. In my view there are only 2 possible solutions to it:
>1) As mentioned in the documentation, store offsets outside of Kafka
>(https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html).
>This is a complete solution but will definitely add extra development
>and also extra processing for each message. The problem may still exist
>if, at the time of a crash, the consumer was out of sync with both the
>external custom offset storage and the offsets stored in Kafka.
>2) As mentioned in the fix for defect 919
>(https://issues.apache.org/jira/browse/KAFKA-919), set autocommit to
>true. This will make Kafka commit fetched records before rebalancing.
>The only drawback is that some records may never be processed if the
>consumer crashes while processing records which are already marked
>committed due to the rebalance.
>
>Regards,
>Vinay Sharma
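
For anyone finding this thread later: option 1 above is along the lines of 
what the linked 0.9.0 javadoc describes for storing offsets outside Kafka. 
A rough sketch of it, with the external offset store, broker/group/topic 
names and process() left as placeholders, is:

import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ExternalOffsetConsumer {

    // Placeholder external store; in practice this would be the same
    // transactional store that the processed results are written to,
    // so a result and its offset can be saved atomically.
    static long readOffset(TopicPartition tp) { return 0L; }
    static void saveOffset(TopicPartition tp, long offset) { }

    static void process(ConsumerRecord<String, String> record) { }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "my-group");                   // placeholder group
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("my-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Save how far we got before the partitions are taken away.
                for (TopicPartition tp : partitions) {
                    saveOffset(tp, consumer.position(tp));
                }
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Resume from the externally stored position, ignoring
                // whatever Kafka itself has committed.
                for (TopicPartition tp : partitions) {
                    consumer.seek(tp, readOffset(tp));
                }
            }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                // Processing the record and saving its offset should ideally
                // be one atomic operation (e.g. one database transaction) so a
                // crash can't separate them.
                process(record);
                saveOffset(new TopicPartition(record.topic(), record.partition()),
                        record.offset() + 1);
            }
        }
    }
}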
