Hi all

I have a question about the scenario where a consumer that is consuming Kafka 
records is not very fast (regardless of the reason). Yes, I know about the 
configuration properties on both the broker and the consumer that help mitigate 
the effects; I simply want to confirm that what I am seeing is the expected 
behavior. Here it is:

- Kafka topic with a single partition.
- 3 consumers, which essentially means that one will be consuming records while 
the other two are on stand-by.
- The first consumer doesn't manage to process all returned records within 
"session.timeout.ms", so the group coordinator triggers a rebalance and hands 
the partition to a different consumer. The first consumer is unaware (until its 
next poll() or commit() call) that it has been kicked out of the group and 
continues processing the remaining records; consumer.commitSync() begins to 
fail (rightfully so) and the first consumer is now out of the picture (a 
minimal sketch of such a consumer loop follows this list).
- The second consumer starts processing records and the process resumes.
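
For reference, here is a minimal sketch of the kind of loop I mean: manual 
commits, one commitSync() per processed record. The broker address, group id, 
topic name and the process() body are placeholders, not my actual setup.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class SlowConsumerDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
            props.put("group.id", "slow-consumer-group");        // placeholder group id
            props.put("enable.auto.commit", "false");             // commits are manual below
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record); // slow per-record work; if the whole batch takes
                                         // too long, the group coordinator rebalances
                        // commit the offset of the record just processed; this starts
                        // throwing (e.g. CommitFailedException) once the consumer has
                        // been kicked out of the group
                        consumer.commitSync(Collections.singletonMap(
                                new TopicPartition(record.topic(), record.partition()),
                                new OffsetAndMetadata(record.offset() + 1)));
                    }
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            // stand-in for slow per-record processing
        }
    }
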

However, the second consumer manages to grab a few records that have actually 
been processed successfully by the first consumer (i.e., commitSync() was 
executed successfully). So let's say the first consumer processed 0, 1, 2, 3, 
4, 5 and the second consumer starts with 4, 5, 6, 7, 8..., so 4 and 5 become 
duplicates.

I suspect there is a synchronization gap where, during the consumer rebalance, 
some offsets that were "just committed" by the kicked-out consumer are not yet 
known to the next consumer chosen by the coordinator, hence allowing it to 
re-read those records and produce duplicates.
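
One of the mitigations I have in mind is flushing processed offsets from a 
ConsumerRebalanceListener when the partition is revoked, so the next owner 
picks up from the latest committed position. A rough sketch, assuming a 
hypothetical markProcessed() call from the processing loop; note that if the 
consumer has already been fenced out after a session timeout, the commit in 
the callback would fail as well:

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    // Track offsets as records finish and flush them the moment the partition
    // is revoked, so the next owner starts from the latest processed position.
    public class CommitOnRevoke implements ConsumerRebalanceListener {
        private final KafkaConsumer<?, ?> consumer;
        private final Map<TopicPartition, OffsetAndMetadata> pending = new HashMap<>();

        public CommitOnRevoke(KafkaConsumer<?, ?> consumer) {
            this.consumer = consumer;
        }

        // called from the processing loop after each record is handled
        public void markProcessed(TopicPartition tp, long offset) {
            pending.put(tp, new OffsetAndMetadata(offset + 1));
        }

        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
            // best effort: commit whatever was processed before losing ownership
            if (!pending.isEmpty()) {
                consumer.commitSync(pending);
                pending.clear();
            }
        }

        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            // nothing to do on assignment in this sketch
        }
    }

The listener would be registered via the two-argument subscribe, e.g. 
consumer.subscribe(Collections.singletonList("my-topic"), new CommitOnRevoke(consumer)).
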

Is my assumption correct? I mean, I can reproduce it in so many different ways 
and I can also fix it in so many different ways, so it's not a huge problem; I 
am just trying to understand exactly what's happening and whether this is the 
expected behavior. After all, Kafka guarantees "at-least-once", not 
"exactly-once", which would make sense. Duplicates are always better than 
data loss.
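
(One of the fixes I mentioned is simply making the consumer side idempotent, 
e.g. skipping offsets it has already handled. A toy sketch of that idea, with 
the caveat that in a real system the last-processed map would have to live in 
durable storage shared across the consumers, or the processing itself would 
have to be idempotent:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.TopicPartition;

    // Skip records whose offset is at or below the last one already handled
    // for that partition.
    public class OffsetDeduplicator {
        private final Map<TopicPartition, Long> lastProcessed = new HashMap<>();

        public boolean isDuplicate(ConsumerRecord<?, ?> record) {
            TopicPartition tp = new TopicPartition(record.topic(), record.partition());
            Long last = lastProcessed.get(tp);
            return last != null && record.offset() <= last;
        }

        public void markProcessed(ConsumerRecord<?, ?> record) {
            lastProcessed.put(new TopicPartition(record.topic(), record.partition()),
                              record.offset());
        }
    }
)
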

Cheers
Oleg
