I've been investigating consumer group rebalances that happen when I don't
think they should, and I've noticed an issue. In a nutshell, if a consumer
receives messages in response to every fetch request, it never runs its delayed
tasks, most notably heartbeats and automatic commits. The missed heartbeats in
turn cause a rebalance once the session timeout expires.


Note that in the situation I'm describing the consumer is polling regularly, 
well within the session timeout, so a rebalance is not expected.


In KafkaConsumer::pollOnce there is a check for already-fetched records. If any
are found, it returns them immediately and skips calling client.poll. Then, up
in KafkaConsumer::poll, if records were returned, it initiates new fetches and
does a quick poll. The quick poll doesn't run delayed tasks, but it does receive
fetch responses. So if a fetch response arrives during every quick poll, the
consumer gets into a state where it never calls client.poll and never runs
delayed tasks (until it stops receiving records in response to its fetches).
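To make the control flow concrete, here's a simplified Java model of the loop
as I understand it. All names in it are hypothetical and it is not the real
KafkaConsumer code; it also assumes that a full poll always yields a fetch
response, and it only counts how often delayed tasks would run.

```java
// Simplified model of the consumer poll loop described above.
// All names are hypothetical; this is NOT the real KafkaConsumer code.
public class PollStarvationSketch {
    // Whether a fetch response arrives while the quick (non-blocking) poll runs.
    private final boolean responseArrivesDuringQuickPoll;
    // Stands in for the fetcher's buffer of completed, not-yet-returned records.
    private boolean recordsBuffered = false;
    // Number of full polls, i.e. times delayed tasks (heartbeats, auto commits) ran.
    private int delayedTaskRuns = 0;

    PollStarvationSketch(boolean responseArrivesDuringQuickPoll) {
        this.responseArrivesDuringQuickPoll = responseArrivesDuringQuickPoll;
    }

    // Full network poll: runs delayed tasks; assume a fetch response arrives.
    private void fullPoll() {
        delayedTaskRuns++;
        recordsBuffered = true;
    }

    // Quick poll: may receive responses but does NOT run delayed tasks.
    private void quickPoll() {
        if (responseArrivesDuringQuickPoll) {
            recordsBuffered = true;
        }
    }

    // Models pollOnce: buffered records are returned early, skipping the full poll.
    private boolean pollOnce() {
        if (!recordsBuffered) {
            fullPoll();
        }
        boolean gotRecords = recordsBuffered;
        recordsBuffered = false; // hand the records to the caller
        return gotRecords;
    }

    // Models poll: after returning records, send new fetches and quick-poll.
    void poll() {
        if (pollOnce()) {
            quickPoll(); // sending the new fetches is implicit in this model
        }
    }

    // Runs the given number of polls and reports how often delayed tasks ran.
    static int delayedTaskRuns(int polls, boolean responseArrivesDuringQuickPoll) {
        PollStarvationSketch consumer = new PollStarvationSketch(responseArrivesDuringQuickPoll);
        for (int i = 0; i < polls; i++) {
            consumer.poll();
        }
        return consumer.delayedTaskRuns;
    }

    public static void main(String[] args) {
        System.out.println("responses during quick poll: "
            + delayedTaskRuns(100, true) + " delayed-task runs in 100 polls");
        System.out.println("no responses during quick poll: "
            + delayedTaskRuns(100, false) + " delayed-task runs in 100 polls");
    }
}
```

In this model, when a response lands in every quick poll, delayed tasks run
only on the very first poll and never again, whereas otherwise they run on
every poll.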


I can provide detailed reproduction steps if needs be. The key conditions are
that at least 2 brokers must be involved and that the max fetch size is reduced
to limit the size of the fetch batches.
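For illustration, consumer settings along these lines should match those
conditions. This is a sketch under assumptions: the hostnames and group name
are placeholders, I'm assuming "max fetch size" means the
max.partition.fetch.bytes consumer setting, and 1024 is just an example value.

```java
import java.util.Properties;

// Hypothetical consumer settings for reproducing the issue; hostnames and the
// group name are placeholders, and the fetch size is an illustrative value.
public class ReproConfigSketch {
    static Properties consumerProps() {
        Properties props = new Properties();
        // At least two brokers in the cluster.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("group.id", "rebalance-repro");
        // Auto commit is one of the delayed tasks that gets starved.
        props.put("enable.auto.commit", "true");
        // Small per-partition fetch size keeps each fetch batch small.
        // It must still exceed the largest message, or the consumer cannot progress.
        props.put("max.partition.fetch.bytes", "1024");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        consumerProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```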


If anyone can verify what I'm seeing, I'll file a bug report. And if anyone has
ideas on how to prevent this from happening, I would appreciate them.
