Hi Matthias,

Thanks for your answer. It makes a lot of sense.

Just a follow-up question. KIP-62 says: "we give the client as much as 
max.poll.interval.ms to handle a batch of records, this is also the maximum 
time before a consumer can be expected to rejoin the group in the worst case". 
Does it mean that a broker would wait up to Integer.MAX_VALUE milliseconds 
(roughly 24.8 days) for a client to report in during a rebalance? That sounds 
improbable, so I must be missing something.

Thanks.


-----Original Message-----
From: Matthias J. Sax [mailto:matth...@confluent.io] 
Sent: Friday, December 22, 2017 9:13 PM
To: users@kafka.apache.org
Subject: Re: Kafka Streams - max.poll.interval.ms defaults to Integer.MAX_VALUE

The value was changed to make Streams applications robust against long state 
restore phases during a rebalance.

I.e., it targets exactly point 2 of your list. If an application needs to 
restore state, the restore might take longer than max.poll.interval.ms, so the 
application drops out of the group even though it is in a good state. This 
results in rebalance "storms". The consumer default of 5 minutes (300000 ms) is 
too small for most applications, and thus we set it to MAX_VALUE. If you have a 
good estimate of the maximum expected state restore time, you can safely set 
the timeout to an appropriate value.
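
As a rough illustration of setting the timeout from a restore-time estimate, 
here is a minimal sketch using only java.util.Properties. The class name, the 
helper buildConfig, the headroom parameter, the application id and the 
bootstrap address are all hypothetical; only the key "max.poll.interval.ms" 
comes from this thread. In a real application the resulting properties would 
presumably be handed to the Streams configuration.

```java
import java.util.Properties;

public class StreamsTimeoutSketch {

    // Hypothetical helper: derive max.poll.interval.ms from an estimated
    // worst-case state restore time plus some headroom, capped at
    // Integer.MAX_VALUE because the setting is an int number of milliseconds.
    static Properties buildConfig(long expectedRestoreMs, long headroomMs) {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");    // hypothetical value
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical value

        long timeoutMs = Math.min(Integer.MAX_VALUE,
                Math.addExact(expectedRestoreMs, headroomMs));
        props.put("max.poll.interval.ms", String.valueOf(timeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // Assumed estimate: 10 minutes of restore time plus 2 minutes headroom.
        Properties props = buildConfig(10 * 60 * 1000L, 2 * 60 * 1000L);
        System.out.println(props.getProperty("max.poll.interval.ms"));
    }
}
```

The headroom is a design choice in this sketch, not a recommendation: a value 
that is too tight brings the rebalance storms back, while a value that is too 
generous delays detection of a stuck processing thread.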

Note that state restore in Kafka Streams was improved considerably in Kafka 
0.11 and 1.0, so reducing the timeout accordingly should not be an issue there.


-Matthias

On 12/20/17 7:14 AM, Javier Holguera wrote:
> Hi,
> 
> According to the documentation, "max.poll.interval.ms" defaults to 
> Integer.MAX_VALUE for Kafka Streams since 0.10.2.1.
> 
> Considering that the "max.poll.interval.ms" is:
> 
>   1.  A "processing timeout" to control an upper limit for processing a batch 
> of records AND
>   2.  The rebalance timeout that the client will communicate to the 
> broker, according to KIP-62
> 
> How do Kafka Streams applications detect slow consumers that are taking too 
> long to process a batch of messages? What replaces the existing mechanism, 
> in which an application with a smaller "max.poll.interval.ms" willingly 
> leaves the consumer group when the timeout expires?
> 
> From the broker's perspective, what does it mean when the application 
> communicates a "rebalance timeout" of Integer.MAX_VALUE? I imagine it 
> will not wait that long during a rebalance. What happens then?
> 
> Thanks.
> 
