[ 
https://issues.apache.org/jira/browse/KAFKA-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-6399:
-----------------------------------
    Description: 
In Kafka {{0.10.2.1}} we changed the default value of {{max.poll.interval.ms}}
for Kafka Streams to {{Integer.MAX_VALUE}}. The reason was that long state
restore phases during a rebalance could cause "rebalance storms": healthy
consumers dropped out of the consumer group because they did not call
{{poll()}} during the state restore phase.

In versions {{0.11}} and {{1.0}} the state restore logic was improved
considerably, so Kafka Streams now calls {{poll()}} even during the restore
phase. Therefore, we might consider setting a smaller timeout for
{{max.poll.interval.ms}} to detect badly behaving Kafka Streams applications
(i.e., targeting user code) that no longer make progress during regular
operation.
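
As a rough illustration of the check this setting controls, here is a toy
model (not the consumer's actual implementation; the class and method names
are hypothetical). It assumes a member is treated as failed once the time
since its last {{poll()}} exceeds the configured interval:

```java
// Toy model only: class and method names are hypothetical, not Kafka APIs.
public class PollLivenessModel {
    private final long maxPollIntervalMs;
    private long lastPollMs;

    public PollLivenessModel(long maxPollIntervalMs, long startMs) {
        this.maxPollIntervalMs = maxPollIntervalMs;
        this.lastPollMs = startMs;
    }

    // Record that the application called poll() at the given time.
    public void recordPoll(long nowMs) {
        lastPollMs = nowMs;
    }

    // True once the gap since the last poll() exceeds the configured interval.
    public boolean pollTimeoutExpired(long nowMs) {
        return nowMs - lastPollMs > maxPollIntervalMs;
    }
}
```

With {{Integer.MAX_VALUE}} as the interval, the gap would have to exceed
roughly 24.8 days, so in practice the check never fires and a stuck
application goes undetected.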

The open question is what a good default might be. The actual consumer
default of 5 minutes ({{300000}} ms) might be sufficient. During one {{poll()}}
roundtrip, we would only call {{restoreConsumer.poll()}} once and restore a
single batch of records. This should take far less time than that.
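
Until the default changes, an application can set a smaller value itself. The
following is a minimal sketch, assuming plain {{java.util.Properties}} and the
{{consumer.}} prefix that Kafka Streams uses for consumer-level overrides; the
helper class and method names are hypothetical, and the interval is whatever
the caller chooses:

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: builds a Streams config copy with
// an explicit max.poll.interval.ms override for the embedded consumers.
public class StreamsPollIntervalConfig {
    // "consumer." prefix + the consumer config name.
    static final String KEY = "consumer.max.poll.interval.ms";

    public static Properties withMaxPollInterval(Properties base, int intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        Properties props = new Properties();
        props.putAll(base);                          // leave the caller's config untouched
        props.setProperty(KEY, Integer.toString(intervalMs));
        return props;
    }
}
```

The returned {{Properties}} would then be passed to {{KafkaStreams}} together
with the usual settings such as {{application.id}} and {{bootstrap.servers}}.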

KIP-442: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-442%3A+Return+to+default+max+poll+interval+in+Streams]


> Consider reducing "max.poll.interval.ms" default for Kafka Streams
> ------------------------------------------------------------------
>
>                 Key: KAFKA-6399
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6399
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Matthias J. Sax
>            Assignee: John Roesler
>            Priority: Minor
>              Labels: kip
>             Fix For: 2.3.0


