[ https://issues.apache.org/jira/browse/KAFKA-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias J. Sax updated KAFKA-6399:
-----------------------------------
    Description:
In Kafka {{0.10.2.1}} we changed the default value of {{max.poll.interval.ms}} for Kafka Streams to {{Integer.MAX_VALUE}}. The reason was that long state restore phases during a rebalance could cause "rebalance storms": healthy consumers dropped out of the consumer group because they did not call {{poll()}} during the state restore phase.

In versions {{0.11}} and {{1.0}} the state restore logic was improved considerably, and Kafka Streams now calls {{poll()}} even during the restore phase. Therefore, we might consider setting a smaller timeout for {{max.poll.interval.ms}} to detect badly behaving Kafka Streams applications (i.e., targeting user code) that no longer make progress during regular operations.

The open question is what a good default might be. The actual consumer default of 30 seconds might be sufficient. During one {{poll()}} roundtrip, we would only call {{restoreConsumer.poll()}} once and restore a single batch of records. This should take far less than 30 seconds.

KIP-442: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-442%3A+Return+to+default+max+poll+interval+in+Streams]

  was:
In Kafka {{0.10.2.1}} we changed the default value of {{max.poll.interval.ms}} for Kafka Streams to {{Integer.MAX_VALUE}}. The reason was that long state restore phases during a rebalance could cause "rebalance storms": healthy consumers dropped out of the consumer group because they did not call {{poll()}} during the state restore phase.

In versions {{0.11}} and {{1.0}} the state restore logic was improved considerably, and Kafka Streams now calls {{poll()}} even during the restore phase. Therefore, we might consider setting a smaller timeout for {{max.poll.interval.ms}} to detect badly behaving Kafka Streams applications (i.e., targeting user code) that no longer make progress during regular operations.

The open question is what a good default might be. The actual consumer default of 30 seconds might be sufficient. During one {{poll()}} roundtrip, we would only call {{restoreConsumer.poll()}} once and restore a single batch of records. This should take far less than 30 seconds.


> Consider reducing "max.poll.interval.ms" default for Kafka Streams
> ------------------------------------------------------------------
>
>                 Key: KAFKA-6399
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6399
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Matthias J. Sax
>            Assignee: John Roesler
>            Priority: Minor
>              Labels: kip
>             Fix For: 2.3.0

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
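
Until the default changes, an application can override the setting discussed in this issue on its own. The following is a minimal sketch, not the project's implementation: it only builds a plain {{java.util.Properties}} object with the literal configuration keys, on the assumption that these properties are later handed to the {{KafkaStreams}} constructor. The class name, method name, application id, broker address, and the 30-second value are hypothetical and chosen purely for illustration.

```java
import java.util.Properties;

public class StreamsPollIntervalExample {

    // Builds a Streams configuration that overrides max.poll.interval.ms.
    static Properties buildStreamsProps(int maxPollIntervalMs) {
        Properties props = new Properties();
        // Hypothetical application id and broker address, for illustration only.
        props.setProperty("application.id", "example-streams-app");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // The consumer setting discussed in this issue. A finite value lets the
        // group coordinator evict an instance that stops calling poll().
        props.setProperty("max.poll.interval.ms", Integer.toString(maxPollIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        // 30 seconds is the value floated in the description, not a recommendation.
        Properties props = buildStreamsProps(30_000);
        System.out.println("max.poll.interval.ms=" + props.getProperty("max.poll.interval.ms"));
    }
}
```

A value that is too small for the application's per-record processing time would reintroduce unnecessary rebalances, so any concrete number should be checked against the slowest expected processing loop.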