[ https://issues.apache.org/jira/browse/KAFKA-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismael Juma reassigned KAFKA-4237:
----------------------------------

    Assignee: Jason Gustafson

> Avoid long request timeout for the consumer
> -------------------------------------------
>
>                 Key: KAFKA-4237
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4237
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>
> In the consumer rebalance protocol, the JoinGroup request can stay in
> purgatory on the server for as long as the rebalance timeout. For the Java
> client, that means the request timeout must be at least as large as the
> rebalance timeout (which is governed by {{max.poll.interval.ms}} since
> KIP-62 and by {{session.timeout.ms}} before then). By default, since 0.10.1,
> this is 5 minutes plus some change, which makes the clients slow to detect
> some hard failures.
> To fix this, two options come to mind:
> 1. Right now, all request APIs are limited by the same request timeout in
> {{NetworkClient}}, but there's not really any reason why this must be so. We
> could use a separate timeout for the JoinGroup request (the implementation
> of this is straightforward:
> https://github.com/confluentinc/kafka/pull/108/files).
> 2. Alternatively, we could prevent the server from holding the JoinGroup in
> purgatory for such a long time. Instead, it could return early from the
> JoinGroup (say, before the session timeout has expired) with an error code
> (e.g. REBALANCE_IN_PROGRESS), which tells the client that it should just
> resend the JoinGroup.
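
For illustration, option 1 above could look roughly like the following. This is only a minimal sketch, not the code from the linked pull request and not Kafka's actual NetworkClient: the class names (PerRequestTimeoutSketch, InFlightRequest, InFlightRequests), the 5 second margin added to the rebalance timeout, and the 30 second timeout for other requests are all assumptions made for the example. The idea is that each in-flight request carries its own timeout, so only JoinGroup has to wait out the rebalance timeout.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative only: hypothetical classes, not Kafka's NetworkClient.
final class PerRequestTimeoutSketch {

    /** One outstanding request, carrying its own timeout. */
    static final class InFlightRequest {
        final String api;
        final long sendTimeMs;
        final long timeoutMs;

        InFlightRequest(String api, long sendTimeMs, long timeoutMs) {
            this.api = api;
            this.sendTimeMs = sendTimeMs;
            this.timeoutMs = timeoutMs;
        }

        boolean hasExpired(long nowMs) {
            return nowMs - sendTimeMs > timeoutMs;
        }
    }

    /** Tracks outstanding requests and expires each against its own timeout. */
    static final class InFlightRequests {
        private final List<InFlightRequest> requests = new ArrayList<>();
        private final long requestTimeoutMs;   // short timeout for ordinary requests
        private final long rebalanceTimeoutMs; // e.g. max.poll.interval.ms
        private final long joinGroupMarginMs;  // assumed slack on top of the rebalance timeout

        InFlightRequests(long requestTimeoutMs, long rebalanceTimeoutMs, long joinGroupMarginMs) {
            this.requestTimeoutMs = requestTimeoutMs;
            this.rebalanceTimeoutMs = rebalanceTimeoutMs;
            this.joinGroupMarginMs = joinGroupMarginMs;
        }

        /** Only JoinGroup may sit in purgatory for the whole rebalance timeout. */
        long timeoutFor(String api) {
            return "JoinGroup".equals(api)
                    ? rebalanceTimeoutMs + joinGroupMarginMs
                    : requestTimeoutMs;
        }

        void send(String api, long nowMs) {
            requests.add(new InFlightRequest(api, nowMs, timeoutFor(api)));
        }

        /** Removes and returns the API names of requests that have timed out. */
        List<String> expire(long nowMs) {
            List<String> expired = new ArrayList<>();
            Iterator<InFlightRequest> it = requests.iterator();
            while (it.hasNext()) {
                InFlightRequest request = it.next();
                if (request.hasExpired(nowMs)) {
                    expired.add(request.api);
                    it.remove();
                }
            }
            return expired;
        }
    }

    public static void main(String[] args) {
        // 30 s for ordinary requests, 5 min rebalance timeout, 5 s margin (all assumed values).
        InFlightRequests inFlight = new InFlightRequests(30_000L, 300_000L, 5_000L);
        inFlight.send("Fetch", 0L);
        inFlight.send("JoinGroup", 0L);
        System.out.println(inFlight.expire(31_000L));  // prints [Fetch]
        System.out.println(inFlight.expire(306_000L)); // prints [JoinGroup]
    }
}
```

With a split like this, a hard failure on a connection carrying ordinary requests would surface after the short timeout, while a JoinGroup parked in purgatory is still given the full rebalance timeout.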