[ 
https://issues.apache.org/jira/browse/KAFKA-13766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601949#comment-17601949
 ] 

Guozhang Wang commented on KAFKA-13766:
---------------------------------------

Inside onCompleteJoin, the block starting with

{{// trigger the awaiting join group response callback for all the members 
after rebalancing}}

indicates that once we are in the completing-rebalance phase, we have 
re-enabled the heartbeat with the session timeout. I.e., in that phase we 
effectively have two timers:

{{completeAndScheduleNextHeartbeatExpiration(group, member)}}
and
{{schedulePendingSync(group)}}
Whichever triggers first, we would fail the member and re-trigger the 
rebalance. And since session.timeout is in general smaller than the rebalance 
timeout, we would hit the former if there is a delay on assignment.
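
To make the race concrete, here is a minimal Java sketch of the two deadlines. It is only an illustration, not the coordinator code: the class and method names are hypothetical, the timeout values are example numbers, and it assumes neither timer is reset before it fires.

```java
// Hypothetical model of the two timers armed in the completing-rebalance phase.
final class CompletingRebalanceTimers {

    // Returns which timer would fire first if neither is reset (assumption).
    static String firstToExpire(long sessionTimeoutMs, long rebalanceTimeoutMs) {
        return sessionTimeoutMs <= rebalanceTimeoutMs
                ? "heartbeat-expiration"   // completeAndScheduleNextHeartbeatExpiration
                : "pending-sync";          // schedulePendingSync
    }

    // Time the member effectively has before it is failed.
    static long effectiveDeadlineMs(long sessionTimeoutMs, long rebalanceTimeoutMs) {
        return Math.min(sessionTimeoutMs, rebalanceTimeoutMs);
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = 45_000L;     // example value only
        long rebalanceTimeoutMs = 300_000L;  // example value only
        System.out.println(firstToExpire(sessionTimeoutMs, rebalanceTimeoutMs)
                + " fires first, after "
                + effectiveDeadlineMs(sessionTimeoutMs, rebalanceTimeoutMs) + " ms");
    }
}
```

With a session timeout below the rebalance timeout, the heartbeat expiration is always the binding deadline, which is the case described above.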

> Use `max.poll.interval.ms` as the timeout during complete-rebalance phase
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-13766
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13766
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, group-coordinator
>            Reporter: Guozhang Wang
>            Assignee: David Jacot
>            Priority: Major
>              Labels: new-rebalance-should-fix
>
> The lifetime of a consumer can be categorized into three phases:
> 1) During normal processing, the broker expects a heartbeat request 
> periodically from the consumer, and that is timed by the `session.timeout.ms`.
> 2) During the prepare_rebalance, the broker would expect a join-group request 
> to be received within the rebalance.timeout, which is piggy-backed as the 
> `max.poll.interval.ms`.
> 3) During the complete_rebalance, the broker would expect a sync-group 
> request to be received again within the `session.timeout.ms`.
> So during different phases of the life of the consumer, a different timeout 
> is used to bound the timer.
> Nowadays with cooperative rebalance protocol, we can still return records and 
> process them in the middle of a rebalance from {{consumer.poll}}. In that 
> case, for phase 3) we should also use the `max.poll.interval.ms` to bound the 
> timer, which is in practice larger than `session.timeout.ms`.
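
The three phases in the description, and the proposed change to phase 3, can be summarized in a small Java sketch. The enum and method names below are hypothetical and do not mirror the broker's classes; only the two config keys come from the description, and the values are examples.

```java
import java.util.Properties;

final class PhaseTimeouts {
    // Hypothetical names for the three phases listed in the issue.
    enum Phase { NORMAL_PROCESSING, PREPARE_REBALANCE, COMPLETE_REBALANCE }

    static long ms(Properties cfg, String key) {
        return Long.parseLong(cfg.getProperty(key));
    }

    // Timeout bounding each phase, as the issue describes the existing behavior.
    static long currentTimeoutMs(Phase phase, Properties cfg) {
        switch (phase) {
            case PREPARE_REBALANCE:
                return ms(cfg, "max.poll.interval.ms");
            default:
                return ms(cfg, "session.timeout.ms");
        }
    }

    // Proposed behavior: COMPLETE_REBALANCE is also bounded by max.poll.interval.ms.
    static long proposedTimeoutMs(Phase phase, Properties cfg) {
        if (phase == Phase.COMPLETE_REBALANCE) {
            return ms(cfg, "max.poll.interval.ms");
        }
        return currentTimeoutMs(phase, cfg);
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("session.timeout.ms", "45000");     // example value only
        cfg.setProperty("max.poll.interval.ms", "300000");  // example value only
        for (Phase p : Phase.values()) {
            System.out.println(p + ": current=" + currentTimeoutMs(p, cfg)
                    + " ms, proposed=" + proposedTimeoutMs(p, cfg) + " ms");
        }
    }
}
```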



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
