[ 
https://issues.apache.org/jira/browse/KAFKA-9232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002547#comment-17002547
 ] 

ASF GitHub Bot commented on KAFKA-9232:
---------------------------------------

hachikuji commented on pull request #7753: KAFKA-9232: Coordinator new member 
heartbeat completion does not work for JoinGroup v3
URL: https://github.com/apache/kafka/pull/7753
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Coordinator new member heartbeat completion does not work for JoinGroup v3
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-9232
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9232
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.1.1, 2.2.2, 2.3.1
>            Reporter: Jason Gustafson
>            Assignee: Sophie Blee-Goldman
>            Priority: Major
>
> For older versions of the JoinGroup API, the coordinator implements a static 
> timeout for new members of 5 minutes. This timeout is implemented using the 
> heartbeat purgatory and we expect that the delayed operation will be force 
> completed if the member successfully joins. This is implemented in 
> GroupCoordinator with the following logic:
> {code:scala}
>             group.maybeInvokeJoinCallback(member, joinResult)
>             completeAndScheduleNextHeartbeatExpiration(group, member)
>             member.isNew = false
> {code}
> However, heartbeat completion depends on this check:
> {code:scala}
>   def shouldKeepAlive(deadlineMs: Long): Boolean = {
>     if (isAwaitingJoin)
>       !isNew || latestHeartbeat + GroupCoordinator.NewMemberJoinTimeoutMs > deadlineMs
>     else awaitingSyncCallback != null ||
>       latestHeartbeat + sessionTimeoutMs > deadlineMs
>   }
> {code}
> Since we invoke the join callback first, the member is no longer awaiting 
> join and we fall through to the second branch. This only returns true when 
> the latest heartbeat plus the session timeout exceeds the deadline. The 
> deadline in this case depends only on the statically configured new member 
> timeout, which means the heartbeat cannot complete until about 5 minutes 
> have passed. If the member falls out of the group before then, the heartbeat 
> ultimately expires, which may trigger a spurious rebalance.
> Newer versions of the protocol are not affected by this bug because we return 
> immediately the first time a member joins the group.
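
The ordering problem described above can be reproduced with a small standalone sketch. This is not the coordinator code: `HeartbeatSketch` and `Member` are hypothetical stand-ins, the 10 second session timeout and the timestamps are made-up values, and it assumes (as the description implies) that invoking the join callback clears `isAwaitingJoin` and that the pending heartbeat can only complete when `shouldKeepAlive` returns true.

```scala
object HeartbeatSketch {
  // Assumed to match the 5 minute static timeout mentioned in the description.
  val NewMemberJoinTimeoutMs: Long = 5 * 60 * 1000L

  // Hypothetical, stripped-down stand-in for the coordinator's member state.
  final class Member(val sessionTimeoutMs: Long) {
    var isNew: Boolean = true
    var isAwaitingJoin: Boolean = true
    var awaitingSyncCallback: AnyRef = null
    var latestHeartbeat: Long = 0L

    // Same logic as the check quoted in the description.
    def shouldKeepAlive(deadlineMs: Long): Boolean = {
      if (isAwaitingJoin)
        !isNew || latestHeartbeat + NewMemberJoinTimeoutMs > deadlineMs
      else awaitingSyncCallback != null ||
        latestHeartbeat + sessionTimeoutMs > deadlineMs
    }
  }

  def main(args: Array[String]): Unit = {
    val joinTimeMs = 0L
    val member = new Member(sessionTimeoutMs = 10000L) // illustrative value
    member.latestHeartbeat = joinTimeMs

    // Deadline of the delayed heartbeat scheduled when the new member joined.
    val deadlineMs = joinTimeMs + NewMemberJoinTimeoutMs

    // Step 1: the join callback is invoked, so the member stops awaiting join.
    member.isAwaitingJoin = false

    // Step 2: one second later the coordinator tries to complete the heartbeat.
    member.latestHeartbeat = 1000L
    val canComplete = member.shouldKeepAlive(deadlineMs)

    // Step 3: isNew is cleared only after the completion attempt.
    member.isNew = false

    // 1000 + 10000 is far below the 300000 ms deadline, so this prints false.
    println(s"heartbeat can complete: $canComplete")
  }
}
```

With these values the second branch compares 11000 ms against a 300000 ms deadline, so the delayed operation stays in purgatory until the static timeout elapses, which matches the behavior reported in the issue.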



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
