[ 
https://issues.apache.org/jira/browse/KAFKA-9232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-9232.
------------------------------------
    Fix Version/s: 2.4.1
                   2.3.2
                   2.2.3
                   2.1.2
       Resolution: Fixed

> Coordinator new member heartbeat completion does not work for JoinGroup v3
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-9232
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9232
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.1.1, 2.2.2, 2.3.1
>            Reporter: Jason Gustafson
>            Assignee: Sophie Blee-Goldman
>            Priority: Major
>             Fix For: 2.1.2, 2.2.3, 2.3.2, 2.4.1
>
>
> For older versions of the JoinGroup API, the coordinator implements a static 
> 5-minute timeout for new members. This timeout is implemented using the 
> heartbeat purgatory, and we expect the delayed operation to be 
> force-completed if the member successfully joins. This is implemented in 
> GroupCoordinator with the following logic:
> {code:scala}
>             group.maybeInvokeJoinCallback(member, joinResult)
>             completeAndScheduleNextHeartbeatExpiration(group, member)
>             member.isNew = false
> {code}
> However, heartbeat completion depends on this check:
> {code:scala}
>   def shouldKeepAlive(deadlineMs: Long): Boolean = {
>     if (isAwaitingJoin)
>       !isNew || latestHeartbeat + GroupCoordinator.NewMemberJoinTimeoutMs > deadlineMs
>     else awaitingSyncCallback != null ||
>       latestHeartbeat + sessionTimeoutMs > deadlineMs
>   }
> {code}
> Since we invoke the join callback first, the member is no longer awaiting 
> join and we fall through to the second branch. That branch returns true only 
> when the latest heartbeat plus the session timeout exceeds the deadline. The 
> deadline in this case depends only on the statically configured new member 
> timeout, which means the heartbeat cannot complete until about 5 minutes 
> have passed. If the member falls out of the group before then, the heartbeat 
> ultimately expires, which may trigger a spurious rebalance.
> Newer versions of the protocol are not affected by this bug because we return 
> immediately the first time a member joins the group.
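
The ordering problem described above can be illustrated with a small standalone sketch. It is not the coordinator code: `HeartbeatSketch`, `Member`, `isAwaitingSync`, and `canCompleteOnJoin` are hypothetical names, the member state is reduced to the fields used by the quoted check, and it assumes that invoking the join callback is what makes `isAwaitingJoin` false. The `callbackFirst = false` case is included only for contrast and is not meant to represent the fix that was committed.

```scala
object HeartbeatSketch {
  // Static new-member timeout from the description (5 minutes).
  val NewMemberJoinTimeoutMs: Long = 5 * 60 * 1000L

  // Hypothetical, simplified stand-in for the coordinator's member state.
  final class Member(val sessionTimeoutMs: Long, var latestHeartbeat: Long) {
    var isNew: Boolean = true
    var isAwaitingJoin: Boolean = true   // assumed to become false once the join callback runs
    var isAwaitingSync: Boolean = false  // stands in for awaitingSyncCallback != null

    // Same condition as the check quoted in the description.
    def shouldKeepAlive(deadlineMs: Long): Boolean =
      if (isAwaitingJoin)
        !isNew || latestHeartbeat + NewMemberJoinTimeoutMs > deadlineMs
      else
        isAwaitingSync || latestHeartbeat + sessionTimeoutMs > deadlineMs
  }

  // Returns whether the delayed heartbeat scheduled at joinTimeMs passes the
  // keep-alive check when the join finishes at nowMs.
  def canCompleteOnJoin(callbackFirst: Boolean,
                        sessionTimeoutMs: Long,
                        joinTimeMs: Long,
                        nowMs: Long): Boolean = {
    val member = new Member(sessionTimeoutMs, joinTimeMs)
    // The pending operation's deadline depends only on the static timeout.
    val deadlineMs = joinTimeMs + NewMemberJoinTimeoutMs
    if (callbackFirst) member.isAwaitingJoin = false // join callback already invoked
    member.latestHeartbeat = nowMs                   // heartbeat refreshed on completion
    member.shouldKeepAlive(deadlineMs)
  }
}
```

With a 10 second session timeout and a join that finishes 3 seconds after the member was added, `canCompleteOnJoin(true, 10000L, 0L, 3000L)` evaluates the second branch as 13000 > 300000 and returns false, so the delayed heartbeat stays pending. The same call with `callbackFirst = false` evaluates the first branch as 303000 > 300000 and returns true.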



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
