[ https://issues.apache.org/jira/browse/KAFKA-9232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson resolved KAFKA-9232. ------------------------------------ Fix Version/s: 2.4.1 2.3.2 2.2.3 2.1.2 Resolution: Fixed > Coordinator new member heartbeat completion does not work for JoinGroup v3 > -------------------------------------------------------------------------- > > Key: KAFKA-9232 > URL: https://issues.apache.org/jira/browse/KAFKA-9232 > Project: Kafka > Issue Type: Bug > Affects Versions: 2.1.1, 2.2.2, 2.3.1 > Reporter: Jason Gustafson > Assignee: Sophie Blee-Goldman > Priority: Major > Fix For: 2.1.2, 2.2.3, 2.3.2, 2.4.1 > > > For older versions of the JoinGroup API, the coordinator implements a static > timeout for new members of 5 minutes. This timeout is implemented using the > heartbeat purgatory and we expect that the delayed operation will be force > completed if the member successfully joins. This is implemented in > GroupCoordinator with the following logic: > {code:scala} > group.maybeInvokeJoinCallback(member, joinResult) > completeAndScheduleNextHeartbeatExpiration(group, member) > member.isNew = false > {code} > However, heartbeat completion depends on this check: > {code:scala} > def shouldKeepAlive(deadlineMs: Long): Boolean = { > if (isAwaitingJoin) > !isNew || latestHeartbeat + GroupCoordinator.NewMemberJoinTimeoutMs > > deadlineMs > else awaitingSyncCallback != null || > latestHeartbeat + sessionTimeoutMs > deadlineMs > } > {code} > Since we invoke the join callback first, we will fall to the second branch. > This will only return true when the latest heartbeat plus session timeout > exceeds the deadline. The deadline in this case depends only on the > statically configured new member timeout, which means the heartbeat cannot > complete until about 5 minutes have passed. If the member falls out of the > group before then, then the heartbeat ultimately expires, which may trigger a > spurious rebalance. > Newer versions of the protocol are not affected by this bug because we return > immediately the first time a member joins the group. -- This message was sent by Atlassian Jira (v8.3.4#803005)