dajac commented on a change in pull request #10863:
URL: https://github.com/apache/kafka/pull/10863#discussion_r651765967
##########
File path: core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala
##########
@@ -1450,12 +1457,89 @@ class GroupCoordinator(val brokerId: Int,
                 group.maybeInvokeJoinCallback(member, joinResult)
                 completeAndScheduleNextHeartbeatExpiration(group, member)
                 member.isNew = false
+
+                group.addPendingSyncMember(member.memberId)
               }
+
+              schedulePendingSync(group)
             }
           }
         }
       }
 
+  private def maybeRemovePendingSyncMember(
+    group: GroupMetadata,
+    memberId: String
+  ): Unit = {
+    group.removePendingSyncMember(memberId)
+    maybeCompleteSyncExpiration(group)
+  }
+
+  private def removeSyncExpiration(
+    group: GroupMetadata
+  ): Unit = {
+    group.clearPendingSyncMembers()
+    maybeCompleteSyncExpiration(group)
+  }
+
+  private def maybeCompleteSyncExpiration(
+    group: GroupMetadata
+  ): Unit = {
+    val groupKey = GroupKey(group.groupId)
+    syncPurgatory.checkAndComplete(groupKey)
+  }
+
+  private def schedulePendingSync(
+    group: GroupMetadata
+  ): Unit = {
+    val delayedSync = new DelayedSync(this, group, group.rebalanceTimeoutMs)

Review comment:
That's a good point. I thought about it as well. Given that the goal of this PR is to protect us against misbehaving or buggy clients, I think that it is OK to allow for a full max.poll.interval.ms in between Join and Sync. Practically, I have found this approach a bit harder to test, as we end up with both a DelayedJoin and a DelayedSync in parallel and they both rely on the same rebalance timeout. It is a little easier to reason about them when they are disjoint. That being said, I don't feel strongly either way.
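
To make the bookkeeping in the hunk above easier to follow, here is a minimal, self-contained sketch of the same idea. It is not the PR's code: `PendingSyncTracker` and its methods are hypothetical names, time is passed in explicitly instead of using a purgatory, and what happens to late members on expiration is an assumption. It only models what the diff shows: members become "pending sync" once the join phase completes, the sync timer starts at that point with a full rebalance timeout, and the timer can complete early once every member has synced.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the pending-sync bookkeeping.
// Not the PR's classes; time is injected so the behavior is easy to test.
final class PendingSyncTracker(rebalanceTimeoutMs: Long) {
  private val pending = mutable.Set.empty[String]
  private var deadlineMs: Option[Long] = None

  // Called once the join phase has completed: every member now owes a
  // SyncGroup request, and the sync timer starts with a full timeout.
  def onJoinCompleted(memberIds: Iterable[String], nowMs: Long): Unit = {
    pending.clear()
    pending ++= memberIds
    deadlineMs = Some(nowMs + rebalanceTimeoutMs)
  }

  // Called when a member sends SyncGroup. Returns true when no member is
  // pending any more, i.e. the timer can be completed early.
  def onSync(memberId: String): Boolean = {
    pending -= memberId
    if (pending.isEmpty) deadlineMs = None
    pending.isEmpty
  }

  // Returns the members that missed the deadline (assumption: the caller
  // would then evict them and rebalance), or an empty set if the timer
  // has not expired or is not running.
  def expired(nowMs: Long): Set[String] =
    deadlineMs match {
      case Some(deadline) if nowMs >= deadline =>
        val late = pending.toSet
        pending.clear()
        deadlineMs = None
        late
      case _ => Set.empty
    }
}
```

Because the deadline is computed only in `onJoinCompleted`, the join timer and the sync timer never run at the same time in this model, which is the disjoint arrangement the comment argues is easier to reason about and to test.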