[GitHub] [kafka] dajac commented on a change in pull request #10863: KAFKA-12890; Consumer group stuck in `CompletingRebalance`

GitBox Tue, 15 Jun 2021 06:00:38 -0700


dajac commented on a change in pull request #10863:
URL: https://github.com/apache/kafka/pull/10863#discussion_r651765967




##########
File path: core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala
##########
@@ -1450,12 +1457,89 @@ class GroupCoordinator(val brokerId: Int,
             group.maybeInvokeJoinCallback(member, joinResult)
             completeAndScheduleNextHeartbeatExpiration(group, member)
             member.isNew = false
+
+            group.addPendingSyncMember(member.memberId)
           }
+
+          schedulePendingSync(group)
         }
       }
     }
   }
 
+  private def maybeRemovePendingSyncMember(
+    group: GroupMetadata,
+    memberId: String
+  ): Unit = {
+    group.removePendingSyncMember(memberId)
+    maybeCompleteSyncExpiration(group)
+  }
+
+  private def removeSyncExpiration(
+    group: GroupMetadata
+  ): Unit = {
+    group.clearPendingSyncMembers()
+    maybeCompleteSyncExpiration(group)
+  }
+
+  private def maybeCompleteSyncExpiration(
+    group: GroupMetadata
+  ): Unit = {
+    val groupKey = GroupKey(group.groupId)
+    syncPurgatory.checkAndComplete(groupKey)
+  }
+
+  private def schedulePendingSync(
+    group: GroupMetadata
+  ): Unit = {
+    val delayedSync = new DelayedSync(this, group, group.rebalanceTimeoutMs)

Review comment:
That's a good point. I thought about it as well. Given that the goal of 
this PR is to protect us against misbehaving or buggy clients, I think that it 
is OK to allow for a full max.poll.interval.ms between Join and Sync. 
In practice, I have found this approach a bit harder to test, as we end up with 
both a DelayedJoin and a DelayedSync in parallel and they both rely on the same 
rebalance timeout. It is a little easier to reason about them when they are 
disjoint. That being said, I don't feel strongly either way.
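
To make the "disjoint" timing concrete, here is a minimal sketch of the idea, assuming a simplified model rather than the actual GroupCoordinator code. `PendingSyncTracker` and its method names are hypothetical and only illustrate that the sync deadline starts once the join phase has completed, so each phase gets its own full rebalance timeout.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the pending-sync bookkeeping.
// It is NOT the Kafka implementation; there is no purgatory or locking here.
final class PendingSyncTracker(rebalanceTimeoutMs: Long) {
  private val pending = mutable.Set.empty[String]
  private var deadlineMs: Option[Long] = None

  // Called when the join phase completes: every member is now expected to
  // send a SyncGroup request, and the sync timer starts from this point.
  def onJoinCompleted(memberIds: Iterable[String], nowMs: Long): Unit = {
    pending.clear()
    pending ++= memberIds
    deadlineMs = Some(nowMs + rebalanceTimeoutMs)
  }

  // Called when a member syncs: it is no longer pending. Once nobody is
  // pending, the deadline is dropped.
  def onSyncReceived(memberId: String): Unit = {
    pending -= memberId
    if (pending.isEmpty) deadlineMs = None
  }

  // Returns the members that failed to sync before the deadline, or an
  // empty set if the deadline has not been reached (or was already handled).
  def expire(nowMs: Long): Set[String] = deadlineMs match {
    case Some(deadline) if nowMs >= deadline =>
      val expired = pending.toSet
      pending.clear()
      deadlineMs = None
      expired
    case _ => Set.empty
  }
}
```

With this shape, a member that joins but never syncs is reported by `expire` one full timeout after the join completed, independently of how long the join phase itself took.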




