[GitHub] [kafka] hachikuji commented on a change in pull request #9792: KAFKA-10870: handle REBALANCE_IN_PROGRESS error in JoinGroup

GitBox Wed, 30 Dec 2020 09:40:52 -0800


hachikuji commented on a change in pull request #9792:
URL: https://github.com/apache/kafka/pull/9792#discussion_r550271098




##########
File path: 
clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java
##########
@@ -658,6 +658,10 @@ public void handle(JoinGroupResponse joinResponse, 
RequestFuture<ByteBuffer> fut
                     AbstractCoordinator.this.generation = new 
Generation(OffsetCommitRequest.DEFAULT_GENERATION_ID, memberId, null);
                 }
                 future.raise(error);
+            } else if (error == Errors.REBALANCE_IN_PROGRESS) {
+                log.info("JoinGroup failed due to the group began another 
rebalance. Need to re-join the group.");
+                requestRejoin();

Review comment:
       I think the call to `requestRejoin` is not needed. We only reset the 
`rejoinNeeded` flag after the subsequent SyncGroup request. Or is there a case 
that this misses?

##########
File path: 
clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java
##########
@@ -658,6 +658,10 @@ public void handle(JoinGroupResponse joinResponse, 
RequestFuture<ByteBuffer> fut
                     AbstractCoordinator.this.generation = new 
Generation(OffsetCommitRequest.DEFAULT_GENERATION_ID, memberId, null);
                 }
                 future.raise(error);
+            } else if (error == Errors.REBALANCE_IN_PROGRESS) {
+                log.info("JoinGroup failed due to the group began another 
rebalance. Need to re-join the group.");

Review comment:
       I guess it's kind of a confusing error to see. The case on the broker is 
when the write to the log failed because of a timeout. I wonder if it would be 
useful to suggest the cause in the message. For example:
   > JoinGroup failed with a REBALANCE_IN_PROGRESS error, which could indicate 
a replication timeout on the broker. Will retry.
   

##########
File path: 
clients/src/test/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinatorTest.java
##########
@@ -823,6 +823,24 @@ public void 
testJoinGroupRequestWithGroupInstanceIdNotFound() {
         assertTrue(coordinator.hasUnknownGeneration());
     }
 
+    @Test
+    public void testJoinGroupRequestWithRebalanceInProgress() {
+        setupCoordinator();
+        mockClient.prepareResponse(groupCoordinatorResponse(node, 
Errors.NONE));
+        coordinator.ensureCoordinatorReady(mockTime.timer(0));
+
+        mockClient.prepareResponse(
+            joinGroupFollowerResponse(defaultGeneration, memberId, 
JoinGroupRequest.UNKNOWN_MEMBER_ID, Errors.REBALANCE_IN_PROGRESS));
+
+        RequestFuture<ByteBuffer> future = coordinator.sendJoinGroupRequest();
+
+        assertTrue(consumerClient.poll(future, 
mockTime.timer(REQUEST_TIMEOUT_MS)));
+        
assertTrue(future.exception().getClass().isInstance(Errors.REBALANCE_IN_PROGRESS.exception()));
+        assertEquals(Errors.REBALANCE_IN_PROGRESS.message(), 
future.exception().getMessage());
+        // should request rejoin
+        assertTrue(coordinator.rejoinNeededOrPending());

Review comment:
       Can we verify that the JoinGroup gets retried on the next poll?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [kafka] hachikuji commented on a change in pull request #9792: KAFKA-10870: handle REBALANCE_IN_PROGRESS error in JoinGroup

Reply via email to