hachikuji commented on a change in pull request #9792:
URL: https://github.com/apache/kafka/pull/9792#discussion_r550271098
##########
File path:
clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java
##########
@@ -658,6 +658,10 @@ public void handle(JoinGroupResponse joinResponse,
RequestFuture<ByteBuffer> fut
AbstractCoordinator.this.generation = new
Generation(OffsetCommitRequest.DEFAULT_GENERATION_ID, memberId, null);
}
future.raise(error);
+ } else if (error == Errors.REBALANCE_IN_PROGRESS) {
+ log.info("JoinGroup failed due to the group began another
rebalance. Need to re-join the group.");
+ requestRejoin();
Review comment:
I think the call to `requestRejoin` is not needed. We only reset the
`rejoinNeeded` flag after the subsequent SyncGroup request. Or is there a case
that this misses?
##########
File path:
clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java
##########
@@ -658,6 +658,10 @@ public void handle(JoinGroupResponse joinResponse,
RequestFuture<ByteBuffer> fut
AbstractCoordinator.this.generation = new
Generation(OffsetCommitRequest.DEFAULT_GENERATION_ID, memberId, null);
}
future.raise(error);
+ } else if (error == Errors.REBALANCE_IN_PROGRESS) {
+ log.info("JoinGroup failed due to the group began another
rebalance. Need to re-join the group.");
Review comment:
I guess it's kind of a confusing error to see. The case on the broker is
when the write to the log failed because of a timeout. I wonder if it would be
useful to suggest the cause in the message. For example:
> JoinGroup failed with a REBALANCE_IN_PROGRESS error, which could indicate
a replication timeout on the broker. Will retry.
##########
File path:
clients/src/test/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinatorTest.java
##########
@@ -823,6 +823,24 @@ public void
testJoinGroupRequestWithGroupInstanceIdNotFound() {
assertTrue(coordinator.hasUnknownGeneration());
}
+ @Test
+ public void testJoinGroupRequestWithRebalanceInProgress() {
+ setupCoordinator();
+ mockClient.prepareResponse(groupCoordinatorResponse(node,
Errors.NONE));
+ coordinator.ensureCoordinatorReady(mockTime.timer(0));
+
+ mockClient.prepareResponse(
+ joinGroupFollowerResponse(defaultGeneration, memberId,
JoinGroupRequest.UNKNOWN_MEMBER_ID, Errors.REBALANCE_IN_PROGRESS));
+
+ RequestFuture<ByteBuffer> future = coordinator.sendJoinGroupRequest();
+
+ assertTrue(consumerClient.poll(future,
mockTime.timer(REQUEST_TIMEOUT_MS)));
+
assertTrue(future.exception().getClass().isInstance(Errors.REBALANCE_IN_PROGRESS.exception()));
+ assertEquals(Errors.REBALANCE_IN_PROGRESS.message(),
future.exception().getMessage());
+ // should request rejoin
+ assertTrue(coordinator.rejoinNeededOrPending());
Review comment:
Can we verify that the JoinGroup gets retried on the next poll?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]