FrankYang0529 opened a new pull request, #18665: URL: https://github.com/apache/kafka/pull/18665
`AbstractCoordinator#joinGroupIfNeeded` calls `ConsumerNetworkClient#poll(RequestFuture, Timer)` [0], which in turn calls `ConsumerNetworkClient#poll(RequestFuture, Timer, boolean)` with `disableWakeup = false`. While `joinFuture` is not done, `ConsumerNetworkClient` polls repeatedly until the timer expires [1]. If `joinFuture` completes too quickly, the SyncGroupRequest is not handled in `AbstractCoordinator#ensureActiveGroup`, so no `WakeupException` is thrown.

To reproduce the flaky cases frequently, change `ConsumerNetworkClient#poll(RequestFuture, Timer, boolean)` on the trunk branch as follows:

```
public boolean poll(RequestFuture<?> future, Timer timer, boolean disableWakeup) {
    do {
        poll(timer, future, disableWakeup);
        // Artificial delay, added only to reproduce the flaky failures.
        try {
            System.err.println("sleeping.....");
            Thread.sleep(1000L);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    } while (!future.isDone() && timer.notExpired());
    return future.isDone();
}
```

[0] https://github.com/apache/kafka/blob/3d49159c841e7653e3951af4ffc3524d17339295/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L480-L481
[1] https://github.com/apache/kafka/blob/3d49159c841e7653e3951af4ffc3524d17339295/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkClient.java#L230-L235

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
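
The timing dependence described above can be illustrated with a toy model. The sketch below is not Kafka code: `PollLoopSketch`, `run`, and its parameters are hypothetical, and the toy poll checks for a pending wakeup at the start of each iteration, which is a simplifying assumption. It only shows that a do/while loop of this shape stops polling as soon as the future is done, so a wakeup requested after the final poll is not observed by the loop.

```java
// Toy model of a "poll until the future is done or the timer expires" loop.
// Every name here is a hypothetical stand-in; no Kafka classes are used.
class PollLoopSketch {

    /**
     * @param completesOnPoll 1-based poll on which the toy future completes
     * @param wakeupAfterPoll 1-based poll after which a wakeup is requested (0 = never)
     * @param maxPolls        stand-in for the timer: the loop gives up after this many polls
     * @return a short description of how the loop ended
     */
    static String run(int completesOnPoll, int wakeupAfterPoll, int maxPolls) {
        boolean wakeupPending = false;
        boolean done = false;
        int polls = 0;
        do {
            // Toy poll: a pending wakeup is only noticed when another poll starts.
            if (wakeupPending) {
                return "wakeup observed on poll " + (polls + 1);
            }
            polls++;
            if (polls == completesOnPoll) {
                done = true;
            }
            // Simulates another thread requesting a wakeup once this poll has returned.
            if (polls == wakeupAfterPoll) {
                wakeupPending = true;
            }
        } while (!done && polls < maxPolls);
        return done
            ? "future done after " + polls + " poll(s), wakeup pending=" + wakeupPending
            : "timer expired";
    }

    public static void main(String[] args) {
        // Slow future: the loop polls again and sees the wakeup.
        System.out.println(run(3, 1, 5));
        // Fast future: the loop exits after one poll and the wakeup stays pending.
        System.out.println(run(1, 1, 5));
    }
}
```

With a slow future (`run(3, 1, 5)`) the second poll sees the wakeup. With a fast future (`run(1, 1, 5)`) the loop returns after a single poll and the wakeup is left pending, which is the kind of timing sensitivity that can make a test expecting `WakeupException` flaky.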