dajac opened a new pull request, #12910: URL: https://github.com/apache/kafka/pull/12910
We recently had a bug causing the JoinGroup callback to thrown an exception (https://github.com/apache/kafka/pull/12909). When it happens, the exception is propagated to the caller and the JoinGroup callback is never completed. To make it worst, the member whose callback failed become a zombie because the group coordinator does not expire member with a pending callback. This patch catch exceptions for both invocation of JoinGroup and SyncGroup callbacks and retry to complete them with a `UNKNOWN_SERVER_ERROR` error if they failed. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org