[ https://issues.apache.org/jira/browse/KAFKA-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988361#comment-16988361 ]
ASF GitHub Bot commented on KAFKA-9184:
---------------------------------------

kkonstantine commented on pull request #7783: KAFKA-9184 (port on 2.3): Redundant task creation and periodic rebalances after zombie Connect worker rejoins the group
URL: https://github.com/apache/kafka/pull/7783

Check connectivity with the broker coordinator at regular intervals and, if the coordinator is unreachable, stop tasks by setting `assignmentSnapshot` to null and resetting the rebalance delay when there are no lost tasks.

Because `assignmentSnapshot` is now sometimes set to null and is read from other methods and threads, this member was made volatile, and local references are used to ensure consistent reads.

Adapted existing unit tests to verify additional debug calls, added more specific log messages to `DistributedHerder`, and added a new integration test that verifies the behavior when the brokers are stopped and restarted only after the workers lose their heartbeats with the broker coordinator.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Redundant task creation and periodic rebalances after zombie worker rejoins the group
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9184
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9184
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 2.4.0, 2.3.2
>            Reporter: Konstantine Karantasis
>            Assignee: Konstantine Karantasis
>            Priority: Blocker
>             Fix For: 2.4.0, 2.3.2
>
>
> First reported here:
> https://stackoverflow.com/questions/58631092/kafka-connect-assigns-same-task-to-multiple-workers
> There seems to be an issue with task reassignment when a worker rejoins after
> an unsuccessful join request. The worker seems to be outside the group for a
> generation but when it joins again the same task is running in more than one
> worker

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
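
Note on the pull request description above: it says `assignmentSnapshot` was made volatile and is read through local references. A minimal sketch of that general pattern follows. It is not the actual `DistributedHerder` code: the class `HerderSketch`, the nested `Assignment` type, and all method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a herder; NOT Kafka's DistributedHerder.
class HerderSketch {

    // Hypothetical immutable snapshot of the tasks assigned to this worker.
    static final class Assignment {
        final List<String> tasks;

        Assignment(List<String> tasks) {
            this.tasks = Collections.unmodifiableList(new ArrayList<>(tasks));
        }
    }

    // volatile: a write by one thread is visible to later reads by other threads.
    private volatile Assignment assignmentSnapshot;

    // Called when a new assignment is received (hypothetical method name).
    void onAssignment(List<String> tasks) {
        assignmentSnapshot = new Assignment(tasks);
    }

    // Called when the coordinator is found to be unreachable (hypothetical method name).
    void onCoordinatorUnreachable() {
        assignmentSnapshot = null;
    }

    int taskCount() {
        // Read the volatile field exactly once into a local reference.
        // Reading the field twice (null check, then use) could observe
        // a null written by another thread in between.
        Assignment snapshot = assignmentSnapshot;
        return snapshot == null ? 0 : snapshot.tasks.size();
    }
}
```

The local reference matters because `volatile` only guarantees visibility of each individual read; it does not make a check-then-use sequence on the field atomic.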