hachikuji opened a new pull request, #13107: URL: https://github.com/apache/kafka/pull/13107
When a reassignment is cancelled, we need to delete the partition state of adding replicas. Failing to do so causes "stray" partitions which take up disk space and can cause topicId conflicts if the topic is recreated.

Currently, this logic does not work because the leader epoch does not always get bumped after cancellation. Without a leader epoch bump, the replica will ignore `StopReplica` requests sent by the controller and the replica may remain online.

In this patch, we fix the issue by sending the sentinel -2 in the `StopReplica` request. Currently, this sentinel is used when a topic is being deleted, which is another case where we cannot depend on a leader epoch bump. This expands the usage to cover all replica deletions including the case of cancellation.

Note, this problem only affects the ZK controller. The integration tests added here nevertheless cover both metadata modes.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
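
To make the epoch check from the description above concrete, here is a minimal sketch of how a replica might decide whether to act on a `StopReplica` request. It is a simplified model, not the actual Kafka broker code: the class `StopReplicaEpochCheck`, the method `shouldStop`, and the constant name `EPOCH_DURING_DELETE` are hypothetical, and the strict "greater than" comparison is an assumption inferred from the statement that requests are ignored without a leader epoch bump. Only the sentinel value -2 comes from the description itself.

```java
// Simplified, hypothetical model of the replica-side epoch check.
// This is not the actual Kafka implementation.
final class StopReplicaEpochCheck {
    // Sentinel from the PR description: used when a leader epoch bump cannot be relied on.
    static final int EPOCH_DURING_DELETE = -2;

    // Returns true if the replica should act on the StopReplica request.
    static boolean shouldStop(int requestLeaderEpoch, int currentLeaderEpoch) {
        if (requestLeaderEpoch == EPOCH_DURING_DELETE) {
            // The sentinel bypasses the epoch comparison entirely.
            return true;
        }
        // Assumption: a request is honored only if its epoch is newer than the local one.
        return requestLeaderEpoch > currentLeaderEpoch;
    }

    public static void main(String[] args) {
        int currentEpoch = 5;
        // Cancellation without an epoch bump: same epoch, so the request is ignored.
        System.out.println(shouldStop(5, currentEpoch));
        // An epoch bump lets the request through.
        System.out.println(shouldStop(6, currentEpoch));
        // The sentinel lets the request through regardless of the local epoch.
        System.out.println(shouldStop(EPOCH_DURING_DELETE, currentEpoch));
    }
}
```

Under this model, a cancelled reassignment that leaves the leader epoch unchanged yields a request the replica would drop, while a request carrying -2 is always honored, which is the behavior the patch relies on.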