smjn commented on PR #19443: URL: https://github.com/apache/kafka/pull/19443#issuecomment-2815838761
> When I delete the `__share_group_state` topic I get the following exception information:
>
> ```
> [2025-04-17 09:41:17,524] INFO [ShareCoordinator id=1] Pruning records in __share_group_state-0 till offset 3. (org.apache.kafka.coordinator.share.ShareCoordinatorService)
> [2025-04-17 09:41:17,527] ERROR [ShareCoordinator id=1] Received error in share-group state topic prune. (org.apache.kafka.coordinator.share.ShareCoordinatorService)
> java.util.concurrent.CompletionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
>     at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) [?:?]
>     at java.base/java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1527) [?:?]
>     at java.base/java.util.concurrent.CompletableFuture.allOf(CompletableFuture.java:2419) [?:?]
>     at org.apache.kafka.coordinator.share.ShareCoordinatorService$1.run(ShareCoordinatorService.java:281) [kafka-share-coordinator-4.1.0-SNAPSHOT.jar:?]
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
>     at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
> Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> ```
>
> The background tasks perpetually throw exceptions in this situation, and I suspect that a more orderly leadership change could similarly make the code sad. While the coordinator runtime is properly able to handle unfortunate leadership events, I think the error handling of the background tasks in the share coordinator needs a little refinement.

@AndrewJSchofield I do not understand this use case. Leadership changes are handled properly by the job, because the set of topic partitions it uses (`activeTopicPartitions()`) is maintained by the runtime. If an internal topic partition moves from broker 1 to broker 2, the active topic partition lists on broker 1 and broker 2 will differ accordingly (both are maintained by the runtime). If you are talking about logging the exception, we can handle that in this case. We also have a similar test in `ShareConsumerTest.testShareConsumerAfterCoordinatorMovement`. The exception comes from trying to delete the record offsets through the replicaManager; no exception is being thrown from the runtime here.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org