Re: [PR] KAFKA-18170: Add scheduled job to snapshot cold share partitions. [kafka]

via GitHub Fri, 18 Apr 2025 09:52:35 -0700


smjn commented on PR #19443:
URL: https://github.com/apache/kafka/pull/19443#issuecomment-2815838761


   > When I delete the `__share_group_state` topic I get the following exception information:
   > 
   > ```
   > [2025-04-17 09:41:17,524] INFO [ShareCoordinator id=1] Pruning records in __share_group_state-0 till offset 3. (org.apache.kafka.coordinator.share.ShareCoordinatorService)
   > [2025-04-17 09:41:17,527] ERROR [ShareCoordinator id=1] Received error in share-group state topic prune. (org.apache.kafka.coordinator.share.ShareCoordinatorService)
   > java.util.concurrent.CompletionException: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
   >    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) [?:?]
   >    at java.base/java.util.concurrent.CompletableFuture.andTree(CompletableFuture.java:1527) [?:?]
   >    at java.base/java.util.concurrent.CompletableFuture.allOf(CompletableFuture.java:2419) [?:?]
   >    at org.apache.kafka.coordinator.share.ShareCoordinatorService$1.run(ShareCoordinatorService.java:281) [kafka-share-coordinator-4.1.0-SNAPSHOT.jar:?]
   >    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
   >    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
   >    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
   >    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
   >    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
   > Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
   > ```
   > 
   > The background tasks perpetually throw exceptions in this situation, and I suspect that a more orderly leadership change could similarly make the code sad. While the coordinator runtime is properly able to handle unfortunate leadership events, I think the error handling of the background tasks in the share coordinator needs a little refinement.
   
   
   @AndrewJSchofield I do not understand this use case. Leadership changes 
are handled properly by the job, because the topic partitions it uses 
(`activeTopicPartitions()`) are maintained by the runtime.
   
   If an internal topic partition moves from broker 1 to broker 2, the active 
topic partition lists on broker 1 and broker 2 (maintained by the runtime) 
will differ accordingly. If your concern is how the exception is logged, we 
can handle that in this case.
   
   We have a similar test in 
`ShareConsumerTest.testShareConsumerAfterCoordinatorMovement` as well.
   
   The exception comes from trying to delete the record offsets via the 
replicaManager; no exception is being thrown from the runtime here.
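
To make the logging option concrete, here is a minimal sketch of one way a prune job could handle a per-partition failure. It is not the code in this PR. `PruneJobSketch`, `deleteRecords`, `pruneAll` and `UnknownPartitionException` are hypothetical stand-ins (the last one for `UnknownTopicOrPartitionException`), and the `hosted` set only simulates which partitions the broker still has. The idea is to attach a handler to each future before `CompletableFuture.allOf`, so that a missing partition is recorded for a short log line instead of failing the combined future with a stack trace.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class PruneJobSketch {
    // Hypothetical stand-in for Kafka's UnknownTopicOrPartitionException.
    static class UnknownPartitionException extends RuntimeException {
        UnknownPartitionException(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for the delete-records call: it fails when this
    // "broker" does not host the partition (assumption made for the sketch).
    static CompletableFuture<Void> deleteRecords(String tp, Set<String> hosted) {
        if (hosted.contains(tp)) {
            return CompletableFuture.completedFuture(null);
        }
        return CompletableFuture.failedFuture(
            new UnknownPartitionException("This server does not host " + tp));
    }

    // Prunes every active partition and returns the ones that were skipped
    // because they are no longer hosted here. Any other failure still
    // propagates out of join().
    static List<String> pruneAll(List<String> active, Set<String> hosted) {
        List<String> skipped = Collections.synchronizedList(new ArrayList<>());
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (String tp : active) {
            futures.add(deleteRecords(tp, hosted).exceptionally(t -> {
                // Dependent stages may wrap the cause in CompletionException.
                Throwable cause = (t instanceof CompletionException && t.getCause() != null)
                    ? t.getCause() : t;
                if (cause instanceof UnknownPartitionException) {
                    skipped.add(tp); // candidate for a one-line log message
                    return null;
                }
                throw new CompletionException(cause);
            }));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return new ArrayList<>(skipped);
    }

    public static void main(String[] args) {
        List<String> skipped = pruneAll(
            List.of("__share_group_state-0", "__share_group_state-1"),
            Set.of("__share_group_state-1"));
        System.out.println("Skipped partitions: " + skipped);
    }
}
```

Whether a skipped partition should be logged at INFO or WARN, and whether the job should also drop it from the next run, is a separate decision that this sketch does not make.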


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
