smjn commented on PR #18718: URL: https://github.com/apache/kafka/pull/18718#issuecomment-2620807766
> On my laptop, `testShareConsumerAfterCoordinatorMovement` failed after only a handful of iterations. Please investigate @smjn.

An exception was being thrown from the `PersisterStateManager.SendThread.generateRequests` method. Looking deeper, we found that `PersisterStateManager.PersisterStateManagerHandler.lookupNeeded` was receiving an exception from the `ShareCoordinatorMetadataCacheHelperImpl.getShareCoordinator` method, but only in a few cases (when in-flight write RPCs end up on the shut-down coordinator). Since the exception was unexpected, it was not handled in the aforementioned method and broke the code. Wrapping the method body in a try/catch that explicitly catches the specific exception and returns a no-node instead fixed the issue.

Additionally, because it is uncertain when the consumer finishes after the shutdown, the timeout also had to be increased in the `waitForCondition` call that checks whether both the producer and the consumer have completed.
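The shape of the fix can be sketched roughly as follows. This is not the code from the PR: `Node` here is a minimal stand-in for Kafka's `org.apache.kafka.common.Node` (which provides `Node.noNode()` and `isEmpty()`), `CoordinatorUnavailableException` is a hypothetical placeholder for whichever exception the lookup actually throws, and `safeLookup` is an illustrative name.

```java
import java.util.function.Supplier;

public class CoordinatorLookupSketch {

    // Minimal stand-in for Kafka's Node, so the sketch compiles without Kafka on the classpath.
    static final class Node {
        static final Node NO_NODE = new Node(-1, "", -1);
        final int id;
        final String host;
        final int port;

        Node(int id, String host, int port) {
            this.id = id;
            this.host = host;
            this.port = port;
        }

        // An empty node means "no coordinator known right now".
        boolean isEmpty() {
            return id < 0;
        }
    }

    // Hypothetical exception type; the real one is whatever the coordinator lookup throws.
    static final class CoordinatorUnavailableException extends RuntimeException {
        CoordinatorUnavailableException(String message) {
            super(message);
        }
    }

    // Wraps the lookup so that the specific, expected failure becomes a "no node" result
    // instead of propagating into the caller. Any other exception still propagates.
    static Node safeLookup(Supplier<Node> lookup) {
        try {
            Node node = lookup.get();
            return node == null ? Node.NO_NODE : node;
        } catch (CoordinatorUnavailableException e) {
            return Node.NO_NODE;
        }
    }

    public static void main(String[] args) {
        Node found = safeLookup(() -> new Node(1, "broker-1", 9092));
        Node missing = safeLookup(() -> {
            throw new CoordinatorUnavailableException("coordinator shut down");
        });
        System.out.println("found.isEmpty() = " + found.isEmpty());
        System.out.println("missing.isEmpty() = " + missing.isEmpty());
    }
}
```

Catching only the one expected exception type, rather than `Exception`, keeps genuinely unexpected failures visible; the caller can treat an empty node as "coordinator not known yet" and retry the lookup later.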