smjn commented on PR #18718:
URL: https://github.com/apache/kafka/pull/18718#issuecomment-2620807766

   > On my laptop, `testShareConsumerAfterCoordinatorMovement` failed after 
only a handful of iterations. Please investigate @smjn.
   
   An exception was being thrown from the
`PersisterStateManager.SendThread.generateRequests` method. Looking deeper, we
found that `PersisterStateManager.PersisterStateManagerHandler.lookupNeeded`
was receiving an exception from the
`ShareCoordinatorMetadataCacheHelperImpl.getShareCoordinator` method, but only
in a few cases (when in-flight write RPCs end up on the shut-down coordinator).
Since the exception was unexpected, it was not being handled in the
aforementioned method and surfaced as a failure in `generateRequests`.
   Surrounding the method body with a try-catch that explicitly catches the
specific exception and returns a no-node instead fixed the issue.
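
   To illustrate the shape of that fix, here is a minimal, self-contained sketch. It is not the code from the PR: `CoordinatorLookupSketch`, its `Node` class, `NO_NODE`, and `CoordinatorUnavailableException` are hypothetical stand-ins, and the exception type actually caught in the PR may differ.

```java
import java.util.function.Supplier;

// Illustrative sketch only; every name in this class is hypothetical.
final class CoordinatorLookupSketch {

    /** Minimal node type with a sentinel meaning "no coordinator known". */
    static final class Node {
        static final Node NO_NODE = new Node(-1, "", -1);

        final int id;
        final String host;
        final int port;

        Node(int id, String host, int port) {
            this.id = id;
            this.host = host;
            this.port = port;
        }

        boolean isEmpty() {
            return id < 0;
        }
    }

    /** Assumed exception raised when the lookup hits a coordinator that is shutting down. */
    static final class CoordinatorUnavailableException extends RuntimeException {
        CoordinatorUnavailableException(String message) {
            super(message);
        }
    }

    /**
     * Runs the lookup and converts the one expected failure into the no-node
     * sentinel, so the caller can retry the lookup later.
     * Any other exception still propagates to the caller.
     */
    static Node findCoordinator(Supplier<Node> lookup) {
        try {
            return lookup.get();
        } catch (CoordinatorUnavailableException e) {
            return Node.NO_NODE;
        }
    }
}
```

   Catching only the specific exception, rather than `Exception`, keeps genuinely unexpected failures visible instead of silently turning them into a retry.
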
   Additionally, because it is uncertain when the consumer finishes after the
shutdown, the timeout in the `waitForCondition` call that checks whether the
producer and consumer have both completed also had to be increased.
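
   For readers unfamiliar with this kind of helper, the sketch below shows the general idea of polling a condition until a deadline, and why a larger timeout tolerates a slower finish. `WaitSketch` and its parameters are hypothetical and written from scratch; this is not Kafka's `waitForCondition` implementation.

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch only; not Kafka's test utility.
final class WaitSketch {

    /**
     * Polls the condition until it holds or maxWaitMs has elapsed.
     * Returns true if the condition was observed to hold, false on timeout
     * or if the waiting thread was interrupted.
     */
    static boolean waitForCondition(BooleanSupplier condition, long maxWaitMs, long pollIntervalMs) {
        long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

   The condition is always checked at least once, so a timeout of zero still succeeds if the work has already completed.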


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

