cryptoe commented on PR #18254:
URL: https://github.com/apache/druid/pull/18254#issuecomment-3092424727

   > The logic looks good to me (assuming I understand the scenario correctly: 
[#18254 
(comment)](https://github.com/apache/druid/pull/18254#discussion_r2216955916)). 
However it feels like it's getting overly complicated. IMO, it would be cleaner 
to restore the original listener behavior, and to deal with the 
`waitForWorkers` / `launchWorkersIfNeeded` situation a different way:
   > 
   >     * Return an `IntSet` of failed workers from both
   > 
   >     * In the controller, if the `IntSet` is nonempty, call 
`addToRetryQueue` and then re-call the original method.
   > 
   > 
   > Something like this in the controller:
   > 
   > ```java
   > IntSet failedWorkers;
   > while (!(failedWorkers = workerManager.waitForWorkers(workers)).isEmpty()) 
{
   >   for (int worker : failedWorkers) {
   >     addToRetryQueue(kernel, worker, fault);
   >   }
   > }
   > ```
   > 
   > That way we won't have to have this thing where there needs to be a 
same-thread callback in `waitForWorkers` / `launchWorkersIfNeeded`.
   > 
   > Approving since the current code is acceptable, but do consider the review 
comments. And if you agree with the alternate structure above, feel free to 
either implement it or at least add a comment to `getWorkerFailureListener` 
describing this future possible change.
   
   I can adjust the calling sites to add to retry queue as well like you 
mentioned. Will raise a follow up PR after this to unblock the release. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to