cryptoe commented on code in PR #18254:
URL: https://github.com/apache/druid/pull/18254#discussion_r2214906895
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1379,13 +1377,20 @@ private void contactWorkersForStage(
final boolean retryOnFailure
)
{
- // Sorted copy of target worker numbers to ensure consistent iteration
order.
+ // Sorted copy of target worker numbers to ensure a consistent iteration
order.
final List<Integer> workersCopy = Ordering.natural().sortedCopy(workers);
final List<String> workerIds = getWorkerIds();
final List<ListenableFuture<Void>> workerFutures = new
ArrayList<>(workersCopy.size());
try {
- workerManager.waitForWorkers(workers);
+ workerManager.waitForWorkers(
+ workers,
+ (workerTask, fault) -> {
+ throwIfNonRetriableFault(fault);
+ // no need to add it to the kernel manipulation queue since this
is the main controller thread calling this function.
+ addToRetryQueue(queryKernel, workerTask.getWorkerNumber(), fault);
Review Comment:
There are 2 blocking calls in the `workerManager` which is called by the
main controller thread :
```
> workerManager.launchWorkersIfNeeded(
> workerManager.waitForWorkers(
```
In case the worker spawned fails for any reason , the worker manager thread
calls the failure listener :
https://github.com/apache/druid/blob/aa29d43eac515905836a9641eaa83f074e2cc8c0/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/MSQWorkerTaskLauncher.java#L615
which adds the retry event in the controller kernel queue.
Since we need access to work orders, we can only access that information via
the controller thread due to the single threaded nature of the query kernel.
Since the main controller thread is blocked on `launchingTheWorker`,`
waitingForWorker` to launch, we never really launch the worker since the retry
event is never processed.
Let me try to find a way to pass the failure listener as part of the
constructor
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]