This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 16c350ce5aa5dbf1a9ea8b481c64780d445e9a2a
Author: Joe McDonnell <[email protected]>
AuthorDate: Wed Aug 6 23:27:42 2025 -0700

    IMPALA-14271: Reapply the core piece of IMPALA-6984
    
    IMPALA-6984 changed the behavior to cancel backends when the
    query reaches the RETURNED_RESULTS state. This ran into a regression
    on large clusters where a query could end up waiting 10 seconds for
    the cancel RPC to time out.
    IMPALA-10047 reverted the core piece of the change.
    
    For tuple caching, we found that a scan node can get stuck waiting
    for a global runtime filter. It turns out that the coordinator will
    not send out global runtime filters if the query is in a terminal
    state. Tuple caching was causing queries to reach the RETURNED_RESULTS
    phase before the runtime filter could be sent out. Reenabling the core
    part of IMPALA-6984 sends out a cancel as soon as the query transitions
    to RETURNED_RESULTS and wakes up any fragment instances waiting on
    runtime filters.
    
    The underlying cause of IMPALA-10047 is a tangle of locks that causes
    us to exhaust the RPC threads. The coordinator is holding a lock on the
    backend state while it sends the cancel synchronously. Other backends
    that complete during that time run Coordinator::BackendState::LogFirstInProgress(),
    which iterates through backend states to find the first that is not done.
    The check to see if a backend state is done takes a lock on the backend
    state. The problem case is that the coordinator may be sending a cancel
    to a backend on itself. In that case, it needs an RPC thread on the coordinator
    to be available to process the cancel. If all of the RPC threads are
    processing updates, they can all call LogFirstInProgress() and get stuck
    on the backend state lock for the coordinator's fragment. In that case,
    it becomes a temporary deadlock as the cancel can't be processed and the
    coordinator won't release the lock. It only gets resolved by the RPC timing
    out.
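
The starvation described above can be modeled outside Impala with a toy mutex
example. This is an illustrative sketch only, not Impala code: the function and
variable names are invented stand-ins for the coordinator, the backend-state
lock, and an RPC pool thread running an IsDone()-style check.

```cpp
#include <chrono>
#include <mutex>
#include <thread>

// Toy model of the convoy described above: the "coordinator" holds a
// backend-state mutex across a slow call, while a thread standing in for
// the RPC pool needs the same mutex to run an IsDone()-style check. The
// pool thread cannot make progress until the holder releases the lock.
inline bool rpc_thread_can_check_is_done(std::timed_mutex& m) {
  // The IsDone() stand-in needs the lock; give up after a short wait,
  // as a starved pool thread effectively does until the RPC times out.
  if (m.try_lock_for(std::chrono::milliseconds(50))) {
    m.unlock();
    return true;
  }
  return false;
}

inline bool demo_lock_held_across_rpc() {
  std::timed_mutex backend_state_lock;
  backend_state_lock.lock();  // "coordinator" enters Cancel() holding the lock
  bool progressed = true;
  std::thread rpc_pool([&] {
    // Runs entirely while the "coordinator" still holds the lock.
    progressed = rpc_thread_can_check_is_done(backend_state_lock);
  });
  rpc_pool.join();
  backend_state_lock.unlock();
  return progressed;  // false: the pool thread was starved
}
```

Because the lock is held for the pool thread's entire lifetime, the check
deterministically fails, mirroring how the real RPC threads stall until the
10-second RPC timeout resolves the convoy.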
    
    To resolve this, this change makes the Cancel() method drop the lock
    while doing the CancelQueryFInstances RPC and reacquire it once the
    RPC finishes.
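
The unlock/relock pattern this applies can be sketched as follows. The class,
members, and SendCancelRpc() below are hypothetical stand-ins for the real
BackendState, not Impala's actual signatures:

```cpp
#include <mutex>

// Hypothetical sketch of the reworked Cancel() flow: release the lock
// across the blocking cancel RPC, reacquire it afterwards, and re-check
// the done flag because state may have changed while the lock was dropped.
struct BackendStateSketch {
  std::mutex lock_;
  bool is_done_ = false;
  bool cancel_sent_ = false;

  void SendCancelRpc() { cancel_sent_ = true; }  // placeholder for the RPC

  // Returns true if the RPC result was applied, false if it was skipped.
  bool Cancel() {
    std::unique_lock<std::mutex> l(lock_);
    if (is_done_) return false;  // nothing to cancel

    l.unlock();       // let other threads take lock_ (e.g. IsDone() checks)
    SendCancelRpc();  // blocking CancelQueryFInstances stand-in, lock not held
    l.lock();         // reacquire before touching shared state again

    if (is_done_) return false;  // became done during the RPC; ignore result
    // ... merge the RPC status into shared state under the lock ...
    return true;
  }
};
```

The key design point is the re-check after l.lock(): once the lock is dropped,
another thread may mark the backend done, so the RPC's result is only applied
if the state is still live.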
    
    Testing:
     - Hand tested with 10 impalads and control_service_num_svc_threads=1
       Without the fix, it reproduces easily after reverting IMPALA-10047.
       With the fix, it doesn't reproduce.
    
    Change-Id: Ia058b03c72cc4bb83b0bd0a19ff6c8c43a647974
    Reviewed-on: http://gerrit.cloudera.org:8080/23264
    Reviewed-by: Yida Wu <[email protected]>
    Reviewed-by: Michael Smith <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 be/src/runtime/coordinator-backend-state.cc | 19 +++++++++++++++++++
 be/src/runtime/coordinator.cc               |  4 +++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/be/src/runtime/coordinator-backend-state.cc b/be/src/runtime/coordinator-backend-state.cc
index 6be1cb52f..2ff12d90a 100644
--- a/be/src/runtime/coordinator-backend-state.cc
+++ b/be/src/runtime/coordinator-backend-state.cc
@@ -697,6 +697,13 @@ Coordinator::BackendState::CancelResult Coordinator::BackendState::Cancel(
       goto done;
     }
 
+    // Drop the lock while we send the cancel RPC. RPC threads from ReportExecStatus
+    // acquire the lock when calling IsDone() via LogFirstInProgress(). If this holds the
+    // lock for an extended period of time, it can block all of the RPC threads, leaving
+    // no thread to process this CancelQueryFInstances RPC locally. This causes a
+    // temporary deadlock until this RPC times out (10 seconds).
+    l.unlock();
+
     CancelQueryFInstancesRequestPB request;
     *request.mutable_query_id() = query_id_;
     CancelQueryFInstancesResponsePB response;
@@ -709,6 +716,18 @@ Coordinator::BackendState::CancelResult Coordinator::BackendState::Cancel(
             request, &response, query_ctx_, "Cancel() RPC failed", num_retries,
             timeout_ms, backoff_time_ms, "COORD_CANCEL_QUERY_FINSTANCES_RPC");
 
+    // Reacquire the lock after the RPC completes
+    l.lock();
+
+    // If this became done while we dropped the lock for the CancelQueryFInstances RPC,
+    // the results of the RPC should be ignored.
+    if (IsDoneLocked(l)) {
+      VLogForBackend(Substitute(
+          "Ignoring result of cancel RPC because the backend is already done: $0",
+          status_.GetDetail()));
+      goto done;
+    }
+
     if (!rpc_status.ok()) {
       status_.MergeStatus(rpc_status);
       result.became_done = true;
diff --git a/be/src/runtime/coordinator.cc b/be/src/runtime/coordinator.cc
index fe338f6f9..ac7755634 100644
--- a/be/src/runtime/coordinator.cc
+++ b/be/src/runtime/coordinator.cc
@@ -872,7 +872,9 @@ void Coordinator::HandleExecStateTransition(
   // execution and release resources.
   ReleaseExecResources();
   if (new_state == ExecState::RETURNED_RESULTS) {
-    // TODO: IMPALA-6984: cancel all backends in this case too.
+    // Cancel all backends, but wait for the final status reports to be received so that
+    // we have a complete profile for this successful query.
+    CancelBackends(/*fire_and_forget=*/ false);
     WaitForBackends();
   } else {
     CancelBackends(/*fire_and_forget=*/ true);
