Jason Fehr created IMPALA-14433:
-----------------------------------
Summary: Deadlock in OpenTelemetry Tracing Code
Key: IMPALA-14433
URL: https://issues.apache.org/jira/browse/IMPALA-14433
Project: IMPALA
Issue Type: Bug
Affects Versions: Impala 5.0.0
Reporter: Jason Fehr
Assignee: Jason Fehr
All functions in the SpanManager class assume that child_span_mu_ is locked
before the ClientRequestState lock. However, ImpalaServer::ExecuteInternal
takes the ClientRequestState lock before calling
SpanManager::EndChildSpanPlanning, which then locks child_span_mu_. This
inverts the expected lock order.
Simplified Explanation:
1. Thread 1 -- ImpalaServer::ExecuteInternal takes ClientRequestState lock
2. Thread 2 -- a SpanManager function (such as StartChildSpanClose) locks
child_span_mu_
3. Thread 2 -- attempts to take ClientRequestState lock, waits because Thread 1
owns that lock
4. Thread 1 -- ImpalaServer::ExecuteInternal calls
SpanManager::EndChildSpanPlanning
5. Thread 1 -- attempts to take child_span_mu_ lock but waits because Thread 2
owns that lock
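
The steps above can be reproduced in a minimal sketch that does not hang. The
two mutexes below are hypothetical stand-ins for the ClientRequestState lock
and child_span_mu_, not Impala's actual declarations, and try_lock() replaces
the blocking lock() calls so the inversion can be observed instead of
deadlocking the process.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks named in the report; these are not
// Impala's actual declarations.
std::mutex client_request_state_lock;
std::mutex child_span_mu;

// Returns true when neither thread could acquire its second lock, i.e. the
// situation that becomes a deadlock when blocking lock() calls are used.
bool InversionBlocksBothThreads() {
  std::atomic<int> holding_first{0};
  std::atomic<int> done_trying{0};
  bool t1_got_second = true;
  bool t2_got_second = true;

  auto run = [&](std::mutex& first, std::mutex& second, bool& got_second) {
    std::lock_guard<std::mutex> first_guard(first);
    // Wait until both threads hold their first lock (steps 1 and 2).
    holding_first.fetch_add(1);
    while (holding_first.load() < 2) std::this_thread::yield();
    // Steps 3 and 5: a blocking lock() here would wait forever.
    got_second = second.try_lock();
    if (got_second) second.unlock();
    // Keep the first lock held until the other thread has also tried.
    done_trying.fetch_add(1);
    while (done_trying.load() < 2) std::this_thread::yield();
  };

  // Thread 1 mirrors ExecuteInternal: ClientRequestState lock, then child_span_mu.
  std::thread t1([&] { run(client_request_state_lock, child_span_mu, t1_got_second); });
  // Thread 2 mirrors a SpanManager function: child_span_mu, then ClientRequestState lock.
  std::thread t2([&] { run(child_span_mu, client_request_state_lock, t2_got_second); });
  t1.join();
  t2.join();
  return !t1_got_second && !t2_got_second;
}
```

Calling InversionBlocksBothThreads() returns true: each thread holds the lock
the other one needs, so neither second acquisition can succeed.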
Detailed Explanation:
The deadlock happens when another SpanManager function (such as
StartChildSpanClose) is called after ImpalaServer::ExecuteInternal has taken
the ClientRequestState lock but before ExecuteInternal calls
SpanManager::EndChildSpanPlanning. In this case, the other function locks
child_span_mu_ and then tries to take the ClientRequestState lock. Since
ImpalaServer::ExecuteInternal already holds that lock, the other function
waits. Then, when ImpalaServer::ExecuteInternal calls
SpanManager::EndChildSpanPlanning, it tries to lock child_span_mu_, which is
already held by the waiting function. Neither thread can proceed.
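
One common way to rule out this kind of cycle is to have every code path take
the two locks in the same order. The sketch below illustrates that general
technique only; it is not necessarily the fix adopted for this issue. The
mutexes, function names, and counter are hypothetical stand-ins rather than
Impala code.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks named in the report.
std::mutex client_request_state_lock;
std::mutex child_span_mu;
int spans_processed = 0;  // Guarded by both locks in this sketch.

// Stand-in for the ExecuteInternal -> EndChildSpanPlanning path, rewritten to
// take child_span_mu first, matching the order SpanManager assumes.
void EndChildSpanPlanningPath() {
  std::lock_guard<std::mutex> span_guard(child_span_mu);
  std::lock_guard<std::mutex> state_guard(client_request_state_lock);
  ++spans_processed;
}

// Stand-in for another SpanManager function such as StartChildSpanClose,
// which already uses the child_span_mu -> ClientRequestState order.
void StartChildSpanClosePath() {
  std::lock_guard<std::mutex> span_guard(child_span_mu);
  std::lock_guard<std::mutex> state_guard(client_request_state_lock);
  ++spans_processed;
}

// Runs both paths concurrently and returns the total number of completed calls.
int RunBothPaths(int iterations) {
  spans_processed = 0;
  std::thread t1([&] {
    for (int i = 0; i < iterations; ++i) EndChildSpanPlanningPath();
  });
  std::thread t2([&] {
    for (int i = 0; i < iterations; ++i) StartChildSpanClosePath();
  });
  t1.join();
  t2.join();
  return spans_processed;
}
```

Because both paths acquire child_span_mu before the ClientRequestState
stand-in, no thread can hold one lock while waiting on a thread that holds the
other, and RunBothPaths(n) completes with a count of 2 * n.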