[
https://issues.apache.org/jira/browse/IMPALA-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18020533#comment-18020533
]
ASF subversion and git services commented on IMPALA-14433:
----------------------------------------------------------
Commit ec809fc16c32afd47ad795d4b8a26880413bf660 in impala's branch
refs/heads/master from jasonmfehr
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ec809fc16 ]
IMPALA-14433: Fix OpenTelemetry Tracing Deadlock
All functions in the SpanManager class operate under the assumption
that child_span_mu_ in the SpanManager class will be locked before
the ClientRequestState lock. However, the
ImpalaServer::ExecuteInternal function takes the ClientRequestState
lock before calling SpanManager::EndChildSpanPlanning. If another
function in the SpanManager class has already taken the
child_span_mu_ lock and is waiting for the ClientRequestState lock,
a deadlock occurs.
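The commit message does not say how the ordering was corrected. As a minimal sketch of one common remedy, assuming the caller can release its lock before calling into the span code, the example below uses hypothetical stand-ins (RequestState, SpanTracker, ExecuteStep, RunBothThreads) and not Impala's real classes or the actual patch:

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-request state and its lock.
struct RequestState {
  std::mutex lock;
  std::string status = "planning";
};

// Hypothetical stand-in for a span manager. Every method follows one order:
// child_span_mu_ first, then the request lock.
class SpanTracker {
 public:
  explicit SpanTracker(RequestState* state) : state_(state) {}

  void EndPlanning() {
    std::lock_guard<std::mutex> span_guard(child_span_mu_);
    std::lock_guard<std::mutex> state_guard(state_->lock);
    events_.push_back("planning ended, status=" + state_->status);
  }

  void StartClose() {
    std::lock_guard<std::mutex> span_guard(child_span_mu_);
    std::lock_guard<std::mutex> state_guard(state_->lock);
    events_.push_back("close started, status=" + state_->status);
  }

  std::size_t event_count() {
    std::lock_guard<std::mutex> span_guard(child_span_mu_);
    return events_.size();
  }

 private:
  std::mutex child_span_mu_;
  RequestState* state_;
  std::vector<std::string> events_;
};

// The caller updates request state under the request lock, then releases it
// BEFORE calling into the tracker. It therefore never holds the request lock
// while waiting for child_span_mu_, so the inverted order cannot arise.
void ExecuteStep(RequestState* state, SpanTracker* tracker) {
  {
    std::lock_guard<std::mutex> state_guard(state->lock);
    state->status = "executing";
  }  // request lock released here
  tracker->EndPlanning();
}

// Runs both code paths concurrently and returns how many events were recorded.
std::size_t RunBothThreads() {
  RequestState state;
  SpanTracker tracker(&state);
  std::thread t1(ExecuteStep, &state, &tracker);
  std::thread t2(&SpanTracker::StartClose, &tracker);
  t1.join();
  t2.join();
  return tracker.event_count();
}
```

Because both threads acquire the two mutexes in the same order, RunBothThreads() always completes, whichever thread runs first.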
This issue was found by running end-to-end tests with OpenTelemetry
tracing enabled and a release build of Impala.
Testing was accomplished by re-running the end-to-end tests with
OpenTelemetry tracing enabled and verifying that the deadlock no
longer occurs.
Change-Id: I7b43dba794cfe61d283bdd476e4056b9304d8947
Reviewed-on: http://gerrit.cloudera.org:8080/23422
Reviewed-by: Joe McDonnell <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Deadlock in OpenTelemetry Tracing Code
> --------------------------------------
>
> Key: IMPALA-14433
> URL: https://issues.apache.org/jira/browse/IMPALA-14433
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 5.0.0
> Reporter: Jason Fehr
> Assignee: Jason Fehr
> Priority: Critical
>
> All functions in the SpanManager class operate under the assumption that
> child_span_mu_ in the SpanManager class will be locked before the
> ClientRequestState lock. However, the ImpalaServer::ExecuteInternal function
> takes the ClientRequestState lock before calling
> SpanManager::EndChildSpanPlanning.
> Simplified Explanation:
> 1. Thread 1 -- ImpalaServer::ExecuteInternal takes ClientRequestState lock
> 2. Thread 2 -- a SpanManager function (such as StartChildSpanClose) locks
> child_span_mu_
> 3. Thread 2 -- attempts to take ClientRequestState lock, waits because Thread
> 1 owns that lock
> 4. Thread 1 -- ImpalaServer::ExecuteInternal calls
> SpanManager::EndChildSpanPlanning
> 5. Thread 1 -- attempts to take child_span_mu_ lock but waits because Thread
> 2 owns that lock
> Detailed Explanation:
> The deadlock happens when another function (such as StartChildSpanClose) is
> called after ImpalaServer::ExecuteInternal has acquired the
> ClientRequestState lock but before ExecuteInternal calls
> SpanManager::EndChildSpanPlanning. In this case, the other function locks
> child_span_mu_ and then tries to take the ClientRequestState lock. Since
> ImpalaServer::ExecuteInternal already holds that lock, the other function
> waits. Then, when ImpalaServer::ExecuteInternal calls
> SpanManager::EndChildSpanPlanning, it tries to lock child_span_mu_, which
> the other function still holds, so neither thread can proceed.
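The inversion described above can be checked mechanically by recording the order in which each code path acquires its locks and looking for a cycle. The following is a rough sketch only: LockOrderGraph and the two helper functions are hypothetical names for illustration and are not part of Impala or any tracing library.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Records "held lock -> lock acquired later" edges for each code path and
// reports whether the combined graph contains a cycle, i.e. whether two
// paths take the same locks in opposite orders.
class LockOrderGraph {
 public:
  // `sequence` lists the locks one code path acquires, in order, while
  // still holding the earlier ones.
  void RecordPath(const std::vector<std::string>& sequence) {
    for (std::size_t i = 0; i < sequence.size(); ++i) {
      for (std::size_t j = i + 1; j < sequence.size(); ++j) {
        edges_[sequence[i]].insert(sequence[j]);
      }
    }
  }

  // Depth-first search for a back edge.
  bool HasCycle() const {
    std::set<std::string> done;
    std::set<std::string> on_stack;
    for (const auto& entry : edges_) {
      if (Visit(entry.first, &done, &on_stack)) return true;
    }
    return false;
  }

 private:
  bool Visit(const std::string& node, std::set<std::string>* done,
             std::set<std::string>* on_stack) const {
    if (on_stack->count(node)) return true;  // reached a lock already on this path
    if (done->count(node)) return false;     // fully explored earlier
    on_stack->insert(node);
    auto it = edges_.find(node);
    if (it != edges_.end()) {
      for (const auto& next : it->second) {
        if (Visit(next, done, on_stack)) return true;
      }
    }
    on_stack->erase(node);
    done->insert(node);
    return false;
  }

  std::map<std::string, std::set<std::string>> edges_;
};

// The two paths from the report: ExecuteInternal takes the ClientRequestState
// lock and then child_span_mu_; a SpanManager function does the reverse.
bool ReportedPathsCanDeadlock() {
  LockOrderGraph graph;
  graph.RecordPath({"ClientRequestState lock", "child_span_mu_"});
  graph.RecordPath({"child_span_mu_", "ClientRequestState lock"});
  return graph.HasCycle();
}

// Both paths follow the order SpanManager assumes: child_span_mu_ first.
bool ConsistentPathsCanDeadlock() {
  LockOrderGraph graph;
  graph.RecordPath({"child_span_mu_", "ClientRequestState lock"});
  graph.RecordPath({"child_span_mu_", "ClientRequestState lock"});
  return graph.HasCycle();
}
```

With the orders from the report the graph has a cycle between the two locks. When every path takes child_span_mu_ first, no cycle exists.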