[ 
https://issues.apache.org/jira/browse/IMPALA-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18020533#comment-18020533
 ] 

ASF subversion and git services commented on IMPALA-14433:
----------------------------------------------------------

Commit ec809fc16c32afd47ad795d4b8a26880413bf660 in impala's branch 
refs/heads/master from jasonmfehr
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ec809fc16 ]

IMPALA-14433: Fix OpenTelemetry Tracing Deadlock

All functions in the SpanManager class operate under the assumption
that child_span_mu_ in the SpanManager class will be locked before
the ClientRequestState lock. However, the
ImpalaServer::ExecuteInternal function takes the ClientRequestState
lock before calling SpanManager::EndChildSpanPlanning. If another
function in the SpanManager class has already taken the
child_span_mu_ lock and is waiting for the ClientRequestState lock,
a deadlock occurs.
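
The ordering rule described above can be illustrated with a minimal
sketch. It is not the actual Impala change: FakeRequestState and
FakeSpanManager are hypothetical stand-ins, the method signatures are
invented for illustration, and the real fix may differ. The sketch only
assumes that every path takes child_span_mu before the request lock, and
that the caller of EndChildSpanPlanning does not already hold the
request lock.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for ClientRequestState; only the lock matters here.
struct FakeRequestState {
  std::mutex lock;
  int planning_ended = 0;
};

// Hypothetical stand-in for SpanManager (signatures are assumptions).
struct FakeSpanManager {
  std::mutex child_span_mu;
  int close_started = 0;

  // Documented order: child_span_mu first, then the request lock.
  void StartChildSpanClose(FakeRequestState& crs) {
    std::lock_guard<std::mutex> span_guard(child_span_mu);
    std::lock_guard<std::mutex> crs_guard(crs.lock);
    ++close_started;
  }

  // Same order as above. Precondition: the caller must NOT already hold
  // crs.lock, otherwise the order is inverted and a deadlock is possible.
  void EndChildSpanPlanning(FakeRequestState& crs) {
    std::lock_guard<std::mutex> span_guard(child_span_mu);
    std::lock_guard<std::mutex> crs_guard(crs.lock);
    ++crs.planning_ended;
  }
};

// Runs both paths concurrently and returns the total number of completed calls.
int RunBothPaths(int iterations) {
  assert(iterations >= 0);
  FakeRequestState crs;
  FakeSpanManager spans;
  std::thread planner([&] {
    for (int i = 0; i < iterations; ++i) spans.EndChildSpanPlanning(crs);
  });
  std::thread closer([&] {
    for (int i = 0; i < iterations; ++i) spans.StartChildSpanClose(crs);
  });
  planner.join();
  closer.join();
  return crs.planning_ended + spans.close_started;
}
```

Because both paths acquire the two mutexes in the same order, neither
thread can hold one lock while waiting on a thread that holds the other.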

This issue was found by running end-to-end tests with OpenTelemetry
tracing enabled and a release build of Impala.

Testing was accomplished by re-running the end-to-end tests with
OpenTelemetry tracing enabled and verifying that the deadlock no
longer occurs.

Change-Id: I7b43dba794cfe61d283bdd476e4056b9304d8947
Reviewed-on: http://gerrit.cloudera.org:8080/23422
Reviewed-by: Joe McDonnell <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Deadlock in OpenTelemetry Tracing Code
> --------------------------------------
>
>                 Key: IMPALA-14433
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14433
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 5.0.0
>            Reporter: Jason Fehr
>            Assignee: Jason Fehr
>            Priority: Critical
>
> All functions in the SpanManager class operate under the assumption that 
> child_span_mu_ in the SpanManager class will be locked before the 
> ClientRequestState lock. However, the ImpalaServer::ExecuteInternal function 
> takes the ClientRequestState lock before calling 
> SpanManager::EndChildSpanPlanning.
>
> Simplified Explanation:
> 1. Thread 1 -- ImpalaServer::ExecuteInternal takes ClientRequestState lock
> 2. Thread 2 -- a SpanManager function (such as StartChildSpanClose) locks
>    child_span_mu_
> 3. Thread 2 -- attempts to take ClientRequestState lock, waits because
>    Thread 1 owns that lock
> 4. Thread 1 -- ImpalaServer::ExecuteInternal calls
>    SpanManager::EndChildSpanPlanning
> 5. Thread 1 -- attempts to take child_span_mu_ lock but waits because
>    Thread 2 owns that lock
>
> Detailed Explanation:
> The deadlock happens when another function (such as StartChildSpanClose) is 
> called after ImpalaServer::ExecuteInternal has taken the 
> ClientRequestState lock but before ImpalaServer::ExecuteInternal calls 
> SpanManager::EndChildSpanPlanning.  In this case, the other function locks 
> child_span_mu_ and then tries to take the ClientRequestState lock.  Since 
> ImpalaServer::ExecuteInternal already holds that lock, the other function 
> waits.  Then, when ImpalaServer::ExecuteInternal calls 
> SpanManager::EndChildSpanPlanning, it tries to lock child_span_mu_, which is 
> already held by the other function.  Each thread now waits on a lock owned 
> by the other, so neither can proceed.
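
The interleaving in the issue description can be replayed without real
threads by recording which lock is requested while which other lock is
held, then looking for a cycle in that ordering. The sketch below is an
illustration only: LockOrderChecker and both replay functions are
hypothetical helpers and are not part of Impala, and the checker never
releases locks, which is enough for this short replay.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: records "held -> requested" edges per thread and
// reports when a new request closes a cycle (a potential deadlock).
class LockOrderChecker {
 public:
  // Returns false if requesting `lock` on `thread` creates an ordering cycle.
  bool Request(const std::string& thread, const std::string& lock) {
    assert(!lock.empty());
    for (const std::string& held : held_[thread]) edges_[held].insert(lock);
    held_[thread].push_back(lock);
    std::set<std::string> seen;
    return !Reaches(lock, lock, seen);
  }

 private:
  // Depth-first search: can `target` be reached from `from` via recorded edges?
  bool Reaches(const std::string& from, const std::string& target,
               std::set<std::string>& seen) const {
    auto it = edges_.find(from);
    if (it == edges_.end()) return false;
    for (const std::string& next : it->second) {
      if (next == target) return true;
      if (seen.insert(next).second && Reaches(next, target, seen)) return true;
    }
    return false;
  }

  std::map<std::string, std::vector<std::string>> held_;
  std::map<std::string, std::set<std::string>> edges_;
};

// Replays the five steps from the issue; returns true only if no cycle forms.
bool ReplayReportedInterleaving() {
  LockOrderChecker checker;
  bool ok = true;
  ok = checker.Request("thread1", "ClientRequestState lock") && ok;  // step 1
  ok = checker.Request("thread2", "child_span_mu_") && ok;           // step 2
  ok = checker.Request("thread2", "ClientRequestState lock") && ok;  // step 3
  ok = checker.Request("thread1", "child_span_mu_") && ok;           // steps 4-5
  return ok;
}

// Same two threads, but both request child_span_mu_ first.
bool ReplayConsistentOrder() {
  LockOrderChecker checker;
  bool ok = true;
  ok = checker.Request("thread1", "child_span_mu_") && ok;
  ok = checker.Request("thread1", "ClientRequestState lock") && ok;
  ok = checker.Request("thread2", "child_span_mu_") && ok;
  ok = checker.Request("thread2", "ClientRequestState lock") && ok;
  return ok;
}
```

In this sketch the reported interleaving is flagged at the last request,
where Thread 1 asks for child_span_mu_ while holding the
ClientRequestState lock, and the consistent ordering passes.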



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
