[ https://issues.apache.org/jira/browse/HIVE-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112789#comment-15112789 ]
Sergey Shelukhin commented on HIVE-12904:
-----------------------------------------

Synchronizing on "this" is a bad pattern: if someone decides to synchronize on this object externally, it will cause perf issues. Otherwise +1

> LLAP: deadlock in task scheduling
> ---------------------------------
>
> Key: HIVE-12904
> URL: https://issues.apache.org/jira/browse/HIVE-12904
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Hui Zheng
> Assignee: Sergey Shelukhin
> Priority: Critical
> Attachments: HIVE-12904.2.patch, HIVE-12904.patch
>
>
> {noformat}
> Thread 34107: (state = BLOCKED)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.isInWaitQueue() @bci=0, line=690 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.finishableStateUpdated(org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper, boolean) @bci=8, line=485 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.access$1500(org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService, org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper, boolean) @bci=3, line=78 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.finishableStateUpdated(boolean) @bci=27, line=733 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.sourceStateUpdated(java.lang.String) @bci=76, line=210 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.sourceStateUpdated(java.lang.String) @bci=5, line=164 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerSourceStateChange(java.lang.String, java.lang.String, org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateProto) @bci=34, line=228 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.sourceStateUpdated(org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=47, line=255 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.sourceStateUpdated(org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=5, line=328 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.LlapDaemonProtocolServerImpl.sourceStateUpdated(com.google.protobuf.RpcController, org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=5, line=105 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Message) @bci=80, line=13067 (Compiled frame)
>  - org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(org.apache.hadoop.ipc.RPC$Server, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=246, line=616 (Compiled frame)
>  - org.apache.hadoop.ipc.RPC$Server.call(org.apache.hadoop.ipc.RPC$RpcKind, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=9, line=969 (Compiled frame)
>  - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=38, line=2151 (Compiled frame)
>  - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=2147 (Compiled frame)
>  - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1657 (Compiled frame)
>  - org.apache.hadoop.ipc.Server$Handler.run() @bci=315, line=2145 (Interpreted frame)
> and
> Thread 34500: (state = BLOCKED)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.unregisterForUpdates(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=0, line=195 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.unregisterFinishableStateUpdate(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=5, line=160 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryFragmentInfo.unregisterForFinishableStateUpdates(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=5, line=143 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.maybeUnregisterForFinishedStateNotifications() @bci=20, line=681 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$InternalCompletionListener.onSuccess(org.apache.tez.runtime.task.TaskRunner2Result) @bci=32, line=548 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$InternalCompletionListener.onSuccess(java.lang.Object) @bci=5, line=535 (Compiled frame)
>  - com.google.common.util.concurrent.Futures$4.run() @bci=55, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> "IPC Server handler 0 on 15001":
>   waiting to lock Monitor@0x00007f5d322ecb08 (Object@0x00007f67032cd2c0, a org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService$TaskWrapper),
>   which is held by "ExecutionCompletionThread #0"
> "ExecutionCompletionThread #0":
>   waiting to lock Monitor@0x00007f6066b9e8c8 (Object@0x00007f66b6570200, a org/apache/hadoop/hive/llap/daemon/impl/QueryInfo$FinishableStateTracker),
>   which is held by "IPC Server handler 0 on 15001"
> Found a total of 1 deadlock.
> {noformat}
> Looks like it's caused by synchronized methods:
> {noformat}
> TaskWrapper:
> public synchronized void maybeUnregisterForFinishedStateNotifications
> {noformat}
> Eventually calls
> {noformat}
> FinishableStateTracker
> synchronized void unregisterForUpdates(FinishableStateUpdateHandler handler) {
> {noformat}
> and
> {noformat}
> FST
> synchronized void sourceStateUpdated(String sourceName) {
> {noformat}
> eventually calls
> {noformat}
> public synchronized boolean isInWaitQueue() {
> {noformat}
> The latter just returns a boolean, so it definitely doesn't need synchronized;
> however, I don't know whether there are other similar issues or what really has
> to be inside the sync blocks, so perhaps there's a better fix.
> Overall I'd say synchronized methods should not be used on objects that call
> into any other non-trivial objects. Perhaps for now it would be good to replace
> all sync methods with sync blocks that cover the entire method, and to remove
> the unnecessary ones like the isWait... one. Then the scope of the blocks can
> be adjusted based on the logic in the future.
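
For illustration only, here is a minimal sketch of the two ideas discussed above: guarding state with a private lock object instead of "this", and making the boolean getter lock-free so the tracker-to-wrapper call no longer needs the wrapper's monitor. The classes Tracker and Wrapper are hypothetical stand-ins for FinishableStateTracker and TaskWrapper; this is not the HIVE-12904 patch, and the real classes carry more state than shown.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class LockSketch {
    /** Hypothetical stand-in for FinishableStateTracker. */
    static final class Tracker {
        // Private lock: code outside this class cannot synchronize on it.
        private final Object lock = new Object();
        private Wrapper registered; // guarded by lock

        void register(Wrapper w) {
            synchronized (lock) { registered = w; }
        }

        void unregister(Wrapper w) {
            synchronized (lock) { if (registered == w) registered = null; }
        }

        /** Holds the tracker lock, then reads wrapper state (the IPC handler path). */
        boolean sourceStateUpdated() {
            synchronized (lock) {
                return registered != null && registered.isInWaitQueue();
            }
        }
    }

    /** Hypothetical stand-in for TaskWrapper. */
    static final class Wrapper {
        private final Object lock = new Object();
        private final AtomicBoolean inWaitQueue = new AtomicBoolean(true);
        private final Tracker tracker;

        Wrapper(Tracker tracker) { this.tracker = tracker; }

        /** Lock-free read, so callers holding the tracker lock never wait here. */
        boolean isInWaitQueue() { return inWaitQueue.get(); }

        /** Holds the wrapper lock, then takes the tracker lock (the completion path). */
        void finish() {
            synchronized (lock) {
                inWaitQueue.set(false);
                tracker.unregister(this);
            }
        }
    }

    /** Races both paths repeatedly; returns false if a thread fails to finish in time. */
    static boolean runConcurrently(int rounds) {
        for (int i = 0; i < rounds; i++) {
            Tracker tracker = new Tracker();
            Wrapper wrapper = new Wrapper(tracker);
            tracker.register(wrapper);
            CountDownLatch start = new CountDownLatch(1);
            Thread ipc = new Thread(() -> { await(start); tracker.sourceStateUpdated(); });
            Thread completion = new Thread(() -> { await(start); wrapper.finish(); });
            ipc.start();
            completion.start();
            start.countDown();
            try {
                ipc.join(5000);
                completion.join(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (ipc.isAlive() || completion.isAlive()) return false;
        }
        return true;
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(1000) ? "no deadlock observed" : "threads stuck");
    }
}
```

In this sketch the only nested acquisition is wrapper lock then tracker lock, so the cycle reported in the thread dump cannot form. Whether the same narrowing is safe in the real code depends on what else the synchronized methods protect, which the issue description leaves open.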