[ https://issues.apache.org/jira/browse/HIVE-22420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963668#comment-16963668 ]
Hive QA commented on HIVE-22420:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12984364/HIVE-22420.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 17542 tests executed
*Failed tests:*
{noformat}
TestStatsReplicationScenariosACIDNoAutogather - did not produce a TEST-*.xml file (likely timed out) (batchId=255)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/19221/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/19221/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-19221/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12984364 - PreCommit-HIVE-Build

> DbTxnManager.stopHeartbeat() should be thread-safe
> --------------------------------------------------
>
>                 Key: HIVE-22420
>                 URL: https://issues.apache.org/jira/browse/HIVE-22420
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 3.1.0
>            Reporter: Aron Hamvas
>            Assignee: Aron Hamvas
>            Priority: Major
>         Attachments: HIVE-22420.1.patch
>
>
> When a transactional query is being executed and interrupted via HS2 close
> operation request, both the background pool thread executing the query and
> the HttpHandler thread running the close operation logic will eventually call
> the below method:
> {noformat}
> Driver.releaseLocksAndCommitOrRollback(commit boolean)
> {noformat}
> Since this method is invoked several times in both threads, it can happen
> that the two threads invoke it at the same time, and due to a race condition,
> the txnId field of the DbTxnManager used by both threads could be set to 0
> without actually successfully aborting the transaction.
> The root cause is the stopHeartbeat() method in DbTxnManager not being
> thread-safe:
> When Thread-1 and Thread-2 enter stopHeartbeat() with very little time
> difference, Thread-1 might successfully cancel the heartbeat task and set the
> heartbeatTask field to null, while Thread-2 is trying to observe its state.
> Thread-1 will return to the calling rollbackTxn() method and continue
> execution there, while Thread-2 is thrown back to the same method with a
> NullPointerException. Thread-2 will then set txnId to 0, and Thread-1 then
> sends this 0 value to HMS. So, the txn will not be aborted, and the locks
> cannot be released later on either.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
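
For illustration only, the race described in the issue (one thread nulling heartbeatTask while another is still inspecting it) can be avoided by taking ownership of the task atomically before touching it. The sketch below is not the content of HIVE-22420.1.patch and is not DbTxnManager code: the class HeartbeatSketch, its methods, and the no-op heartbeat are hypothetical names chosen for the example, assuming the heartbeat is held as a ScheduledFuture.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the heartbeat handling; not actual Hive code.
class HeartbeatSketch {
    // The field that two threads may try to cancel and clear concurrently.
    private final AtomicReference<ScheduledFuture<?>> heartbeatTask =
            new AtomicReference<>();
    private final ScheduledExecutorService pool =
            Executors.newSingleThreadScheduledExecutor();

    public void startHeartbeat() {
        // A no-op task stands in for the real heartbeat call.
        heartbeatTask.set(
                pool.scheduleAtFixedRate(() -> { }, 0, 50, TimeUnit.MILLISECONDS));
    }

    // Returns true only for the single caller that actually stopped the task.
    public boolean stopHeartbeat() {
        // getAndSet hands the task to exactly one caller; every other
        // caller sees null and returns without dereferencing anything.
        ScheduledFuture<?> task = heartbeatTask.getAndSet(null);
        if (task == null) {
            return false;
        }
        task.cancel(true);
        return true;
    }

    public void shutdown() {
        pool.shutdownNow();
    }
}
```

With this shape, a second concurrent call to stopHeartbeat() returns false instead of failing with a NullPointerException, so the caller has no exception path on which to reset txnId. Declaring the method synchronized would be another way to get the same effect.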