[ https://issues.apache.org/jira/browse/HIVE-22420?focusedWorklogId=753288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753288 ]

ASF GitHub Bot logged work on HIVE-22420:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Apr/22 10:04
            Start Date: 06/Apr/22 10:04
    Worklog Time Spent: 10m
      Work Description:
pvary commented on code in PR #3181:
URL: https://github.com/apache/hive/pull/3181#discussion_r843750056


##########
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##########
@@ -574,30 +571,24 @@ public void rollbackTxn() throws LockException {
     if (!isTxnOpen()) {
       throw new RuntimeException("Attempt to rollback before opening a transaction");
     }
-    stopHeartbeat();
-
     try {
-      lockMgr.clearLocalLockRecords();
+      clearLocksAndHB();
       LOG.debug("Rolling back " + JavaUtils.txnIdToString(txnId));
-
-      // Re-checking as txn could have been closed, in the meantime, by a competing thread.
-      if (isTxnOpen()) {
-        if (replPolicy != null) {
-          getMS().replRollbackTxn(txnId, replPolicy, TxnType.DEFAULT);
-        } else {
-          getMS().rollbackTxn(txnId);
-        }
+
+      if (replPolicy != null) {

Review Comment:
   If we expect that this class should not be shared between threads, then we
   should say so in a class-level comment.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 753288)
    Time Spent: 0.5h  (was: 20m)

> DbTxnManager.stopHeartbeat() should be thread-safe
> --------------------------------------------------
>
>                 Key: HIVE-22420
>                 URL: https://issues.apache.org/jira/browse/HIVE-22420
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Aron Hamvas
>            Assignee: Aron Hamvas
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0, 4.0.0-alpha-1
>
>         Attachments: HIVE-22420.1.patch, HIVE-22420.2.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a transactional query is being executed and is interrupted via an HS2
> close operation request, both the background pool thread executing the query
> and the HttpHandler thread running the close operation logic will eventually
> call the method below:
> {noformat}
> Driver.releaseLocksAndCommitOrRollback(boolean commit)
> {noformat}
> Since this method is invoked several times in both threads, the two threads
> can invoke it at the same time. Due to a race condition, the txnId field of
> the DbTxnManager used by both threads can then be set to 0 without the
> transaction actually having been aborted.
> The root cause is that the stopHeartbeat() method in DbTxnManager is not
> thread-safe:
> When Thread-1 and Thread-2 enter stopHeartbeat() almost simultaneously,
> Thread-1 might successfully cancel the heartbeat task and set the
> heartbeatTask field to null while Thread-2 is still trying to observe the
> task's state. Thread-1 returns to the calling rollbackTxn() method and
> continues execution there, while Thread-2 is thrown back to the same method
> with a NullPointerException. Thread-2 then sets txnId to 0, and Thread-1
> sends this 0 value to HMS. As a result, the transaction is not aborted, and
> its locks cannot be released later on either.
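
For illustration only, the race described in the issue above (one thread nulling
heartbeatTask while another is still reading it) can be avoided by doing the
null check, the cancel, and the reset under a single lock. The sketch below is
not the Hive code and not necessarily what PR #3181 does. HeartbeatHolder,
startHeartbeat(Runnable, long), and shutdown() are hypothetical names made up
for this example; only the stopHeartbeat and heartbeatTask names come from the
issue text.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the heartbeat handling; NOT the real DbTxnManager. */
class HeartbeatHolder {
  private final ScheduledExecutorService pool =
      Executors.newSingleThreadScheduledExecutor();

  // Guarded by "this": only read or written inside synchronized methods.
  private ScheduledFuture<?> heartbeatTask;

  /** Schedules the heartbeat unless one is already running. */
  synchronized void startHeartbeat(Runnable beat, long periodMs) {
    if (heartbeatTask == null) {
      heartbeatTask =
          pool.scheduleAtFixedRate(beat, 0L, periodMs, TimeUnit.MILLISECONDS);
    }
  }

  /**
   * Cancels the heartbeat at most once. The null check, cancel, and reset
   * happen atomically, so a second caller sees null and returns false
   * instead of hitting a NullPointerException.
   */
  synchronized boolean stopHeartbeat() {
    if (heartbeatTask == null) {
      return false; // already stopped by another caller
    }
    heartbeatTask.cancel(true);
    heartbeatTask = null;
    return true;
  }

  /** Stops the worker thread so the JVM can exit. */
  void shutdown() {
    pool.shutdownNow();
  }
}
```

With this shape, whichever thread enters stopHeartbeat() second gets false back
and can skip the rest of its cleanup. Whether the surrounding rollback logic
also needs to run under the same lock is a separate question, which is what the
review comment about sharing the class between threads is getting at.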