[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755647#comment-15755647
 ] 

Wei Zheng commented on HIVE-15376:
----------------------------------

acquireLocksWithHeartbeatDelay was previously used only by the JUnit test
TestDbTxnManager2, which now uses openTxn(Context, String, long) because we've
moved the heartbeater starting logic from acquireLocksWithHeartbeatDelay to
openTxn.
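
To illustrate the idea of starting the heartbeater when the transaction is
opened, here is a minimal sketch. It is not the code from the patch: the class
HeartbeatSketch and all of its methods are hypothetical names, and the
"heartbeat" is just a counter increment rather than a metastore call. The
initial-delay argument plays the role that the long parameter appears to play
in openTxn(Context, String, long), which is an assumption on my part.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a transaction manager; NOT Hive's DbTxnManager.
class HeartbeatSketch {
    private final ScheduledExecutorService pool =
        Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger heartbeatsSent = new AtomicInteger();
    private ScheduledFuture<?> heartbeatTask; // null when no heartbeater runs

    // "Opens a transaction" and schedules the heartbeater immediately, so the
    // time later spent waiting for locks is already covered by heartbeats.
    synchronized void openTxn(long initialDelayMs, long intervalMs) {
        heartbeatTask = pool.scheduleAtFixedRate(
            heartbeatsSent::incrementAndGet,
            initialDelayMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    synchronized boolean isHeartbeating() {
        return heartbeatTask != null && !heartbeatTask.isCancelled();
    }

    // Called on commit or rollback, or when it turns out no lock is needed.
    synchronized void stopHeartbeat() {
        if (heartbeatTask != null) {
            heartbeatTask.cancel(false);
            heartbeatTask = null;
        }
    }

    int heartbeatsSent() {
        return heartbeatsSent.get();
    }

    // Releases the scheduler thread so the JVM can exit.
    void shutdown() {
        pool.shutdownNow();
    }
}
```

With this shape, a test that wants to keep heartbeats out of the picture can
pass a large initial delay, or skip scheduling altogether.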

I moved the heartbeater cancelling logic for read-only (RO) queries into the
if (!atLeastOneLock) branch. Thanks for pointing that out.

The idea behind the change in TestDbTxnManager.testLockTimeout() was to remove
the influence of the heartbeat that is introduced by acquireLocks. But I now
realize I shouldn't have called openTxn and used a delay to accomplish that.
Instead, I introduced an additional parameter for not starting the heartbeat.

Case 4 in TestDbTxnManager.testHeartbeater() shows that even when there is no
open transaction, we still send heartbeats as long as a lock is required.

> Improve heartbeater scheduling for transactions
> -----------------------------------------------
>
>                 Key: HIVE-15376
>                 URL: https://issues.apache.org/jira/browse/HIVE-15376
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.2.0
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>         Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch
>
>
> HIVE-12366 improved the heartbeater logic by reducing the gap between lock
> acquisition and the first heartbeat, but that is not enough. A problem can
> still occur, e.g.:
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.
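
The timing condition in the description above comes down to one comparison.
The sketch below is only an illustration: TxnTimeoutCheck and its method are
hypothetical names, and the millisecond values in the note that follows are
made-up inputs, not measurements.

```java
// Hypothetical helper illustrating the condition "hive.txn.timeout < C - A".
class TxnTimeoutCheck {
    // openedAtMs is Time A (the transaction is opened) and firstHeartbeatAtMs
    // is Time C (the first heartbeat is sent), both in milliseconds.
    static boolean timesOutBeforeFirstHeartbeat(
            long openedAtMs, long firstHeartbeatAtMs, long txnTimeoutMs) {
        return txnTimeoutMs < firstHeartbeatAtMs - openedAtMs;
    }
}
```

For example, with an assumed timeout of 300000 ms, a transaction opened at
A = 0 whose first heartbeat goes out at C = 400000 satisfies the condition and
would be aborted, while one with C = 200000 would not.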



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
