[ https://issues.apache.org/jira/browse/HIVE-11317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700482#comment-14700482 ]
Lefty Leverenz commented on HIVE-11317:
---------------------------------------

Just curious: Why do the TimeValidators for these new parameters use MILLISECONDS while their default values are in seconds? Other parameters match the defaults to the TimeValidator unit.

{code}
+    HIVE_TIMEDOUT_TXN_REAPER_START("hive.timedout.txn.reaper.start", "100s",
+      new TimeValidator(TimeUnit.MILLISECONDS), "Time delay of 1st reaper run after metastore start"),
+    HIVE_TIMEDOUT_TXN_REAPER_INTERVAL("hive.timedout.txn.reaper.interval", "180s",
+      new TimeValidator(TimeUnit.MILLISECONDS), "Time interval describing how often the reaper runs"),
{code}

> ACID: Improve transaction Abort logic due to timeout
> ----------------------------------------------------
>
>                 Key: HIVE-11317
>                 URL: https://issues.apache.org/jira/browse/HIVE-11317
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>              Labels: TODOC1.3, triage
>             Fix For: 1.3.0
>
>         Attachments: HIVE-11317.2.patch, HIVE-11317.3.patch,
> HIVE-11317.4.patch, HIVE-11317.5.patch, HIVE-11317.6.patch, HIVE-11317.patch
>
>
> The logic to abort transactions that have stopped heartbeating is in
> TxnHandler.timeOutTxns().
> This is only called when DbTxnManager.getValidTxns() is called.
> So if there are a lot of txns that need to be timed out and there are no
> SQL clients talking to the system, there is nothing to abort dead
> transactions, and thus compaction can't clean them up, so garbage accumulates
> in the system.
> Also, the streaming API doesn't call DbTxnManager at all.
> Need to move this logic into Initiator (or some other metastore-side thread).
> Also, make sure it is broken up into multiple small(er) transactions against
> the metastore DB.
> Also move timeOutLocks() there as well.
> See about adding a TXNS.COMMENT field, which can be used for "Auto aborted due
> to timeout", for example.
> The symptom of this is that the system keeps showing more and more Open
> transactions that don't seem to ever go away (and have no locks associated
> with them)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
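
To make the unit question in the comment concrete, here is a minimal sketch of how a suffixed default such as "100s" might resolve against a validator declared in MILLISECONDS. It assumes the validator's unit only acts as the fallback for values written without a suffix. That is an assumption about the intended behavior, not a statement of Hive's implementation. The class TimeUnitSketch and the method toMillis are hypothetical and are not part of HiveConf.

```java
import java.util.concurrent.TimeUnit;

public class TimeUnitSketch {

    // Hypothetical helper, not Hive's API: parses "<digits><suffix>" and
    // returns the duration in milliseconds. When the value has no suffix,
    // defaultUnit (standing in for the validator's unit) decides how the
    // bare number is read.
    static long toMillis(String value, TimeUnit defaultUnit) {
        String v = value.trim().toLowerCase();
        int i = 0;
        while (i < v.length() && Character.isDigit(v.charAt(i))) {
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("No number in: " + value);
        }
        long amount = Long.parseLong(v.substring(0, i));
        String suffix = v.substring(i).trim();

        TimeUnit unit;
        switch (suffix) {
            case "":   unit = defaultUnit;           break; // fallback unit
            case "ms": unit = TimeUnit.MILLISECONDS; break;
            case "s":  unit = TimeUnit.SECONDS;      break;
            case "m":  unit = TimeUnit.MINUTES;      break;
            default:
                throw new IllegalArgumentException("Unknown suffix: " + suffix);
        }
        return unit.toMillis(amount);
    }

    public static void main(String[] args) {
        // An explicit suffix wins, so the declared default unit is irrelevant here.
        System.out.println(toMillis("100s", TimeUnit.MILLISECONDS)); // 100000
        // Without a suffix, the declared unit changes the meaning by a factor of 1000.
        System.out.println(toMillis("100", TimeUnit.MILLISECONDS));  // 100
        System.out.println(toMillis("100", TimeUnit.SECONDS));       // 100000
    }
}
```

Under that assumption the shipped defaults behave the same either way, but a user who sets hive.timedout.txn.reaper.start to a bare number would get milliseconds rather than seconds, which is why matching the validator unit to the default's unit is less surprising.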