Eugene Koifman created HIVE-14427:
-------------------------------------

             Summary: CompactionTxnHandler.markCleaned() can delete aborted txns
                 Key: HIVE-14427
                 URL: https://issues.apache.org/jira/browse/HIVE-14427
             Project: Hive
          Issue Type: Improvement
          Components: Transactions
            Reporter: Eugene Koifman


We can modify 
{noformat}
s = "select distinct txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
          TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" +
          info.tableName + "'" + (info.highestTxnId == 0 ? "" : " and txn_id <= " + info.highestTxnId);
{noformat}
to use {{select txn_id, count(*) ... group by txn_id}} so that we know the number of components in each txn.
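
A minimal sketch of the proposed query, assuming the FROM/WHERE clauses are carried over unchanged from the query above and that TXN_ABORTED is the char 'a'. The class and method names are hypothetical. {{distinct}} is dropped because {{group by txn_id}} already yields one row per txn.

```java
// Hypothetical sketch, not Hive's actual code: the GROUP BY variant of the aborted-txn query.
// Assumes the existing FROM/WHERE clauses are kept as-is and TXN_ABORTED is encoded as 'a'.
class AbortedTxnCountQuery {
  static final char TXN_ABORTED = 'a'; // assumed encoding of the aborted state

  static String build(String dbname, String tableName, long highestTxnId) {
    return "select txn_id, count(*) from TXNS, TXN_COMPONENTS where txn_id = tc_txnid"
        + " and txn_state = '" + TXN_ABORTED + "'"
        + " and tc_database = '" + dbname + "' and tc_table = '" + tableName + "'"
        // same optional upper bound as the original query; 0 means "no bound"
        + (highestTxnId == 0 ? "" : " and txn_id <= " + highestTxnId)
        // one row per txn: (txn_id, number of matching TXN_COMPONENTS rows)
        + " group by txn_id";
  }
}
```

For example, {{AbortedTxnCountQuery.build("db1", "t1", 42)}} ends with {{and txn_id <= 42 group by txn_id}}.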

Then, when running "delete from TXN_COMPONENTS where ...", we know how many rows were deleted.
If the sum of all counts from the first query matches the total number of rows deleted, we know that all aborted txns in this set are now empty and can therefore be deleted from TXNS here.
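
The check itself could look roughly like this. componentCounts is assumed to hold the txn_id -> count(*) rows from the GROUP BY query, and rowsDeleted the update count returned by the TXN_COMPONENTS delete (e.g. the int returned by JDBC's Statement.executeUpdate()). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "sum of counts == rows deleted" check; not Hive's actual code.
class EmptyAbortedTxnCheck {
  /**
   * @param componentCounts txn_id -> count(*) from the GROUP BY query (assumed input)
   * @param rowsDeleted     update count from "delete from TXN_COMPONENTS where ..."
   * @return sorted txn ids that are safe to delete from TXNS, or an empty list
   */
  static List<Long> txnsSafeToDelete(Map<Long, Integer> componentCounts, int rowsDeleted) {
    long expected = 0;
    for (int c : componentCounts.values()) {
      expected += c; // total components we expected the delete to remove
    }
    if (expected != rowsDeleted) {
      // Mismatch: do not touch TXNS; the txns are left for _cleanEmptyAbortedTxns().
      return Collections.emptyList();
    }
    List<Long> ids = new ArrayList<>(componentCounts.keySet());
    Collections.sort(ids);
    return ids;
  }
}
```

Returning an empty list on a mismatch keeps the optimization conservative: the existing cleanup path still handles anything this check cannot prove empty.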

This means aborted txns are cleaned up from the TXNS table sooner, and we avoid a large join in _cleanEmptyAbortedTxns()_.  Also, the delete on TXNS here will have PKs in the WHERE clause, so it should be cheap.
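
A sketch of building the TXNS delete keyed only on txn_id (the PK). The batching limit maxIdsPerStatement is a hypothetical parameter standing in for whatever IN-list size limit the backing database imposes; it is not an existing Hive setting, and the class name is likewise illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: "delete from TXNS where txn_id in (...)" statements on the PK only.
// maxIdsPerStatement is an assumed IN-list limit, not a real Hive configuration value.
class AbortedTxnDeleteSketch {
  static List<String> buildDeletes(List<Long> txnIds, int maxIdsPerStatement) {
    if (maxIdsPerStatement <= 0) {
      throw new IllegalArgumentException("maxIdsPerStatement must be > 0");
    }
    List<String> stmts = new ArrayList<>();
    // Split the ids into chunks so no single IN list exceeds the limit.
    for (int start = 0; start < txnIds.size(); start += maxIdsPerStatement) {
      int end = Math.min(start + maxIdsPerStatement, txnIds.size());
      StringBuilder sb = new StringBuilder("delete from TXNS where txn_id in (");
      for (int i = start; i < end; i++) {
        if (i > start) {
          sb.append(", ");
        }
        sb.append(txnIds.get(i));
      }
      sb.append(")");
      stmts.add(sb.toString());
    }
    return stmts;
  }
}
```

For example, ids [1, 2, 3] with a limit of 2 produce "delete from TXNS where txn_id in (1, 2)" and "delete from TXNS where txn_id in (3)".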



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
