[ https://issues.apache.org/jira/browse/HIVE-27332?focusedWorklogId=861590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-861590 ]
ASF GitHub Bot logged work on HIVE-27332:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/May/23 17:01
            Start Date: 11/May/23 17:01
    Worklog Time Spent: 10m
      Work Description: sonarcloud[bot] commented on PR #4313:
URL: https://github.com/apache/hive/pull/4313#issuecomment-1544358463

   Kudos, SonarCloud Quality Gate passed!

   [4 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4313&resolved=false&types=BUG)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4313&resolved=false&types=VULNERABILITY)
   [1 Security Hotspot](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4313&resolved=false&types=SECURITY_HOTSPOT)
   [15 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4313&resolved=false&types=CODE_SMELL)

   No Coverage information
   No Duplication information

Issue Time Tracking
-------------------

    Worklog Id:     (was: 861590)
    Time Spent: 20m  (was: 10m)

> Add retry backoff mechanism for abort cleanup
> ---------------------------------------------
>
>                 Key: HIVE-27332
>                 URL: https://issues.apache.org/jira/browse/HIVE-27332
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-27019 and HIVE-27020 added the functionality to directly clean data
> directories from aborted transactions without using Initiator & Worker.
> However, when cleanup fails repeatedly, the retry is initiated every single
> time. We need to add a retry backoff mechanism that controls how long to wait
> before initiating the next retry, instead of retrying continuously.
> There are broadly 3 cases in which the retry for abort cleanup is affected -
> *1. Abort cleanup on the table failed + Compaction on the table failed.*
> *2. Abort cleanup on the table failed + Compaction on the table passed.*
> *3. Abort cleanup on the table failed + No compaction on the table.*
> *Solution -*
> *We create a new table called TXN_CLEANUP_QUEUE with the following fields to
> store the retry metadata -*
> CREATE TABLE TXN_CLEANUP_QUEUE (
> TCQ_DATABASE varchar(128) NOT NULL,
> TCQ_TABLE varchar(256) NOT NULL,
> TCQ_PARTITION varchar(767),
> TCQ_RETRY_RETENTION bigint NOT NULL DEFAULT 0,
> TCQ_ERROR_MESSAGE mediumtext -- MySQL; clob in Derby and Oracle; text in
> -- Postgres; varchar(max) in MSSQL
> );
> *Advantage: Separates the flow of metadata. We also eliminate the chance of
> breaking compaction or abort cleanup when modifying the metadata of the
> other. Easier debugging in case of failures.*
> *Actions performed by TaskHandler in the case of failure -*
> *AbortTxnCleaner -*
> Action: Just add the retry details to the queue table when abort cleanup
> fails.
> *CompactionCleaner -*
> Action: If compaction on the same table is successful, delete the retry entry
> in markCleaned when removing any TXN_COMPONENTS entries, except when there are
> no uncompacted aborts. We do not want to be in a situation where there is a
> queue entry for a table but there is no record in TXN_COMPONENTS associated
> with the same table.
> *Advantage: No performance issues are expected with this approach, since
> most of the time we delete only 1 record for the associated table/partition.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
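
The retry backoff idea from the issue description can be illustrated with a minimal Java sketch. Everything here is an assumption made for illustration: the class `AbortCleanupBackoff`, its methods, the doubling policy, the cap, and the last-failure timestamp are hypothetical and are not taken from the Hive code base or from PR #4313. The only link to the description is that the stored value plays the role of TCQ_RETRY_RETENTION, with 0 meaning that no earlier failure has been recorded.

```java
/** Hypothetical helper for illustration only; not part of Hive. */
final class AbortCleanupBackoff {
    private AbortCleanupBackoff() {}

    /**
     * Returns the retention (wait time in ms) to store after one more failure.
     * Assumed policy: start at baseMs, double on every further failure,
     * never exceed maxMs.
     */
    static long nextRetentionMs(long currentMs, long baseMs, long maxMs) {
        if (baseMs <= 0 || maxMs < baseMs) {
            throw new IllegalArgumentException("need 0 < baseMs <= maxMs");
        }
        if (currentMs <= 0) {
            return baseMs; // 0 mirrors the column default: first failure
        }
        if (currentMs > maxMs / 2) {
            return maxMs; // doubling would exceed the cap (also avoids overflow)
        }
        return currentMs * 2;
    }

    /**
     * True when enough time has passed since the last failure.
     * Assumption: the caller knows when the last failure happened.
     */
    static boolean isReadyForRetry(long lastFailureMs, long retentionMs, long nowMs) {
        return nowMs - lastFailureMs >= retentionMs;
    }

    public static void main(String[] args) {
        long retention = 0;
        // Simulate five consecutive cleanup failures with a 1 s base and 5 s cap.
        for (int failure = 1; failure <= 5; failure++) {
            retention = nextRetentionMs(retention, 1_000, 5_000);
            System.out.println("failure " + failure + " -> wait " + retention + " ms");
        }
    }
}
```

In this sketch a cleaner would skip any table or partition for which `isReadyForRetry` returns false, so a persistently failing cleanup is attempted less and less often instead of on every cycle.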