[ https://issues.apache.org/jira/browse/HIVE-26718?focusedWorklogId=837479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-837479 ]
ASF GitHub Bot logged work on HIVE-26718:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Jan/23 13:26
            Start Date: 06/Jan/23 13:26
    Worklog Time Spent: 10m
      Work Description: veghlaci05 commented on code in PR #3775:
URL: https://github.com/apache/hive/pull/3775#discussion_r1063433513


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java:
##########
@@ -504,6 +504,47 @@ private CompactionType determineCompactionType(CompactionInfo ci, AcidDirectory
       if (initiateMajor) return CompactionType.MAJOR;
     }

+    // bucket size calculation can be resource intensive if there are numerous deltas, so we check for rebalance
+    // compaction only if the table is in an acceptable shape: no major compaction required. This means the number of
+    // files shouldn't be too high
+    if ("tez".equalsIgnoreCase(HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE)) &&

Review Comment:
   Yes, running a rebalance compaction on uncompacted tables could be resource intensive due to the high number of files and folders. So I decided to schedule rebalance compactions only on tables that are already major compacted. This ensures that the number of deltas is relatively low.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 837479)
    Time Spent: 1h 20m  (was: 1h 10m)

> Enable initiator to schedule rebalancing compactions
> ----------------------------------------------------
>
>                 Key: HIVE-26718
>                 URL: https://issues.apache.org/jira/browse/HIVE-26718
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: László Végh
>            Assignee: László Végh
>            Priority: Major
>              Labels: ACID, compaction, pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Initiator should be able to schedule rebalancing compactions based on a set
> of predefined and configurable thresholds.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
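
The ordering described in the review comment above (decide on major compaction first, and only then consider a rebalance on Tez) can be illustrated with a small standalone sketch. This is not the code from PR #3775: the class `RebalanceGateSketch`, the `Decision` enum, the `bucketImbalance` supplier and the `rebalanceThreshold` parameter are all hypothetical names invented for illustration, and the imbalance metric itself is an assumption. The supplier is only there to show that the potentially expensive bucket size calculation can be skipped when a major compaction is already required.

```java
import java.util.function.DoubleSupplier;

// Hypothetical, self-contained sketch of the gating order; it does not use any Hive classes.
class RebalanceGateSketch {

    enum Decision { MAJOR, REBALANCE, NONE }

    /**
     * @param executionEngine    configured execution engine name, e.g. "tez"
     * @param needsMajor         result of the earlier major compaction checks
     * @param bucketImbalance    lazily computed imbalance metric (assumed, hypothetical)
     * @param rebalanceThreshold configurable threshold for that metric (assumed, hypothetical)
     */
    static Decision decide(String executionEngine, boolean needsMajor,
                           DoubleSupplier bucketImbalance, double rebalanceThreshold) {
        // A table that still needs a major compaction is handled first,
        // so the expensive bucket size calculation is never reached for it.
        if (needsMajor) {
            return Decision.MAJOR;
        }
        // Rebalance is only considered on Tez, and only now is the metric computed.
        if ("tez".equalsIgnoreCase(executionEngine)
                && bucketImbalance.getAsDouble() > rebalanceThreshold) {
            return Decision.REBALANCE;
        }
        return Decision.NONE;
    }

    public static void main(String[] args) {
        // Uncompacted table: major compaction wins, the supplier is not evaluated.
        System.out.println(decide("tez", true, () -> 0.9, 0.2));
        // Already compacted table on Tez with imbalance above the threshold.
        System.out.println(decide("TEZ", false, () -> 0.9, 0.2));
        // Other engines never trigger a rebalance in this sketch.
        System.out.println(decide("mr", false, () -> 0.9, 0.2));
    }
}
```

Passing the metric as a `DoubleSupplier` keeps the cost of the calculation out of the common path; whether the real Initiator structures it this way is not stated in the comment.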