[ https://issues.apache.org/jira/browse/HIVE-25943?focusedWorklogId=734031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734031 ]
ASF GitHub Bot logged work on HIVE-25943:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Feb/22 17:39
            Start Date: 28/Feb/22 17:39
    Worklog Time Spent: 10m
      Work Description: klcopp commented on a change in pull request #3034:
URL: https://github.com/apache/hive/pull/3034#discussion_r816026374


##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionInfo.java
##########
@@ -87,6 +89,21 @@ public CompactionInfo(String dbname, String tableName, String partName, Compacti
   }
   CompactionInfo() {}

+  public String getProperty(String key) {

Review comment:
       Why use a map instead of just an integer?

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -3318,6 +3322,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_COMPACTOR_CLEANER_RETENTION_TIME("hive.compactor.cleaner.retention.time.seconds", "300s",
         new TimeValidator(TimeUnit.SECONDS),
         "Time to wait before cleanup of obsolete files/dirs after compaction. \n" +
         "This is the minimum amount of time the system will wait, since it will not clean before all open transactions are committed, that were opened before the compaction"),
+    HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS("hive.compactor.cleaner.retry.maxattempts", 5,

Review comment:
       It makes more sense to put this in MetastoreConf.java, since the Cleaner runs in HMS always.
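The "map instead of just an integer" question above is about tracking the retry counter through CompactionInfo's generic properties rather than a dedicated int field. A minimal sketch of that tradeoff follows; the key name `cleaner.retry.attempts` and the `k1=v1;k2=v2` string encoding are assumptions for illustration only, not Hive's actual property format.

```java
import java.util.Arrays;

// Illustrative sketch: keeping a failed-attempt counter inside a serialized
// key=value properties string (as the reviewed PR appears to do) versus a
// plain integer field. Key name and "k1=v1;k2=v2" encoding are hypothetical.
public class RetryCountProperty {

    static final String KEY = "cleaner.retry.attempts"; // hypothetical key

    /** Reads the counter from a "k1=v1;k2=v2" string; 0 if absent. */
    public static int readCount(String properties) {
        if (properties == null || properties.isEmpty()) {
            return 0;
        }
        return Arrays.stream(properties.split(";"))
                .map(kv -> kv.split("=", 2))
                .filter(kv -> kv.length == 2 && kv[0].equals(KEY))
                .mapToInt(kv -> Integer.parseInt(kv[1]))
                .findFirst().orElse(0);
    }

    /** Returns the properties string with the counter incremented by one. */
    public static String incrementCount(String properties) {
        int next = readCount(properties) + 1;
        // Keep every other property, then re-append the bumped counter.
        String others = properties == null ? "" :
                Arrays.stream(properties.split(";"))
                        .filter(kv -> !kv.startsWith(KEY + "="))
                        .filter(kv -> !kv.isEmpty())
                        .reduce("", (a, b) -> a.isEmpty() ? b : a + ";" + b);
        return others.isEmpty() ? KEY + "=" + next : others + ";" + KEY + "=" + next;
    }

    public static void main(String[] args) {
        String p = incrementCount("compactorthreshold=0.5");
        System.out.println(p); // compactorthreshold=0.5;cleaner.retry.attempts=1
        System.out.println(readCount(incrementCount(p))); // 2
    }
}
```

The reviewer's point stands out here: a plain integer column needs none of this parsing, while the map approach piggybacks on existing storage at the cost of string munging.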
##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
     updateStatus(info);
   }

+  @Override
+  public void retryCleanerAttemptWithBackoff(CompactionInfo info, long retentionTime) throws MetaException {
+    try {
+      try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED)) {
+        try (PreparedStatement stmt = dbConn.prepareStatement("UPDATE \"COMPACTION_QUEUE\" " +
+            "SET \"CQ_TBLPROPERTIES\" = ?, CQ_COMMIT_TIME = ?, CQ_ERROR_MESSAGE= ? "

Review comment:
       CQ_TBLPROPERTIES is for setting TBLPROPERTIES (currently only for MR compaction) like this:
       ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] COMPACT 'compaction_type' WITH OVERWRITE TBLPROPERTIES ("property"="value" [, ...]);
       I definitely wouldn't overwrite them for observability reasons. You could add to them, but probably the nicest solution would be to add a new column in the COMPACTION_QUEUE (unfortunately :))

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -3318,6 +3318,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_COMPACTOR_CLEANER_RETENTION_TIME("hive.compactor.cleaner.retention.time.seconds", "300s",
         new TimeValidator(TimeUnit.SECONDS),
         "Time to wait before cleanup of obsolete files/dirs after compaction. \n" +
         "This is the minimum amount of time the system will wait, since it will not clean before all open transactions are committed, that were opened before the compaction"),
+    HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS("hive.compactor.cleaner.retry.maxattempts", 5,
+        new RangeValidator(0, 10), "Maximum number of attempts to clean a table again after a " +
+            "failed cycle. The delay has a backoff, and calculated the following way: " +
+            "pow(2, number_of_failed_attempts) * HIVE_COMPACTOR_CLEANER_RETENTION_TIME. Must be between 0 and 10"),

Review comment:
       I don't know, this seems pretty complicated for users to understand. They also might set hive.compactor.cleaner.retention.time.seconds to an excessively high number for some reason, without realizing that hive.compactor.cleaner.retry.maxattempts will be affected. I vote for simplifying this, like maybe try every 5 mins (configurable) ... what do you think?

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
     updateStatus(info);
   }

+  @Override
+  public void retryCleanerAttemptWithBackoff(CompactionInfo info, long retentionTime) throws MetaException {
+    try {
+      try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED)) {
+        try (PreparedStatement stmt = dbConn.prepareStatement("UPDATE \"COMPACTION_QUEUE\" " +
+            "SET \"CQ_TBLPROPERTIES\" = ?, CQ_COMMIT_TIME = ?, CQ_ERROR_MESSAGE= ? " +
+            " WHERE \"CQ_ID\" = ?")) {
+          stmt.setString(1, info.properties);
+          stmt.setLong(2, retentionTime);

Review comment:
       Also, sadly updating CQ_COMMIT_TIME messes with observability (oldest_ready_for_cleaning_age_in_sec)

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
     updateStatus(info);
   }

+  @Override

Review comment:
       There's some RetrySemantics annotation that I don't understand needed here :D

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
     updateStatus(info);
   }

+  @Override
+  public void retryCleanerAttemptWithBackoff(CompactionInfo info, long retentionTime) throws MetaException {

Review comment:
       The name sounds like this method is supposed to retry cleaning. Maybe setNextCleanerAttemptTime or something like that would be better?

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -3318,6 +3322,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_COMPACTOR_CLEANER_RETENTION_TIME("hive.compactor.cleaner.retention.time.seconds", "300s",
         new TimeValidator(TimeUnit.SECONDS),
         "Time to wait before cleanup of obsolete files/dirs after compaction. \n" +
         "This is the minimum amount of time the system will wait, since it will not clean before all open transactions are committed, that were opened before the compaction"),
+    HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS("hive.compactor.cleaner.retry.maxattempts", 5,

Review comment:
       Yeah, hive.compactor.cleaner.retention.time.seconds is here but it probably shouldn't be :)

##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -2956,6 +2956,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req)
   void mark_compacted(1: CompactionInfoStruct cr) throws(1:MetaException o1)
   void mark_failed(1: CompactionInfoStruct cr) throws(1:MetaException o1)
   void mark_refused(1: CompactionInfoStruct cr) throws(1:MetaException o1)
+  void retry_cleaner_attempt_with_backoff(1: CompactionInfoStruct cr, 2:i64 retentionTime) throws(1:MetaException o1)

Review comment:
       I don't think this is necessary, since the Cleaner runs only in HMS and can communicate directly with the CompactionTxnHandler.

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
##########
@@ -522,6 +522,16 @@ void onRename(String oldCatName, String oldDbName, String oldTabName, String old
    */
   void markRefused(CompactionInfo info) throws MetaException;

+  /**
+   * Updates the cleaner retry time related information (compaction properties and commit time) of the CompactionInfo
+   * in the HMS database.
+   * @param info The {@link CompactionInfo} object to be updated.
+   * @param retentionTime The time until the clean won't be attempted again.

Review comment:
       The time when cleaning will be reattempted?


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 734031)
    Time Spent: 3h 50m  (was: 3h 40m)

> Introduce compaction cleaner failed attempts threshold
> ------------------------------------------------------
>
>                 Key: HIVE-25943
>                 URL: https://issues.apache.org/jira/browse/HIVE-25943
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: László Végh
>            Assignee: László Végh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> If the cleaner fails for some reason, the compaction entity status remains in
> "ready for cleaning", so the cleaner will pick this entity up again,
> resulting in endless retries. The number of failed cleaning attempts should be
> counted, and once it reaches a certain threshold the cleaner must skip all further
> cleaning attempts on that compaction entity.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
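The backoff formula debated in the review thread (pow(2, number_of_failed_attempts) * HIVE_COMPACTOR_CLEANER_RETENTION_TIME, capped by hive.compactor.cleaner.retry.maxattempts) can be sketched as follows. This is an illustrative sketch only; the class and method names are made up for the example and are not Hive's actual Cleaner code.

```java
// Illustrative sketch of the exponential retry backoff under discussion:
// delay = 2^failedAttempts * retentionTime, with retries abandoned once
// failedAttempts reaches the configured maximum (default 5, valid range 0-10).
// Names (CleanerBackoff, nextDelayMs, shouldRetry) are hypothetical.
public class CleanerBackoff {

    /** Delay before the next cleaning attempt, in milliseconds. */
    public static long nextDelayMs(int failedAttempts, long retentionTimeMs) {
        // pow(2, failedAttempts) as a left shift; exact for small exponents (<= 10)
        return (1L << failedAttempts) * retentionTimeMs;
    }

    /** True if another cleaning attempt should still be scheduled. */
    public static boolean shouldRetry(int failedAttempts, int maxAttempts) {
        return failedAttempts < maxAttempts;
    }

    public static void main(String[] args) {
        long retention = 300_000L; // the 300s default of cleaner.retention.time.seconds
        System.out.println(nextDelayMs(0, retention)); // 300000  (5 min)
        System.out.println(nextDelayMs(2, retention)); // 1200000 (20 min)
        System.out.println(shouldRetry(5, 5));         // false
    }
}
```

The sketch also makes the reviewer's objection concrete: because the retention time is the base of the multiplication, raising hive.compactor.cleaner.retention.time.seconds silently stretches every retry delay, which is why a flat, separately configurable interval was suggested instead.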