[ https://issues.apache.org/jira/browse/HIVE-25943?focusedWorklogId=734031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734031 ]
ASF GitHub Bot logged work on HIVE-25943:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Feb/22 17:39
            Start Date: 28/Feb/22 17:39
    Worklog Time Spent: 10m
      Work Description: klcopp commented on a change in pull request #3034:
URL: https://github.com/apache/hive/pull/3034#discussion_r816026374


##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionInfo.java
##########
@@ -87,6 +89,21 @@ public CompactionInfo(String dbname, String tableName, String partName, Compacti
   }
   CompactionInfo() {}

+  public String getProperty(String key) {

Review comment:
       Why use a map instead of just an integer?

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -3318,6 +3322,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_COMPACTOR_CLEANER_RETENTION_TIME("hive.compactor.cleaner.retention.time.seconds", "300s",
         new TimeValidator(TimeUnit.SECONDS),
         "Time to wait before cleanup of obsolete files/dirs after compaction. \n" +
         "This is the minimum amount of time the system will wait, since it will not clean before all open transactions are committed, that were opened before the compaction"),
+    HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS("hive.compactor.cleaner.retry.maxattempts", 5,

Review comment:
       It makes more sense to put this in MetastoreConf.java, since the Cleaner runs in HMS always.
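The "map instead of just an integer" question above is about tracking the retry counter through CompactionInfo's generic properties rather than a dedicated int field. A minimal sketch of that tradeoff follows; the key name `cleaner.retry.attempts` and the `k1=v1;k2=v2` string encoding are assumptions for illustration only, not Hive's actual property format.

```java
import java.util.Arrays;

// Illustrative sketch: keeping a failed-attempt counter inside a serialized
// key=value properties string (as the reviewed PR appears to do) versus a
// plain integer field. Key name and "k1=v1;k2=v2" encoding are hypothetical.
public class RetryCountProperty {

    static final String KEY = "cleaner.retry.attempts"; // hypothetical key

    /** Reads the counter from a "k1=v1;k2=v2" string; 0 if absent. */
    public static int readCount(String properties) {
        if (properties == null || properties.isEmpty()) {
            return 0;
        }
        return Arrays.stream(properties.split(";"))
                .map(kv -> kv.split("=", 2))
                .filter(kv -> kv.length == 2 && kv[0].equals(KEY))
                .mapToInt(kv -> Integer.parseInt(kv[1]))
                .findFirst().orElse(0);
    }

    /** Returns the properties string with the counter incremented by one. */
    public static String incrementCount(String properties) {
        int next = readCount(properties) + 1;
        // Keep every other property, then re-append the bumped counter.
        String others = properties == null ? "" :
                Arrays.stream(properties.split(";"))
                        .filter(kv -> !kv.startsWith(KEY + "="))
                        .filter(kv -> !kv.isEmpty())
                        .reduce("", (a, b) -> a.isEmpty() ? b : a + ";" + b);
        return others.isEmpty() ? KEY + "=" + next : others + ";" + KEY + "=" + next;
    }

    public static void main(String[] args) {
        String p = incrementCount("compactorthreshold=0.5");
        System.out.println(p); // compactorthreshold=0.5;cleaner.retry.attempts=1
        System.out.println(readCount(incrementCount(p))); // 2
    }
}
```

The reviewer's point stands out here: a plain integer column needs none of this parsing, while the map approach piggybacks on existing storage at the cost of string munging.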
##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
     updateStatus(info);
   }

+  @Override
+  public void retryCleanerAttemptWithBackoff(CompactionInfo info, long retentionTime) throws MetaException {
+    try {
+      try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED)) {
+        try (PreparedStatement stmt = dbConn.prepareStatement("UPDATE \"COMPACTION_QUEUE\" " +
+            "SET \"CQ_TBLPROPERTIES\" = ?, CQ_COMMIT_TIME = ?, CQ_ERROR_MESSAGE= ? "

Review comment:
       CQ_TBLPROPERTIES is for setting TBLPROPERTIES (currently only for MR compaction) like this:
       ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] COMPACT 'compaction_type' WITH OVERWRITE TBLPROPERTIES ("property"="value" [, ...]);
       I definitely wouldn't overwrite them for observability reasons. You could add to them, but probably the nicest solution would be to add a new column in the COMPACTION_QUEUE (unfortunately :))

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -3318,6 +3318,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_COMPACTOR_CLEANER_RETENTION_TIME("hive.compactor.cleaner.retention.time.seconds", "300s",
         new TimeValidator(TimeUnit.SECONDS),
         "Time to wait before cleanup of obsolete files/dirs after compaction. \n" +
         "This is the minimum amount of time the system will wait, since it will not clean before all open transactions are committed, that were opened before the compaction"),
+    HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS("hive.compactor.cleaner.retry.maxattempts", 5,
+        new RangeValidator(0, 10), "Maximum number of attempts to clean a table again after a " +
+            "failed cycle. The delay has a backoff, and calculated the following way: " +
+            "pow(2, number_of_failed_attempts) * HIVE_COMPACTOR_CLEANER_RETENTION_TIME. Must be between 0 and 10"),

Review comment:
       I don't know, this seems pretty complicated for users to understand. They also might set hive.compactor.cleaner.retention.time.seconds to an excessively high number for some reason, without realizing that hive.compactor.cleaner.retry.maxattempts will be affected. I vote for simplifying this, like maybe try every 5 mins (configurable) ... what do you think?

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
     updateStatus(info);
   }

+  @Override
+  public void retryCleanerAttemptWithBackoff(CompactionInfo info, long retentionTime) throws MetaException {
+    try {
+      try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED)) {
+        try (PreparedStatement stmt = dbConn.prepareStatement("UPDATE \"COMPACTION_QUEUE\" " +
+            "SET \"CQ_TBLPROPERTIES\" = ?, CQ_COMMIT_TIME = ?, CQ_ERROR_MESSAGE= ? " +
+            " WHERE \"CQ_ID\" = ?")) {
+          stmt.setString(1, info.properties);
+          stmt.setLong(2, retentionTime);

Review comment:
       Also, sadly updating CQ_COMMIT_TIME messes with observability (oldest_ready_for_cleaning_age_in_sec)

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
     updateStatus(info);
   }

+  @Override

Review comment:
       There's some RetrySemantics annotation that I don't understand needed here :D

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##########
@@ -1430,6 +1427,44 @@ public void markRefused(CompactionInfo info) throws MetaException {
     updateStatus(info);
   }

+  @Override
+  public void retryCleanerAttemptWithBackoff(CompactionInfo info, long retentionTime) throws MetaException {

Review comment:
       The name sounds like this method is supposed to retry cleaning. Maybe setNextCleanerAttemptTime or something like that would be better?

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -3318,6 +3322,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_COMPACTOR_CLEANER_RETENTION_TIME("hive.compactor.cleaner.retention.time.seconds", "300s",
         new TimeValidator(TimeUnit.SECONDS),
         "Time to wait before cleanup of obsolete files/dirs after compaction. \n" +
         "This is the minimum amount of time the system will wait, since it will not clean before all open transactions are committed, that were opened before the compaction"),
+    HIVE_COMPACTOR_CLEANER_MAX_RETRY_ATTEMPTS("hive.compactor.cleaner.retry.maxattempts", 5,

Review comment:
       Yeah, hive.compactor.cleaner.retention.time.seconds is here but it probably shouldn't be :)

##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -2956,6 +2956,7 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req)
   void mark_compacted(1: CompactionInfoStruct cr) throws(1:MetaException o1)
   void mark_failed(1: CompactionInfoStruct cr) throws(1:MetaException o1)
   void mark_refused(1: CompactionInfoStruct cr) throws(1:MetaException o1)
+  void retry_cleaner_attempt_with_backoff(1: CompactionInfoStruct cr, 2:i64 retentionTime) throws(1:MetaException o1)

Review comment:
       I don't think this is necessary, since the Cleaner runs only in HMS and can communicate directly with the CompactionTxnHandler.

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
##########
@@ -522,6 +522,16 @@ void onRename(String oldCatName, String oldDbName, String oldTabName, String old
    */
   void markRefused(CompactionInfo info) throws MetaException;

+  /**
+   * Updates the cleaner retry time related information (compaction properties and commit time) of the CompactionInfo
+   * in the HMS database.
+   * @param info The {@link CompactionInfo} object to be updated.
+   * @param retentionTime The time until the clean won't be attempted again.

Review comment:
       The time when cleaning will be reattempted?


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 734031)
    Time Spent: 3h 50m  (was: 3h 40m)

> Introduce compaction cleaner failed attempts threshold
> ------------------------------------------------------
>
>                 Key: HIVE-25943
>                 URL: https://issues.apache.org/jira/browse/HIVE-25943
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: László Végh
>            Assignee: László Végh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> If the cleaner fails for some reason, the compaction entity status remains in
> "ready for cleaning", so the cleaner will pick this entity up again,
> resulting in endless retries. The number of failed cleaning attempts should be
> counted, and once it reaches a certain threshold the cleaner must skip all further
> cleaning attempts on that compaction entity.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
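The backoff formula debated in the review thread (pow(2, number_of_failed_attempts) * HIVE_COMPACTOR_CLEANER_RETENTION_TIME, capped by hive.compactor.cleaner.retry.maxattempts) can be sketched as follows. This is an illustrative sketch only; the class and method names are made up for the example and are not Hive's actual Cleaner code.

```java
// Illustrative sketch of the exponential retry backoff under discussion:
// delay = 2^failedAttempts * retentionTime, with retries abandoned once
// failedAttempts reaches the configured maximum (default 5, valid range 0-10).
// Names (CleanerBackoff, nextDelayMs, shouldRetry) are hypothetical.
public class CleanerBackoff {

    /** Delay before the next cleaning attempt, in milliseconds. */
    public static long nextDelayMs(int failedAttempts, long retentionTimeMs) {
        // pow(2, failedAttempts) as a left shift; exact for small exponents (<= 10)
        return (1L << failedAttempts) * retentionTimeMs;
    }

    /** True if another cleaning attempt should still be scheduled. */
    public static boolean shouldRetry(int failedAttempts, int maxAttempts) {
        return failedAttempts < maxAttempts;
    }

    public static void main(String[] args) {
        long retention = 300_000L; // the 300s default of cleaner.retention.time.seconds
        System.out.println(nextDelayMs(0, retention)); // 300000  (5 min)
        System.out.println(nextDelayMs(2, retention)); // 1200000 (20 min)
        System.out.println(shouldRetry(5, 5));         // false
    }
}
```

The sketch also makes the reviewer's objection concrete: because the retention time is the base of the multiplication, raising hive.compactor.cleaner.retention.time.seconds silently stretches every retry delay, which is why a flat, separately configurable interval was suggested instead.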