[ 
https://issues.apache.org/jira/browse/HIVE-25535?focusedWorklogId=652946&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-652946
 ]

ASF GitHub Bot logged work on HIVE-25535:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Sep/21 13:09
            Start Date: 20/Sep/21 13:09
    Worklog Time Spent: 10m 
      Work Description: sankarh commented on a change in pull request #2651:
URL: https://github.com/apache/hive/pull/2651#discussion_r712136132



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -179,6 +180,13 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB, boolean metricsEnabled
         txnHandler.markCleaned(ci);
         return;
       }
+      if (MetaStoreUtils.isNoCleanUpSet(t.getParameters())) {
+        // The table was marked no clean up true.
+        LOG.info("Skipping " + ci.getFullTableName() + " clean up, as NO_CLEANUP set to true");
+        txnHandler.markCleaned(ci);

Review comment:
       We shouldn't mark this table/partition as cleaned, since we are only skipping it temporarily. If the user enables cleanup again, the next cycle should clean the obsolete files.
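The reviewer's point can be sketched outside of Hive. The following is a minimal, self-contained model of the retry semantics being asked for (the names `no_cleanup` and the "mark cleaned" step come from the thread; the queue structure and everything else here are hypothetical simplifications, not Hive's actual `Cleaner` code): an entry skipped because of `no_cleanup=true` stays queued, so a later cycle picks it up once the flag is cleared.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: when no_cleanup=true we skip the entry but do NOT
// mark it cleaned, so the next cleaner cycle retries it once the flag
// is disabled again.
public class CleanerSketch {
    static boolean isNoCleanUpSet(Map<String, String> params) {
        return Boolean.parseBoolean(params.getOrDefault("no_cleanup", "false"));
    }

    /** Runs one cleaner cycle; returns the number of entries actually cleaned. */
    static int runCycle(Deque<String> queue, Map<String, Map<String, String>> tables) {
        int cleaned = 0;
        for (int i = queue.size(); i > 0; i--) {
            String table = queue.poll();
            if (isNoCleanUpSet(tables.get(table))) {
                queue.add(table);   // skip temporarily: leave it queued for the next cycle
                continue;
            }
            cleaned++;              // here the real Cleaner would delete files and mark cleaned
        }
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> tables = new HashMap<>();
        Map<String, String> params = new HashMap<>();
        params.put("no_cleanup", "true");
        tables.put("default.dcamc", params);

        Deque<String> queue = new ArrayDeque<>();
        queue.add("default.dcamc");

        System.out.println(runCycle(queue, tables)); // entry skipped, still queued
        params.put("no_cleanup", "false");           // user enables cleanup again
        System.out.println(runCycle(queue, tables)); // cleaned on the next cycle
    }
}
```

Marking the entry cleaned on skip, by contrast, would drop it from the queue permanently, which is exactly the behavior the review objects to.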

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -189,6 +197,12 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB, boolean metricsEnabled
           txnHandler.markCleaned(ci);
           return;
         }
+        if(MetaStoreUtils.isNoCleanUpSet(p.getParameters())){

Review comment:
       nit: Add space before ( and {

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##########
@@ -189,6 +197,12 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB, boolean metricsEnabled
           txnHandler.markCleaned(ci);
           return;
         }
+        if(MetaStoreUtils.isNoCleanUpSet(p.getParameters())){
+          // The table was marked no clean up true.

Review comment:
       Update the comment: "The partition is marked with no_cleanup=true"

##########
File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCleaner.java
##########
@@ -604,4 +608,113 @@ public void tearDown() throws Exception {
     compactorTestCleanup();
   }
 
+  @Test
+  public void NoCleanupAfterMajorCompaction() throws Exception {
+    Map<String, String> parameters = new HashMap<>();
+
+    //With no cleanup true
+    parameters.put("no_cleanup", "true");
+    Table t = newTable("default", "dcamc", false, parameters);
+
+    addBaseFile(t, null, 20L, 20);
+    addDeltaFile(t, null, 21L, 22L, 2);
+    addDeltaFile(t, null, 23L, 24L, 2);
+    addBaseFile(t, null, 25L, 25);
+
+    burnThroughTransactions("default", "dcamc", 25);
+
+    CompactionRequest rqst = new CompactionRequest("default", "dcamc", CompactionType.MAJOR);
+    compactInTxn(rqst);
+
+    startCleaner();
+    // Check there are no compactions requests left.
+    ShowCompactResponse rsp = txnHandler.showCompact(new ShowCompactRequest());
+    Assert.assertEquals(1, rsp.getCompactsSize());
+    Assert.assertEquals(TxnStore.SUCCEEDED_RESPONSE, rsp.getCompacts().get(0).getState());
+
+    // Check that the files are not removed
+    List<Path> paths = getDirectories(conf, t, null);
+    Assert.assertEquals(4, paths.size());
+
+    //With no clean up false
+    t = ms.getTable(new GetTableRequest("default", "dcamc"));
+    t.getParameters().put("no_cleanup", "false");
+    ms.alter_table("default", "dcamc", t);
+    rqst = new CompactionRequest("default", "dcamc", CompactionType.MAJOR);
+    compactInTxn(rqst);
+
+    startCleaner();
+    // Check there are no compactions requests left.
+    rsp = txnHandler.showCompact(new ShowCompactRequest());
+    Assert.assertEquals(2, rsp.getCompactsSize());
+    Assert.assertEquals(TxnStore.SUCCEEDED_RESPONSE, rsp.getCompacts().get(0).getState());
+
+    // Check that the files are not removed
+    paths = getDirectories(conf, t, null);
+    Assert.assertEquals(1, paths.size());
+    Assert.assertEquals("base_25", paths.get(0).getName());
+  }

Review comment:
       Enable cleanup again and run the cleaner, which should then remove the files.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 652946)
    Time Spent: 50m  (was: 40m)

> Control cleaning obsolete directories/files of a table via property
> -------------------------------------------------------------------
>
>                 Key: HIVE-25535
>                 URL: https://issues.apache.org/jira/browse/HIVE-25535
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ashish Sharma
>            Assignee: Ashish Sharma
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> *Use Case* - 
> When an external tool like [SPARK_ACID|https://github.com/qubole/spark-acid] 
> accesses the Hive metastore directly instead of going through LLAP or HS2, it 
> lacks the ability to acquire locks on metastore artifacts. As a result, if a 
> Spark ACID job starts while compaction is running in Hive, it can hit 
> exceptions like *FileNotFound* for a delta directory: the delta files exist 
> during the Spark ACID compilation phase, but by the time execution starts 
> they have been deleted by the compactor. 
> In order to tackle problems like this, I am proposing to add a "NO_CLEANUP" 
> config to table properties and partition properties, which provides finer 
> control over the table and partition compaction process. 
> We already have 
> "[HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED|https://github.com/apache/hive/blob/71583e322fe14a0cfcde639629b509b252b0ed2c/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3243]",
>  which allows us to delay the deletion of obsolete directories/files, but it 
> applies to every table in the metastore, whereas this config provides 
> table- and partition-level control.
> *Solution* - 
> Add "NO_CLEANUP" to the table or partition properties to enable/disable 
> cleanup at that level, preventing the cleaner process from automatically 
> removing obsolete directories/files. 
> Example - 
> ALTER TABLE <tablename> SET TBLPROPERTIES('NO_CLEANUP'='TRUE');  -- or 'FALSE'
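A minimal sketch of what such a property lookup might look like. Note this is an illustration, not the actual `MetaStoreUtils.isNoCleanUpSet` from the PR: the case-insensitive boolean parse matches `Boolean.parseBoolean` semantics, and the rule that a partition-level value overrides the table-level one is my assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of a NO_CLEANUP lookup. Partition parameters, when present,
// take precedence over table parameters; that precedence rule is an
// assumption for illustration, not necessarily what HIVE-25535 implements.
public class NoCleanupSketch {
    static final String NO_CLEANUP = "no_cleanup";

    static boolean isNoCleanUpSet(Map<String, String> params) {
        // Boolean.parseBoolean is case-insensitive and null-safe (null -> false)
        return params != null && Boolean.parseBoolean(params.get(NO_CLEANUP));
    }

    static boolean skipCleanup(Map<String, String> tableParams,
                               Map<String, String> partParams) {
        if (partParams != null && partParams.containsKey(NO_CLEANUP)) {
            return isNoCleanUpSet(partParams);   // partition setting wins when present
        }
        return isNoCleanUpSet(tableParams);
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put(NO_CLEANUP, "TRUE");           // upper case parses the same as "true"
        System.out.println(skipCleanup(table, null));    // true: table-level flag applies

        Map<String, String> part = new HashMap<>();
        part.put(NO_CLEANUP, "false");
        System.out.println(skipCleanup(table, part));    // false: partition overrides table
    }
}
```

With a check like this in the cleaner's loop, `ALTER TABLE ... SET TBLPROPERTIES('NO_CLEANUP'='TRUE')` suspends cleanup for the table while leaving the compaction queue entries intact.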



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
