[ 
https://issues.apache.org/jira/browse/HIVE-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Sharma updated HIVE-25535:
---------------------------------
    Description: 
Use Case - 

External tools such as SPARK_ACID access the Hive metastore directly instead 
of going through LLAP or HS2, so they lack the ability to acquire locks on the 
metastore artifacts. As a result, if a Spark ACID job starts while compaction 
is running in Hive, the job can fail with exceptions such as FileNotFound for 
a delta directory: the delta files are present during the Spark ACID 
compilation phase, but by the time execution starts they have been deleted by 
the compactor. 

To tackle problems like this, I am proposing to add a "NO_CLEANUP" config to 
the table properties and partition properties, which provides finer control 
over the table and partition compaction process. 

We already have 
"[HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED|https://github.com/apache/hive/blob/71583e322fe14a0cfcde639629b509b252b0ed2c/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3243]",
which allows us to delay the deletion of "obsolete directories/files", but it 
applies to all tables in the metastore, whereas this config will provide 
table-level and partition-level control.

Solution - 

Add "NO_CLEANUP" to the table properties to enable or disable table-level and 
partition-level cleanup. When it is set to TRUE, the cleaner process does not 
automatically clean obsolete directories/files.

Example - 

ALTER TABLE <tablename> SET TBLPROPERTIES('NO_CLEANUP'='TRUE');  -- or 'FALSE'
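
To make the intended behaviour concrete, here is a minimal Java sketch of how a cleaner could decide whether to skip a table or partition. It is not Hive's actual cleaner code: the class NoCleanupSketch and the method isCleanupDisabled are hypothetical names, the parameter maps stand in for the table and partition properties stored in the metastore, and the rule that a partition-level value overrides the table-level value is an assumption of this sketch, not something the proposal specifies.

```java
import java.util.Map;

// Hypothetical sketch only; names and precedence rule are assumptions.
final class NoCleanupSketch {
    // Property name taken from the proposal.
    static final String NO_CLEANUP = "NO_CLEANUP";

    // Returns true when the cleaner should skip this table or partition.
    // A partition-level value, if present, overrides the table-level value.
    static boolean isCleanupDisabled(Map<String, String> tableParams,
                                     Map<String, String> partitionParams) {
        String value = null;
        if (partitionParams != null) {
            value = partitionParams.get(NO_CLEANUP);
        }
        if (value == null && tableParams != null) {
            value = tableParams.get(NO_CLEANUP);
        }
        // parseBoolean ignores case and treats null as false,
        // so cleanup stays enabled when the property is absent.
        return Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> table = Map.of(NO_CLEANUP, "TRUE");
        Map<String, String> partition = Map.of(NO_CLEANUP, "false");
        System.out.println(isCleanupDisabled(table, null));      // true
        System.out.println(isCleanupDisabled(table, partition)); // false
        System.out.println(isCleanupDisabled(Map.of(), null));   // false
    }
}
```

Defaulting to "cleanup enabled" when the property is missing keeps existing tables on their current behaviour, which seems the safer choice for a sketch like this.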


> Control cleaning obsolete directories/files of a table via property
> -------------------------------------------------------------------
>
>                 Key: HIVE-25535
>                 URL: https://issues.apache.org/jira/browse/HIVE-25535
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ashish Sharma
>            Assignee: Ashish Sharma
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
