[ 
https://issues.apache.org/jira/browse/HIVE-15880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946351#comment-15946351
 ] 

Vihang Karajgaonkar commented on HIVE-15880:
--------------------------------------------

Hi [~ctang.ma], thanks for the review. I changed the implementation of 
{{HiveMetaStoreFsImpl.deleteDir}} so that it reuses the 
{{FileUtils.moveToTrash}} method, as you suggested.

I also investigated encrypted tables. For an encrypted table, the 
{{INSERT OVERWRITE TABLE encrypted_table ...}} command was already ignoring 
the error thrown when the old data was moved to Trash: it logs a warning and 
then deletes the data when the move to Trash fails. So my patch introduces no 
behavioral change for encrypted tables. The only difference is that if the 
table is encrypted and auto.purge is true, the code doesn't attempt to move 
the data to Trash in the first place, so the warning is not logged either.

The same applies to truncate table. Both commands use FileUtils.moveToTrash in 
the backend to delete the data, which catches the exception when the table is 
encrypted and simply logs the warning. So with the patch there is no change 
for encrypted tables for either INSERT OVERWRITE or TRUNCATE, since both 
commands were already ignoring the Trash.
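
The control flow described above can be sketched roughly as follows. This is 
only an illustration, not the code in the patch: {{AutoPurgeSketch}}, 
{{FakeFs}}, {{isAutoPurge}} and {{removeOldData}} are hypothetical names made 
up for the example, and the fake file system merely simulates a Trash move 
that fails, as it does for an encrypted table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these types are hypothetical stand-ins,
// not the Hive or Hadoop classes touched by the patch.
class AutoPurgeSketch {

    /** Fake file system whose Trash move can be made to fail (e.g. encrypted table). */
    static class FakeFs {
        final boolean trashFails;

        FakeFs(boolean trashFails) {
            this.trashFails = trashFails;
        }

        void moveToTrash(String path) throws Exception {
            if (trashFails) {
                throw new Exception("cannot move " + path + " to Trash");
            }
        }
    }

    /** True when the table properties carry auto.purge=true. */
    static boolean isAutoPurge(Map<String, String> tblProps) {
        return "true".equalsIgnoreCase(tblProps.get("auto.purge"));
    }

    /** Returns the sequence of actions taken while removing the old data. */
    static List<String> removeOldData(FakeFs fs, String path, Map<String, String> tblProps) {
        List<String> actions = new ArrayList<>();
        if (!isAutoPurge(tblProps)) {
            try {
                fs.moveToTrash(path);
                actions.add("TRASH");
                return actions;
            } catch (Exception e) {
                // Pre-existing behavior: log a warning, then fall through to delete.
                actions.add("WARN");
            }
        }
        // Reached when auto.purge is true (Trash skipped) or the Trash move failed.
        actions.add("DELETE");
        return actions;
    }

    public static void main(String[] args) {
        Map<String, String> purge = Map.of("auto.purge", "true");
        Map<String, String> noPurge = Map.of();
        System.out.println(removeOldData(new FakeFs(false), "/warehouse/t", noPurge));
        System.out.println(removeOldData(new FakeFs(true), "/warehouse/t", noPurge));
        System.out.println(removeOldData(new FakeFs(true), "/warehouse/t", purge));
    }
}
```

In this sketch the encrypted case ends in a delete with or without auto.purge; 
the only difference is whether the Trash move is attempted and the warning 
recorded.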

The patch is still more efficient for tables located on S3, since with 
auto.purge set the code no longer attempts to move the data to Trash at all.

I also found another, unrelated bug during this investigation: an 
inconsistency between the "EXTERNAL" property and EXTERNAL tables. I created 
HIVE-16324 for that.

I am updating the patch to include an additional q file test and to modify the 
existing truncate table tests to work around HIVE-16324.

> Allow insert overwrite and truncate table query to use auto.purge table 
> property
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-15880
>                 URL: https://issues.apache.org/jira/browse/HIVE-15880
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-15880.01.patch, HIVE-15880.02.patch, 
> HIVE-15880.03.patch, HIVE-15880.04.patch, HIVE-15880.05.patch
>
>
> It seems inconsistent that the auto.purge property is not considered when we 
> do an INSERT OVERWRITE, while it is when we do a DROP TABLE.
> DROP TABLE doesn't move table data to Trash when auto.purge is set to true:
> {noformat}
> > create table temp(col1 string, col2 string);
> No rows affected (0.064 seconds)
> > alter table temp set tblproperties('auto.purge'='true');
> No rows affected (0.083 seconds)
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> No rows affected (25.473 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive         22 2017-02-09 13:03 
> /user/hive/warehouse/temp/000000_0
> #
> > drop table temp;
> No rows affected (0.242 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> ls: `/user/hive/warehouse/temp': No such file or directory
> #
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> #
> {noformat}
> INSERT OVERWRITE query moves the table data to Trash even when auto.purge is 
> set to true
> {noformat}
> > create table temp(col1 string, col2 string);
> > alter table temp set tblproperties('auto.purge'='true');
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive         22 2017-02-09 13:07 
> /user/hive/warehouse/temp/000000_0
> #
> > insert overwrite table temp select * from dummy;
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive         26 2017-02-09 13:08 
> /user/hive/warehouse/temp/000000_0
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> Found 1 items
> drwx------   - hive hive          0 2017-02-09 13:08 
> /user/hive/.Trash/Current/user/hive/warehouse/temp
> #
> {noformat}
> While move operations are not very costly on HDFS, they can add significant 
> overhead on slow FileSystems like S3. Honoring auto.purge could improve the 
> performance of {{INSERT OVERWRITE TABLE}} queries, especially when there is 
> a large number of partitions on tables located on S3, should the user wish 
> to set the auto.purge property to true.
> Similarly, a {{TRUNCATE TABLE}} query on a table with the {{auto.purge}} 
> property set to true should not move the data to Trash.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
