[ https://issues.apache.org/jira/browse/HIVE-15880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946351#comment-15946351 ]
Vihang Karajgaonkar commented on HIVE-15880:
--------------------------------------------

Hi [~ctang.ma]

Thanks for the review. I changed the implementation of {{HiveMetaStoreFsImpl.deleteDir}} so that it reuses the {{FileUtils.moveToTrash}} method, as you suggested.

I also investigated encrypted tables. For an encrypted table, the {{INSERT OVERWRITE TABLE encrypted_table ...}} command was already ignoring the error thrown when old data was moved to Trash: it logs a warning and then deletes the data when the move to Trash fails. With the patch, if the table is encrypted and auto.purge is true, the code does not attempt to move the data to Trash in the first place, so the warning is not logged either.

The same applies to {{TRUNCATE TABLE}}. Both commands use {{FileUtils.moveToTrash}} in the backend to delete the data, and that method catches the exception raised when the table is encrypted and simply logs the warning. So the patch does not change behavior for encrypted tables for either INSERT OVERWRITE or TRUNCATE, since both were already bypassing the Trash. The patch is still more efficient for tables located on S3, because the code no longer attempts to move their data to Trash.

I also found another, unrelated bug during this investigation: an inconsistency between the "EXTERNAL" table property and EXTERNAL tables. I created HIVE-16324 for that.
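For illustration only, the deletion path discussed earlier in this comment for INSERT OVERWRITE and TRUNCATE can be modelled roughly as below. This is a simplified sketch and not the code in the patch: the class {{PurgeSketch}}, the {{Trash}} interface, the {{Outcome}} labels and the warning text are all hypothetical stand-ins, and a failed move to Trash (for example on an encrypted table) is modelled as a {{false}} return value rather than a caught exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behaviour described in the comment; not Hive code.
public class PurgeSketch {
    // Illustrative outcome labels, not Hive types.
    enum Outcome { PURGED, TRASHED, DELETED_AFTER_TRASH_FAILURE }

    // Stand-in for a trash move; returns false when the move fails.
    interface Trash {
        boolean moveToTrash(String path);
    }

    static Outcome deleteOldData(String path, boolean autoPurge, Trash trash, List<String> warnings) {
        if (autoPurge) {
            // auto.purge=true: the Trash is never attempted, so no warning can be logged.
            return Outcome.PURGED;
        }
        if (trash.moveToTrash(path)) {
            return Outcome.TRASHED;
        }
        // The move failed: record a warning and fall back to deleting the data.
        warnings.add("Failed to move " + path + " to Trash, deleting it instead");
        return Outcome.DELETED_AFTER_TRASH_FAILURE;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        Trash failing = p -> false; // e.g. an encrypted table
        Trash working = p -> true;
        System.out.println(deleteOldData("/warehouse/temp", true, failing, warnings));
        System.out.println(deleteOldData("/warehouse/temp", false, working, warnings));
        System.out.println(deleteOldData("/warehouse/temp", false, failing, warnings));
        System.out.println("warnings logged: " + warnings.size());
    }
}
```

In this model only the third call records a warning, which mirrors the point above: with auto.purge set, an encrypted table is deleted exactly as before, just without the failed Trash attempt.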
I am updating the patch, which now includes an additional q file test and modifies the existing truncate table tests to work around HIVE-16324.

> Allow insert overwrite and truncate table query to use auto.purge table property
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-15880
>                 URL: https://issues.apache.org/jira/browse/HIVE-15880
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-15880.01.patch, HIVE-15880.02.patch, HIVE-15880.03.patch, HIVE-15880.04.patch, HIVE-15880.05.patch
>
>
> It seems inconsistent that the auto.purge property is not considered when we do an INSERT OVERWRITE while it is when we do a DROP TABLE.
> Drop table doesn't move table data to Trash when auto.purge is set to true:
> {noformat}
> > create table temp(col1 string, col2 string);
> No rows affected (0.064 seconds)
> > alter table temp set tblproperties('auto.purge'='true');
> No rows affected (0.083 seconds)
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> No rows affected (25.473 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt 3 hive hive 22 2017-02-09 13:03 /user/hive/warehouse/temp/000000_0
> #
> > drop table temp;
> No rows affected (0.242 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> ls: `/user/hive/warehouse/temp': No such file or directory
> #
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> #
> {noformat}
> INSERT OVERWRITE query moves the table data to Trash even when auto.purge is set to true:
> {noformat}
> > create table temp(col1 string, col2 string);
> > alter table temp set tblproperties('auto.purge'='true');
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt 3 hive hive 22 2017-02-09 13:07 /user/hive/warehouse/temp/000000_0
> #
> > insert overwrite table temp select * from dummy;
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt 3 hive hive 26 2017-02-09 13:08 /user/hive/warehouse/temp/000000_0
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> Found 1 items
> drwx------ - hive hive 0 2017-02-09 13:08 /user/hive/.Trash/Current/user/hive/warehouse/temp
> #
> {noformat}
> While move operations are not very costly on HDFS, they can be a significant overhead on slow FileSystems like S3. Skipping the move could improve the performance of {{INSERT OVERWRITE TABLE}} queries, especially when there is a large number of partitions on tables located on S3, should the user wish to set the auto.purge property to true.
> Similarly, a {{TRUNCATE TABLE}} query on a table with the {{auto.purge}} property set to true should not move the data to Trash.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)