wangzhihao created HIVE-18927:
---------------------------------
Summary: Hive "insert overwrite" doesn't replace the destination
files if no partition in metastore for the files
Key: HIVE-18927
URL: https://issues.apache.org/jira/browse/HIVE-18927
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: wangzhihao
[This
post|http://www.ericlin.me/2015/05/hive-insert-overwrite-does-not-remove-existing-data/]
describes a way to reproduce this issue:
{noformat}
# Add a file to the file system, with no partition in the metastore to track it.
hdfs dfs -put test.txt test/p=p1
# Insert overwrite the partition(p = p1)
DROP TABLE IF EXISTS partition_test;
CREATE EXTERNAL TABLE partition_test (a int) PARTITIONED BY (p string);
INSERT OVERWRITE TABLE partition_test PARTITION (p = 'p1') SELECT 123;
# Verify that test.txt has not been removed.
hdfs dfs -ls test/p=p1
Found 2 items
-rwxr-xr-x 3 hdfs supergroup 194965 2015-05-05 00:15 test/p=p1/000000_0
-rw-r--r-- 3 hdfs supergroup 8 2015-05-05 00:10 test/p=p1/test.txt
{noformat}
The reason is that
[Hive.loadPartition|https://github.com/apache/hive/blob/9b36ffa92cc4e0f47ea03d8d167debe743342f5b/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1652]
will try to {{replaceFiles}} only if {{oldPath}} exists. Since the metastore has
no partition for the files, {{oldPath}} is null, so the existing files are
never cleaned up. We should also clean {{destf}} in method
[Hive.replaceFiles|https://github.com/apache/hive/blob/b362de3871764731d8371657b07140e37a3c5105/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3817]
to fix the issue.
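
The proposed behaviour could look roughly like the sketch below. This is not Hive's actual code: it uses {{java.nio.file}} on a local directory instead of the Hadoop {{FileSystem}} API, and the class {{OverwriteSketch}}, the helper {{cleanDirectory}} and the parameter {{newFile}} are hypothetical names used only for illustration. Only {{oldPath}}, {{destf}} and {{replaceFiles}} come from the description above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the overwrite step; NOT Hive's actual implementation.
class OverwriteSketch {

    // Deletes the regular files directly under dir. Does nothing if dir is
    // null or is not an existing directory.
    static void cleanDirectory(Path dir) throws IOException {
        if (dir == null || !Files.isDirectory(dir)) {
            return;
        }
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.delete(file);
        }
    }

    // oldPath: partition location known to the metastore, or null if untracked.
    // destf:   directory that receives the new data.
    // newFile: staged result file to move into destf (assumed name).
    static void replaceFiles(Path oldPath, Path destf, Path newFile) throws IOException {
        cleanDirectory(oldPath);
        // Suggested addition: clean destf too, so files that the metastore
        // does not track are removed as well.
        if (oldPath == null || !oldPath.equals(destf)) {
            cleanDirectory(destf);
        }
        Files.createDirectories(destf);
        Files.move(newFile, destf.resolve(newFile.getFileName()));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("overwrite-sketch");
        Path destf = Files.createDirectories(root.resolve("p=p1"));
        Files.write(destf.resolve("test.txt"), "stale".getBytes());
        Path staged = Files.write(root.resolve("000000_0"), "123".getBytes());

        // oldPath is null, as when the metastore has no partition for p=p1.
        replaceFiles(null, destf, staged);

        try (Stream<Path> entries = Files.list(destf)) {
            entries.forEach(p -> System.out.println(p.getFileName()));
        }
    }
}
```

With the extra {{cleanDirectory(destf)}} call, the untracked {{test.txt}} is removed even though {{oldPath}} is null. Whether the real fix should delete, move to trash, or skip hidden files is left open here.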
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)