[ https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hui An updated HIVE-22077:
--------------------------
    Description:
INSERT OVERWRITE into a static partition may not clean the related HDFS location if the partition's info is not stored in the metadata.

Steps to reproduce this issue:
------------------------------------------------
1. Create a managed table:
------------------------------------------------
{code:sql}
CREATE TABLE `test`(
  `id` string)
PARTITIONED BY (
  `dayno` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test'
TBLPROPERTIES (
  'transient_lastDdlTime'='1564731656')
{code}
------------------------------------------------
2. Create the partition's directory and put some data in it
------------------------------------------------
{code:bash}
hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
{code}
------------------------------------------------
3. Insert overwrite partition dayno=20190802
------------------------------------------------
{code:sql}
INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
SELECT "some value";
{code}
------------------------------------------------
4. Observe that test.data under the partition directory is not deleted.
------------------------------------------------

  was:
INSERT OVERWRITE into a static partition may not clean the related HDFS location if the partition's info is not stored in the metadata.

Steps to Reproduce this issue:
------------------------------------------------
1. Create a managed table:
------------------------------------------------
{code:sql}
CREATE TABLE `test`(
  `id` string)
PARTITIONED BY (
  `dayno` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test'
TBLPROPERTIES (
  'transient_lastDdlTime'='1564731656')
{code}
------------------------------------------------
2. Create the partition's directory and put some data under it
------------------------------------------------
{code:bash}
hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
{code}
------------------------------------------------
3. Insert overwrite partition dayno=20190802
------------------------------------------------
{code:sql}
INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
SELECT "some value";
{code}
------------------------------------------------
4. Observe that test.data under the partition directory is not deleted.
------------------------------------------------

> Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22077
>                 URL: https://issues.apache.org/jira/browse/HIVE-22077
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.1, 4.0.0, 2.3.4
>            Reporter: Hui An
>            Assignee: Hui An
>            Priority: Major
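
The reproduction steps can be checked from a Hive session with a short sketch like the one below. It assumes the table name and HDFS path from the steps in this issue and uses only the standard SHOW PARTITIONS statement and the Hive CLI dfs command. The expectations written in the comments are inferred from the description above, not from a verified run.

{code:sql}
-- Run after step 2 and before step 3.
-- The directory exists on HDFS, but the partition was never registered,
-- so dayno=20190802 is expected to be missing from this list.
SHOW PARTITIONS test;

-- Step 3 from the description.
INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
SELECT "some value";

-- List the partition directory from within the Hive CLI.
-- If the issue reproduces, test.data is still listed next to the
-- newly written file(s).
dfs -ls hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802;
{code}

Checking SHOW PARTITIONS first confirms the precondition named in the description: the stale file only matters when the partition directory exists without a matching metadata entry.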