J.P Feng created HIVE-20594:
-------------------------------

             Summary: insert overwrite may brings duplicated data when hdfs 
path exists but partition missing in hms
                 Key: HIVE-20594
                 URL: https://issues.apache.org/jira/browse/HIVE-20594
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 2.1.1
            Reporter: J.P Feng


When I run insert overwrite on a partitioned table whose hdfs path exists but 
whose partition is missing from hms, I get duplicated data.

 

sql: insert overwrite table hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns 
partition (month = '201808' ) select * from xxx where month = '201808';

 

1. There are 10 files in hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns:

    month=201808/000001_0

    month=201808/000002_0 ... month=201808/000009_0

2. hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns is an external table 
and I drop partition (month=201808); or, in some other way, partition 
(month=201808) is dropped without removing the data under it.

3. insert overwrite table hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns 
partition (month = '201808' ) select * from xxx where month = '201808';

If this sql runs with 9 maps, it may generate 9 files:

month=201808/000001_0 ~ month=201808/000008_0

 

After executing this sql, we may find that the old file `month=201808/000009_0` 
still remains, so the partition may contain duplicated data.
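
The steps above can be mimicked with a small toy model. This is only a sketch and not Hive code: `insert_overwrite` and `stale_files` are hypothetical helpers, and the rule that old files are cleaned only when the partition is registered in the metastore is an assumption inferred from the behaviour described in this report. The file names are the ones listed above.

```python
def insert_overwrite(files, registered, partition, new_names):
    """Toy model of insert overwrite into a single partition.

    files:      set of paths on the simulated file system
    registered: set of partitions known to the simulated metastore
    ASSUMPTION (not taken from Hive source): existing files are removed
    only when the partition is registered in the metastore.
    """
    prefix = partition + "/"
    if partition in registered:
        # Partition is known, so its old files are cleaned first.
        files = {f for f in files if not f.startswith(prefix)}
    # New output files replace same-named files and are added otherwise.
    files = files | {prefix + name for name in new_names}
    return files, registered | {partition}


def stale_files(partition_registered):
    """Return the old files that survive the overwrite."""
    partition = "month=201808"
    # Old data: 000001_0 .. 000009_0, as listed in step 1.
    old = {f"{partition}/{i:06d}_0" for i in range(1, 10)}
    # New run writes 000001_0 .. 000008_0, as listed in step 3.
    new_names = [f"{i:06d}_0" for i in range(1, 9)]
    registered = {partition} if partition_registered else set()
    files, _ = insert_overwrite(old, registered, partition, new_names)
    return sorted(files - {f"{partition}/{n}" for n in new_names})


print(stale_files(partition_registered=False))  # ['month=201808/000009_0']
print(stale_files(partition_registered=True))   # []
```

Under this assumption, the leftover file appears only when the partition is missing from the metastore, which matches the duplicated data described above.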


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)