Pravin created HIVE-28945:
-----------------------------

             Summary: Data loss observed during INSERT OVERWRITE from one table 
to another with identical schema, involving both internal and external tables.
                 Key: HIVE-28945
                 URL: https://issues.apache.org/jira/browse/HIVE-28945
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 4.0.1
            Reporter: Pravin


We encountered an intermittent issue while performing an {{INSERT OVERWRITE}} 
operation between two Hive tables with identical schemas.
 * The source table, {{account_data}}, is an *external table* containing 
*954 columns* and approximately *10,000 rows*.

 * A target table, {{account_data_temp}}, was created using the {{LIKE}} 
clause to mirror the schema of {{account_data}}.

 * {{account_data_temp}} is also an *external table*, created using the 
following statement:

CREATE EXTERNAL TABLE account_data_temp
LIKE account_data
LOCATION 'hdfs://clustor1/user/account/account_data_temp';
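
The table type and location recorded in the metastore can be confirmed with a check along these lines. This is a minimal sketch, assuming the table was created in the {{default}} database; it is not output captured for this report.

```sql
-- Assumption: account_data_temp lives in the default database.
-- The output should list Table Type as EXTERNAL_TABLE and show the
-- Location given in the CREATE statement.
DESCRIBE FORMATTED default.account_data_temp;
```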

 

The data transfer was performed using the following {{INSERT OVERWRITE}} query:

 

INSERT OVERWRITE TABLE default.account_data_temp 
SELECT * FROM default.account_data;

 

After executing the above query, we observed that *3 rows were missing* in the 
target table ({{account_data_temp}}). The same kind of row loss was also 
observed when inserting data from an *internal table into an external table*.
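
A sketch of how the mismatch can be checked (not necessarily the exact queries used for this report). It assumes that {{COUNT(*)}} might otherwise be answered from metastore statistics, hence the {{SET}} line, and that all 954 columns are of types Hive can compare in a set operation.

```sql
-- Assumption: force COUNT(*) to scan the data instead of using
-- metastore statistics, which can be stale for external tables.
SET hive.compute.query.using.stats=false;

-- Row counts of source and target side by side.
SELECT 'account_data' AS table_name, COUNT(*) AS row_count
FROM default.account_data
UNION ALL
SELECT 'account_data_temp' AS table_name, COUNT(*) AS row_count
FROM default.account_data_temp;

-- Rows present in the source but absent from the target.
-- EXCEPT compares distinct rows, so a lost copy of an exact
-- duplicate row would not be detected by this query.
SELECT * FROM default.account_data
EXCEPT
SELECT * FROM default.account_data_temp;
```

If the counts differ but the {{EXCEPT}} query returns nothing, the missing rows are probably duplicates of rows that did arrive, and a comparison grouped on a key column would be needed instead.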

 

*Key Observations:*
 * This issue is *not consistently reproducible*; it occurs only intermittently.

 * The row count mismatch suggests *possible silent data loss* during the 
{{INSERT OVERWRITE}} operation.

 * No errors or warnings were reported during query execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
