Pravin created HIVE-28945:
-----------------------------
Summary: Data loss observed during INSERT OVERWRITE from one table
to another with identical schema, involving both internal and external tables.
Key: HIVE-28945
URL: https://issues.apache.org/jira/browse/HIVE-28945
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 4.0.1
Reporter: Pravin
We encountered an intermittent issue while performing an {{INSERT OVERWRITE}}
operation between two Hive tables with identical schemas.
* The source table, {{account_data}}, is an *external table* containing
*954 columns* and approximately *10,000 rows*.
* A target table, {{account_data_temp}}, was created using the {{LIKE}}
clause to mirror the schema of {{account_data}}.
* {{account_data_temp}} is also an *external table*, created using the
following statement:
CREATE EXTERNAL TABLE account_data_temp
LIKE account_data
LOCATION 'hdfs://clustor1/user/account/account_data_temp';
The data transfer was performed using the following {{INSERT OVERWRITE}} query:
INSERT OVERWRITE TABLE default.account_data_temp
SELECT * FROM default.account_data;
After executing the above query, we observed that *3 rows were missing* in the
target table ({{account_data_temp}}). A similar issue was also noticed when
inserting data from an *internal table into an external table*.
*Key Observations:*
* This issue is *not consistently reproducible*; it occurs intermittently.
* The row count mismatch suggests *possible silent data loss* during the
{{INSERT OVERWRITE}} operation.
* No errors or warnings were reported during query execution.
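The row count mismatch described above can be checked with queries along the following lines. This is a minimal sketch, not the exact verification that was run: the {{SET}} line assumes that {{COUNT(*)}} might otherwise be answered from metastore statistics rather than from the data, and {{account_id}} is a hypothetical unique key column used only for illustration (the actual schema is not shown in this report).

```sql
-- Assumption: force COUNT(*) to scan the data instead of being answered
-- from metastore statistics, which may be stale for external tables.
SET hive.compute.query.using.stats=false;

-- Compare row counts of the source and target tables side by side.
SELECT 'account_data' AS table_name, COUNT(*) AS row_count
FROM default.account_data
UNION ALL
SELECT 'account_data_temp' AS table_name, COUNT(*) AS row_count
FROM default.account_data_temp;

-- List source rows that have no match in the target.
-- account_id is a hypothetical key column; replace it with a real
-- unique key. Rows whose key is NULL never match and are also listed.
SELECT s.account_id
FROM default.account_data s
LEFT JOIN default.account_data_temp t
  ON s.account_id = t.account_id
WHERE t.account_id IS NULL;
```

If the two counts differ after an {{INSERT OVERWRITE}}, the second query would show which keys are absent from the target, which may help narrow down whether the missing rows share any common property.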