[
https://issues.apache.org/jira/browse/HIVE-28945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pravin updated HIVE-28945:
--------------------------
Description:
We encountered an inconsistent issue while performing an {{INSERT OVERWRITE}}
operation between two Hive tables with identical schemas.
* The source table, {{account_data}}, is an *external table* containing
*954 columns* and approximately *10,000 rows*.
* A target table, {{account_data_temp}}, was created using the {{LIKE}}
clause to mirror the schema of {{account_data}}.
* {{account_data_temp}} is also an *external table*, created using the
following statement:
CREATE EXTERNAL TABLE account_data_temp
LIKE account_data
LOCATION 'hdfs://clustor1/user/account/account_data_temp';
The data transfer was performed using the following {{INSERT OVERWRITE}} query:
INSERT OVERWRITE TABLE default.account_data_temp
SELECT * FROM default.account_data;
After executing the above query, we observed that a few *rows were missing* in
the target table ({{account_data_temp}}). A similar issue was also noticed when
inserting data from an *internal table to an external table*.
*Key Observations:*
* This issue is *not consistently reproducible* — it occurs intermittently.
* The row count mismatch suggests *possible silent data loss* during the
{{INSERT OVERWRITE}} operation.
* No errors or warnings were reported during query execution.
was:
We encountered an inconsistent issue while performing an {{INSERT OVERWRITE}}
operation between two Hive tables with identical schemas.
* The source table, {{account_data}}, is an *external table* containing
*954 columns* and approximately *10,000 rows*.
* A target table, {{account_data_temp}}, was created using the {{LIKE}}
clause to mirror the schema of {{account_data}}.
* {{account_data_temp}} is also an *external table*, created using the
following statement:
CREATE EXTERNAL TABLE account_data_temp
LIKE account_data
LOCATION 'hdfs://clustor1/user/account/account_data_temp';
The data transfer was performed using the following {{INSERT OVERWRITE}} query:
INSERT OVERWRITE TABLE default.account_data_temp
SELECT * FROM default.account_data;
After executing the above query, we observed that *3 rows were missing* in the
target table ({{account_data_temp}}). A similar issue was noticed when
inserting data from an *internal table to an external table* as well.
*Key Observations:*
* This issue is *not consistently reproducible* — it occurs intermittently.
* The row count mismatch suggests *possible silent data loss* during the
{{INSERT OVERWRITE}} operation.
* No errors or warnings were reported during query execution.
> Data loss observed during INSERT OVERWRITE from one table to another with
> identical schema, involving both internal and external tables.
> ----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-28945
> URL: https://issues.apache.org/jira/browse/HIVE-28945
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 4.0.1
> Reporter: Pravin
> Priority: Major
>
> We encountered an inconsistent issue while performing an {{INSERT OVERWRITE}}
> operation between two Hive tables with identical schemas.
> * The source table, {{account_data}}, is an *external table* containing
> *954 columns* and approximately *10,000 rows*.
> * A target table, {{account_data_temp}}, was created using the {{LIKE}}
> clause to mirror the schema of {{account_data}}.
> * {{account_data_temp}} is also an *external table*, created using the
> following statement:
> CREATE EXTERNAL TABLE account_data_temp
> LIKE account_data
> LOCATION 'hdfs://clustor1/user/account/account_data_temp';
>
> The data transfer was performed using the following {{INSERT OVERWRITE}}
> query:
>
> INSERT OVERWRITE TABLE default.account_data_temp
> SELECT * FROM default.account_data;
>
> After executing the above query, we observed that a few *rows were missing* in
> the target table ({{account_data_temp}}). A similar issue was also noticed
> when inserting data from an *internal table to an external table*.
>
> *Key Observations:*
> * This issue is *not consistently reproducible* — it occurs intermittently.
> * The row count mismatch suggests *possible silent data loss* during the
> {{INSERT OVERWRITE}} operation.
> * No errors or warnings were reported during query execution.
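
The row count mismatch described above could be checked with queries along the following lines. This is only a sketch under stated assumptions, not the reporter's actual verification method: the column {{account_id}} is a hypothetical unique key that does not appear in the report, and disabling {{hive.compute.query.using.stats}} is suggested only so that {{COUNT(*)}} scans the data instead of possibly being answered from table statistics, which may be stale for external tables.

```sql
-- Force COUNT(*) to read the data rather than rely on (possibly stale) statistics.
SET hive.compute.query.using.stats=false;

-- Compare the row counts of the source and target tables side by side.
SELECT 'account_data' AS table_name, COUNT(*) AS row_count
FROM default.account_data
UNION ALL
SELECT 'account_data_temp' AS table_name, COUNT(*) AS row_count
FROM default.account_data_temp;

-- List source rows with no match in the target.
-- ASSUMPTION: account_id is a hypothetical unique key column; substitute the
-- table's real key column(s).
SELECT s.account_id
FROM default.account_data s
LEFT JOIN default.account_data_temp t
  ON s.account_id = t.account_id
WHERE t.account_id IS NULL;
```

Because the issue is intermittent, running the count comparison right after each {{INSERT OVERWRITE}} attempt and recording the missing keys would likely show whether the same rows are dropped every time.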