Gabor Kaszab created IMPALA-13484:
-------------------------------------

             Summary: Querying an Iceberg table with TIMESTAMP_LTZ can cause 
data loss
                 Key: IMPALA-13484
                 URL: https://issues.apache.org/jira/browse/IMPALA-13484
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Gabor Kaszab


*+Repro steps:+*

1. Create a table with Hive that has a TS_LTZ column:
{code:java}
create table ice_hive_tbl (i int, ts_ltz timestamp with local time zone) stored by iceberg;
{code}
2. Insert some data using Hive:
{code:java}
insert into ice_hive_tbl values (1, current_timestamp());
{code}
3. Add a breakpoint in Impala's table loading code right before Impala sends out an alter_table to HMS to change the column type from TS_LTZ to TS. [Here|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463], for instance.
4. Query the table from Impala. This triggers a table load. Impala decides that it should change the column type from TS_LTZ to TS, but the breakpoint holds it before it does so.
5. Use Hive to add more rows to the table:
{code:java}
insert into ice_hive_tbl values (2, current_timestamp());
insert into ice_hive_tbl values (3, current_timestamp());
{code}
6. Release the breakpoint and let Impala finish the SELECT query started in step 4.
7. Do another SELECT * from Hive and/or Impala and verify that the extra rows added in step 5 are not present in the table.
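For reference, the verification in step 7 could look like this (the annotated output is illustrative):
{code:java}
select * from ice_hive_tbl;
-- expected: rows with i = 1, 2, 3
-- actual:   only the row with i = 1; the rows inserted in step 5 are gone
{code}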

*+Root cause:+*

When Impala changes the TS_LTZ column to TS, it does so by calling alter_table() on HMS directly: it gives HMS a Metastore Table object as the desired state of the table, and HMS then persists this table object.
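For illustration, here is a minimal sketch of the problematic pattern. This is simplified, not the exact Impala code; the method and variable names are made up for the example:
{code:java}
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.Table;

// Simplified sketch of the problematic pattern, not the exact Impala code.
// The table object fetched at load time can be stale by the time it is
// written back.
void alterTsLtzColumnsDirectly(IMetaStoreClient client, String db, String tbl)
    throws Exception {
  Table msTable = client.getTable(db, tbl);

  // The TS_LTZ columns are rewritten to TS on the in-memory object...
  for (FieldSchema col : msTable.getSd().getCols()) {
    if ("timestamp with local time zone".equals(col.getType())) {
      col.setType("timestamp");
    }
  }

  // ...and the WHOLE object is persisted back, including the possibly stale
  // 'metadata_location' property. No conflict detection happens here, so any
  // Iceberg commits made in the meantime are silently reverted.
  client.alter_table(db, tbl, msTable);
}
{code}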

The problems with this:
 - Impala doesn't use the Iceberg API to alter the table. As a result, no conflict detection is performed, and it goes unnoticed that other commits went into the table since Impala grabbed the table object from HMS.
 - The metadata.json path is part of the table properties, and when Impala calls alter_table(tbl_obj), HMS also persists this metadata path to the table, even though other changes have already moved the metadata path forward.
 - Essentially, this reverts the table back to the state it was in when Impala loaded the table object from HMS.
 - In a high-frequency scenario this could cause problems when Hive (or even Spark) heavily writes the table while Impala reads it: snapshots could be unintentionally reverted, resulting in data loss or changes such as deletes being undone.

{+}Just a note{+}, FWIW, with the current approach Impala doesn't change the column types in the Iceberg metadata, only in HMS. So even with this behavior, the Iceberg metadata would still show the column type as timestamptz.
{+}Note2{+}, I described this problem using timestamp with local time zone as an example, but it could also be triggered by other column types that are not entirely compatible with Impala. I haven't done the research to find out whether there is any other such type, though.
{+}Note3{+}, this issue seems to have been there forever. I found that the code triggering it was added by one of the first Iceberg-integration changes, the "[Create Iceberg table|https://github.com/apache/impala/commit/8fcad905a12d018eb0a354f7e4793e5b0d5efd3b]" change.

*+Possible solutions:+*
1. Impala can do the alter table by calling the Iceberg API instead of calling HMS directly.
There are things to be careful about:
 - With this approach, would the above repro steps make the table loading fail due to a conflict between commits on the table? Or could the Iceberg library automatically merge the schema change onto the latest state even if there had been changes on the table in the meantime? I think this would work as expected and wouldn't reject loading the table, but we should make sure of this when testing (see the conflict-detection sketch after this list).
 - With this approach Impala would set the TS_LTZ columns to TS properly, causing no snapshots to be lost. However, when a new write is performed by Hive/Spark, they'd set the column types back to TS_LTZ. And then when Impala reads the table again, it would set these columns back to TS, and so on. The question is whether a scenario like this would flood the Iceberg metadata, e.g. metadata.json, with all these uncontrolled schema changes.
 - We talk about schema changes here, but what the code does now is in fact much wider than that: it sends a whole table object to HMS to persist. We have to double-check whether the current approach only persists schema changes or could make other changes too. E.g. the code also sets the DO_NOT_UPDATE_STATS property and the last DDL time. Could it change anything else as well that we might miss with this approach?
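For context on the conflict question in the first bullet: when committing through its HMS-backed catalog, the Iceberg library performs an optimistic-locking check roughly like the sketch below. This is simplified (the real logic lives in HiveTableOperations), and the method and variable names are illustrative:
{code:java}
import java.util.Objects;
import org.apache.iceberg.exceptions.CommitFailedException;

// Simplified sketch of Iceberg's optimistic locking on commit; the real
// logic lives in HiveTableOperations. Names here are illustrative.
void checkForConflict(String baseMetadataLocation, String currentMetadataLocation) {
  // baseMetadataLocation:    the metadata.json the change was based on
  // currentMetadataLocation: the 'metadata_location' HMS holds right now
  if (!Objects.equals(baseMetadataLocation, currentMetadataLocation)) {
    // Another engine committed in the meantime: the commit is rejected
    // (and typically retried on the refreshed state) instead of silently
    // reverting the table.
    throw new CommitFailedException(
        "Base metadata location '%s' is not the same as the current table metadata location '%s'",
        baseMetadataLocation, currentMetadataLocation);
  }
}
{code}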

2. Do not do any alter_table after loading the Iceberg table.
This approach would simply drop the code [after this line|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463] and wouldn't make any HMS schema changes. Impala could then internally use the adjusted column types, but wouldn't change the types of the columns in HMS. The question here is whether this would break any use cases.
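A minimal sketch of what solution 2 could look like; the helper name is hypothetical, the point is that the type mapping stays in Impala's in-memory catalog objects and nothing is sent to HMS:
{code:java}
// Hypothetical sketch of solution 2: keep the TS_LTZ -> TS mapping purely
// in Impala's in-memory view of the table instead of pushing it to HMS.
static String toImpalaCompatibleType(String hmsType) {
  if ("timestamp with local time zone".equals(hmsType)) {
    // Impala would internally treat the column as a plain timestamp.
    return "timestamp";
  }
  return hmsType;
}
// During table load, only the in-memory column descriptors are adjusted.
// Since no alter_table() is issued, the stale table object (and its old
// 'metadata_location' property) can never be persisted back to HMS.
{code}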


