Gabor Kaszab created IMPALA-13484:
-------------------------------------
Summary: Querying an Iceberg table with TIMESTAMP_LTZ can cause
data loss
Key: IMPALA-13484
URL: https://issues.apache.org/jira/browse/IMPALA-13484
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Gabor Kaszab
*+Repro steps:+*
1. Create a table with Hive that has a TS_LTZ column:
{code:java}
create table ice_hive_tbl (i int, ts_ltz timestamp with local time zone) stored
by iceberg;
{code}
2. Insert some data using Hive:
{code:java}
insert into ice_hive_tbl values (1, current_timestamp());
{code}
3. Add a breakpoint in Impala's table loading code, right before Impala
sends out an alter_table to HMS to change the column type from TS_LTZ to TS.
[Here|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463],
for instance.
4. Query the table from Impala. This triggers a table load. Impala will
decide that it should change the column type from TS_LTZ to TS, but the
breakpoint will hold it right before doing so.
5. Use Hive to add additional rows into the table:
{code:java}
insert into ice_hive_tbl values (2, current_timestamp());
insert into ice_hive_tbl values (3, current_timestamp());
{code}
6. Release the breakpoint and let Impala finish the SELECT query started in
step 4.
7. Do another SELECT * from Hive and/or Impala and verify that the extra rows
added in step 5 are not present in the table.
*+Root cause:+*
When Impala changes the TS_LTZ column to TS, it does so by calling
alter_table() on HMS directly, handing HMS a Metastore Table object as the
desired state of the table. HMS then persists this table object.
The problems with this (see the sketch after this list):
- Impala doesn't use the Iceberg API to alter the table. As a result, no
conflict detection is performed, and it goes undetected that other commits
went into the table since Impala grabbed the table object from HMS.
- The metadata.json path is part of the table properties, so when Impala
calls alter_table(tbl_obj), HMS also persists this metadata path on the
table, even though other commits have already moved the metadata path
forward.
- Essentially, this reverts the table back to the state it was in when
Impala loaded the table object from HMS.
- In a high-frequency scenario this could cause problems when Hive (or even
Spark) writes the table heavily while Impala reads it. Some snapshots could
be unintentionally reverted by this behavior, resulting in data loss or
changes such as deletes being rolled back.
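To make the failure mode concrete, here is a rough sketch of the problematic
pattern, assuming a plain HMS client handle; the names are simplified and
this is not the actual Impala code:
{code:java}
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.Table;

// 'msTbl' was loaded from HMS earlier, so its properties (including
// 'metadata_location') reflect the table state at load time.
void demoteLtzColumns(IMetaStoreClient hmsClient, Table msTbl) throws Exception {
  for (FieldSchema col : msTbl.getSd().getCols()) {
    if ("timestamp with local time zone".equalsIgnoreCase(col.getType())) {
      col.setType("timestamp");  // TS_LTZ -> TS, as Impala does today
    }
  }
  // Direct HMS call: persists the WHOLE stale table object, including the
  // old 'metadata_location' property. Any Iceberg commits made since the
  // load (the inserts in step 5 of the repro) are silently rolled back.
  hmsClient.alter_table(msTbl.getDbName(), msTbl.getTableName(), msTbl);
}
{code}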
{+}Just a note{+}, FWIW: with the current approach Impala doesn't change the
column types in the Iceberg metadata, only in HMS. So even with this
behavior, the Iceberg metadata would still show the column type as
timestamptz.
{+}Note2{+}, I described this problem using timestamp with local time zone
as an example, but it could also be triggered by other column types that are
not entirely compatible with Impala. I haven't done the research to find out
whether there is any other such type, though.
{+}Note3{+}, this issue seems to have been there forever. The code that
triggers it was added by one of the first Iceberg integration changes, the
"[Create Iceberg
table|https://github.com/apache/impala/commit/8fcad905a12d018eb0a354f7e4793e5b0d5efd3b]"
change.
*+Possible solutions:+*
1. Impala could do the alter table by calling the Iceberg API instead of HMS
directly.
There are things to be careful about (a sketch follows this list):
- With this approach, would the above repro steps make the table loading
fail due to a conflict between commits on the table? Or could the schema
change be merged automatically by the Iceberg library onto the latest state
even if there had been changes on the table? I think this would work as
expected and wouldn't reject loading the table, but we should verify this
when testing.
- With this approach Impala would set the TS_LTZ cols to TS properly,
causing no snapshots to be lost. However, when a new write is performed by
Hive/Spark, they'd set the column types back to TS_LTZ; then, when Impala
reads the table again, it would set these columns to TS again, and so on.
The question is whether a scenario like this would flood the Iceberg
metadata, e.g. metadata.json, with all these uncontrolled schema changes.
- We talk about schema changes here, but what the code does now is in fact
much broader than that: it sends a whole table object to HMS to persist. We
have to double-check whether the current approach only persists schema
changes or could make other changes too. E.g. the code also sets the
DO_NOT_UPDATE_STATS property and the last DDL time. Could it change anything
else that we might miss with this approach?
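For reference, a minimal sketch of what this could look like through the
Iceberg API, assuming a HiveCatalog handle (the catalog setup and
identifiers here are illustrative). One caveat baked into the comments:
Iceberg's schema evolution rules only allow certain type promotions, so a
timestamptz -> timestamp change may be rejected outright, which is part of
what the testing above would need to confirm:
{code:java}
import java.util.Collections;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;
import org.apache.iceberg.types.Types;

public class AlterViaIcebergApi {
  public static void main(String[] args) {
    // Going through the Iceberg commit path gives optimistic-concurrency
    // checks: a concurrent commit causes a retry against refreshed metadata
    // instead of clobbering it, unlike a raw HMS alter_table().
    HiveCatalog catalog = new HiveCatalog();
    catalog.initialize("hive", Collections.emptyMap());
    Table table =
        catalog.loadTable(TableIdentifier.of("default", "ice_hive_tbl"));
    // Caveat: updateColumn() enforces Iceberg's type-promotion rules, so
    // timestamptz -> timestamp may be rejected; this needs verification.
    table.updateSchema()
        .updateColumn("ts_ltz", Types.TimestampType.withoutZone())
        .commit();
  }
}
{code}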
2. Do not do any alter_table after loading the Iceberg table
This approach would simply drop the code [after this
line|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463]
and wouldn't do any HMS schema changes. Impala could then internally use the
adjusted column types without changing the types of the columns in HMS (see
the sketch below). The question here is whether this would break any use
cases.
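A minimal sketch of this variant, with illustrative names rather than actual
Impala internals: build the adjusted column list only for Impala's in-memory
view of the schema and skip the HMS round trip entirely.
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.Table;

// Build Impala's internal column list from the HMS table without writing
// anything back; HMS and the Iceberg metadata keep TIMESTAMP_LTZ.
List<FieldSchema> internalSchema(Table msTbl) {
  List<FieldSchema> cols = new ArrayList<>();
  for (FieldSchema col : msTbl.getSd().getCols()) {
    FieldSchema copy = new FieldSchema(col);  // don't mutate the HMS object
    if ("timestamp with local time zone".equalsIgnoreCase(copy.getType())) {
      copy.setType("timestamp");  // adjusted type used internally only
    }
    cols.add(copy);
  }
  return cols;  // no hmsClient.alter_table() call anywhere
}
{code}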