[ 
https://issues.apache.org/jira/browse/IMPALA-13484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892721#comment-17892721
 ] 

Gabor Kaszab commented on IMPALA-13484:
---------------------------------------

hey [~skyyws] ,
I'm pinging you here because I found a data loss / snapshot loss scenario with 
one of the very [initial Iceberg 
commits|https://issues.apache.org/jira/browse/IMPALA-13484] that you wrote. I'm 
basically trying to understand the motivation for persisting schema changes 
into HMS when Impala has to make a column type adjustment while loading an 
Iceberg table. E.g. when a table created by Spark/Hive has a "timestamp with 
local time zone" column, Impala alters that column to "timestamp" on the first 
read. I'm wondering whether skipping this alter table operation and leaving the 
HMS metadata as is would break any workloads. I'm leaning towards not doing the 
alter table ATM, but trying to understand if that breaks anything.

> Querying an Iceberg table with TIMESTAMP_LTZ can cause data loss
> ----------------------------------------------------------------
>
>                 Key: IMPALA-13484
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13484
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Gabor Kaszab
>            Priority: Major
>              Labels: impala-iceberg
>
> *+Repro steps:+*
> 1. Create a table with Hive that has a TS_LTZ column:
> {code:java}
> create table ice_hive_tbl (i int, ts_ltz timestamp with local time zone) 
> stored by iceberg;
> {code}
> 2. Insert some data using Hive:
> {code:java}
> insert into ice_hive_tbl values (1, current_timestamp());
> {code}
> 3. Add a breakpoint in Impala to the table loading code right before Impala 
> sends out an alter_table to HMS to change the column type from TS_LTZ to TS. 
> [Here|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463],
>  for instance.
> 4. Query the table from Impala. This triggers a table load. Impala will 
> decide that it should change the column's TS_LTZ type to TS. However, the 
> breakpoint will hold it right before doing so.
> 5. Use Hive to add additional rows into the table:
> {code:java}
> insert into ice_hive_tbl values (2, current_timestamp());
> insert into ice_hive_tbl values (3, current_timestamp());
> {code}
> 6. Release the breakpoint and let Impala finish the SELECT query started at 4)
> 7. Do another SELECT * from Hive and/or Impala and verify that the extra rows 
> added at 5) are not present in the table.
> *+Root cause:+*
> When Impala changes the TS_LTZ column to TS it does so by calling 
> alter_table() on HMS directly. It gives a Metastore Table object to HMS as 
> the desired state of the table. HMS then persists this table object.
> The problem with this:
>  - Impala doesn't use the Iceberg API to alter the table. As a result no 
> conflict detection is performed, and it won't be detected that other commits 
> went into the table since Impala grabbed the table object from HMS.
>  - The metadata.json path is part of the table properties, and when Impala 
> calls alter_table(tbl_obj) HMS will also persist this metadata path to the 
> table, even though there were other changes that already moved the metadata 
> path forward.
>  - Essentially this will revert the changes on the table back to the state 
> when Impala loaded the table object from HMS.
>  - In a high-frequency scenario this could cause problems when Hive (or even 
> Spark) heavily writes the table while Impala reads it. Some snapshots could 
> be unintentionally reverted by this behavior, which could cause data loss, or 
> changes such as deletes being reverted.
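> The lost-update mechanism above can be sketched with a minimal, self-contained 
> model (plain Java maps standing in for HMS; all names here are illustrative, 
> not Impala's actual code):
> {code:java}
import java.util.HashMap;
import java.util.Map;

public class StaleAlterTableDemo {
    // The table property that points at the current Iceberg metadata.json.
    static final String PTR = "metadata_location";

    public static String run() {
        Map<String, String> hms = new HashMap<>();
        hms.put(PTR, "v1.metadata.json");

        // 1) Impala loads the table: a full copy of the HMS table object.
        Map<String, String> impalaCopy = new HashMap<>(hms);

        // 2) Meanwhile Hive commits twice, moving the metadata pointer forward.
        hms.put(PTR, "v2.metadata.json");
        hms.put(PTR, "v3.metadata.json");

        // 3) Impala adjusts a column type in its stale copy and writes the
        //    whole object back via alter_table(); HMS persists it verbatim,
        //    silently reverting the pointer and losing snapshots v2/v3.
        impalaCopy.put("ts_ltz.type", "timestamp"); // the intended change
        hms.clear();
        hms.putAll(impalaCopy);
        return hms.get(PTR);
    }

    public static void main(String[] args) {
        System.out.println("pointer after Impala's alter_table: " + run());
    }
}
> {code}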
> {+}Just a note{+}, FWIW, with the current approach Impala doesn't change the 
> column types in the Iceberg metadata, only in HMS. So even with this alter, 
> the Iceberg metadata would still show the column type as timestamptz.
> {+}Note2{+}, I described this problem using timestamp with local time zone as 
> an example, but it could also be triggered by other column types not entirely 
> compatible with Impala. I haven't done the research to find out whether there 
> are other such types, though.
> {+}Note3{+}, this issue seems to have been there forever. I found that the 
> code that triggers this was added by one of the first changes wrt the Iceberg 
> integration, the "[Create Iceberg 
> table|https://github.com/apache/impala/commit/8fcad905a12d018eb0a354f7e4793e5b0d5efd3b]"
>  change.
> *+Possible solutions:+*
> 1. Impala can do the alter table by calling the Iceberg API and not HMS 
> directly.
> There are things to be careful about:
>  - With this approach, would the above repro steps make the table loading 
> fail due to a conflict between commits on the table? Or could the schema 
> change be merged automatically by the Iceberg lib into the latest state even 
> if there had been changes on the table? I think this would work as expected 
> and wouldn't reject loading the table, but we should make sure when testing 
> this.
>  - With this approach Impala would set the TS_LTZ cols to TS properly, 
> causing no snapshots to be lost. However, when a new write is performed by 
> Hive/Spark, they'd set the col types back to TS_LTZ. Then when Impala reads 
> the table again, it would set these cols to TS again. And so on. The question 
> is, would a scenario like this flood the Iceberg metadata, e.g. 
> metadata.json, with all these uncontrolled schema changes?
>  - We are talking about schema changes here, but what the code does now is 
> much wider than that: it sends a whole table object to HMS to persist. We 
> have to double check whether the current approach only persists schema 
> changes or could make other changes too. E.g. the code also sets the 
> DO_NOT_UPDATE_STATS property and the last DDL time. Could it change anything 
> else as well that we might miss with this approach?
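> For reference, the conflict detection the Iceberg commit path would give us 
> boils down to an optimistic check-and-swap on the metadata pointer. A 
> hypothetical, self-contained sketch of that behavior (the real logic lives in 
> the Iceberg library, e.g. its Hive table operations; names here are made up):
> {code:java}
import java.util.HashMap;
import java.util.Map;

public class OptimisticCommitDemo {
    static final String PTR = "metadata_location";
    static final Map<String, String> hms = new HashMap<>();

    // Commit succeeds only if nobody moved the pointer since we read it;
    // otherwise the caller must refresh and retry on top of the new state.
    static boolean commit(String expectedBase, String newLocation) {
        if (!hms.get(PTR).equals(expectedBase)) return false; // conflict detected
        hms.put(PTR, newLocation);
        return true;
    }

    public static String run() {
        hms.put(PTR, "v1.json");
        String impalaBase = hms.get(PTR);  // Impala reads v1
        hms.put(PTR, "v2.json");           // Hive commits v2 concurrently

        // The stale commit is rejected instead of silently reverting v2.
        boolean firstTry = commit(impalaBase, "v1-schema-fix.json");

        // After a refresh the schema change lands on top of v2, losing nothing.
        String refreshed = hms.get(PTR);
        boolean retry = commit(refreshed, "v3-schema-fix.json");
        return firstTry + "," + retry + "," + hms.get(PTR);
    }

    public static void main(String[] args) {
        System.out.println(run()); // false,true,v3-schema-fix.json
    }
}
> {code}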
> 2. Do not do any alter_tables after loading the Iceberg table
> This approach would simply drop the code [after this 
> line|https://github.com/apache/impala/blob/c83e5d97693fd3035b33622512d1584a5e56ce8b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L463]
>  and wouldn't make any HMS schema changes. Impala could then internally use 
> the adjusted column types without changing the types of the columns in HMS. 
> The question here is whether this would break any use cases.
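> A minimal sketch of this option, assuming a hypothetical load-time mapping 
> that adjusts the type only in Impala's in-memory view (neither HMS nor the 
> Iceberg metadata is ever written back):
> {code:java}
public class InMemoryTypeAdjust {
    // Map types Impala can't handle to their in-memory substitute; everything
    // else passes through unchanged. Purely illustrative, not Impala's code.
    public static String toImpalaType(String hmsType) {
        if (hmsType.equalsIgnoreCase("timestamp with local time zone")) {
            return "timestamp";
        }
        return hmsType;
    }

    public static void main(String[] args) {
        System.out.println(toImpalaType("timestamp with local time zone"));
    }
}
> {code}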



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
