This is an automated email from the ASF dual-hosted git repository. michaelsmith pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git
commit c53681a40d115aafa3d720a2bcef6b2dac492cad Author: Gabor Kaszab <[email protected]> AuthorDate: Mon Oct 28 14:42:35 2024 +0100 IMPALA-13484: Don't call alter_table() on HMS when loading Iceberg table When Impala loads an Iceberg table it also loads the metastore representation from HMS. Additionally, when Impala loads the Iceberg metadata of the table it construct another metastore representation of that metadata. If these two metastore representations don't match, Impala calls alter_table() on HMS to persist the differences. In practice this behaviour is triggered for instance when one engine creates a table with column types that Impala can't read and instead it does an adjustment on the column type. E.g. if an engine creates a 'timestamp with local time zone' column, Impala will change the column type in the HMS representation to 'timestamp' type. There are some issues with this approach: 1: Impala calls HMS.alter_table() directly and doesn't change the table through the Iceberg API. As a result no conflict checks are performed. Since the metadata location is a simple table property this alter_table() call can overwrite the metadata pointer in case there had been other changes to the table since Impala started to load it. This can result in data correctness issues. 2: Even in use cases where Impala only reads a table, it can still perform table modifications that is a very weird behaviour. 3: With this approach Impala changes the table schema in HMS but doesn't change the Iceberg schema in the Iceberg metadata. As a solution we can simply get rid of the logic that makes the alter_table() call to HMS at the end of loading an Iceberg table. This can avoid a lot of confusions and in fact persisiting the schema adjustments Impala had done during table loading is not necessary. Testing: Ran the whole exhaustive test suite. Change-Id: Icd5d1eee421f22d3853833dc571b14d4e9005ab3 Reviewed-on: http://gerrit.cloudera.org:8080/21993 Reviewed-by: Impala Public Jenkins <[email protected]> Tested-by: Impala Public Jenkins <[email protected]> --- .../java/org/apache/impala/catalog/IcebergTable.java | 16 ---------------- 1 file changed, 16 deletions(-) diff --git a/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java index 1bdc132e2..768d3870b 100644 --- a/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java +++ b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java @@ -456,22 +456,6 @@ public class IcebergTable extends Table implements FeIcebergTable { } refreshLastUsedTime(); - - // Let's reset the storage descriptors, so we don't update the table unnecessarily. - FeIcebergTable.resetIcebergStorageDescriptor(msTable_, msTbl); - // Avoid updating HMS if the schema didn't change. - if (msTable_.equals(msTbl)) return; - - // Update the table schema in HMS. - try { - updateTimestampProperty(msTable_, TBL_PROP_LAST_DDL_TIME); - msTable_.putToParameters(StatsSetupConst.DO_NOT_UPDATE_STATS, - StatsSetupConst.TRUE); - msClient.alter_table(msTable_.getDbName(), msTable_.getTableName(), msTable_); - catalogTimeline.markEvent("Updated table schema in Metastore"); - } catch (TException e) { - throw new TableLoadingException(e.getMessage()); - } } finally { context.stop(); }
