This is an automated email from the ASF dual-hosted git repository.

michaelsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit c53681a40d115aafa3d720a2bcef6b2dac492cad
Author: Gabor Kaszab <[email protected]>
AuthorDate: Mon Oct 28 14:42:35 2024 +0100

    IMPALA-13484: Don't call alter_table() on HMS when loading Iceberg table
    
    When Impala loads an Iceberg table it also loads the metastore
    representation from HMS. Additionally, when Impala loads the Iceberg
    metadata of the table it construct another metastore representation of
    that metadata. If these two metastore representations don't match,
    Impala calls alter_table() on HMS to persist the differences.
    
    In practice this behaviour is triggered for instance when one engine
    creates a table with column types that Impala can't read and instead it
    does an adjustment on the column type. E.g. if an engine creates a
    'timestamp with local time zone' column, Impala will change the column
    type in the HMS representation to 'timestamp' type.
    
    There are some issues with this approach:
    1: Impala calls HMS.alter_table() directly and doesn't change the table
       through the Iceberg API. As a result no conflict checks are
       performed. Since the metadata location is a simple table property
       this alter_table() call can overwrite the metadata pointer in case
       there had been other changes to the table since Impala started to
       load it. This can result in data correctness issues.
    2: Even in use cases where Impala only reads a table, it can still
       perform table modifications that is a very weird behaviour.
    3: With this approach Impala changes the table schema in HMS but doesn't
       change the Iceberg schema in the Iceberg metadata.
    
    As a solution we can simply get rid of the logic that makes the
    alter_table() call to HMS at the end of loading an Iceberg table. This
    can avoid a lot of confusions and in fact persisiting the schema
    adjustments Impala had done during table loading is not necessary.
    
    Testing:
    Ran the whole exhaustive test suite.
    
    Change-Id: Icd5d1eee421f22d3853833dc571b14d4e9005ab3
    Reviewed-on: http://gerrit.cloudera.org:8080/21993
    Reviewed-by: Impala Public Jenkins <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 .../java/org/apache/impala/catalog/IcebergTable.java     | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java 
b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
index 1bdc132e2..768d3870b 100644
--- a/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
@@ -456,22 +456,6 @@ public class IcebergTable extends Table implements 
FeIcebergTable {
       }
 
       refreshLastUsedTime();
-
-      // Let's reset the storage descriptors, so we don't update the table 
unnecessarily.
-      FeIcebergTable.resetIcebergStorageDescriptor(msTable_, msTbl);
-      // Avoid updating HMS if the schema didn't change.
-      if (msTable_.equals(msTbl)) return;
-
-      // Update the table schema in HMS.
-      try {
-        updateTimestampProperty(msTable_, TBL_PROP_LAST_DDL_TIME);
-        msTable_.putToParameters(StatsSetupConst.DO_NOT_UPDATE_STATS,
-            StatsSetupConst.TRUE);
-        msClient.alter_table(msTable_.getDbName(), msTable_.getTableName(), 
msTable_);
-        catalogTimeline.markEvent("Updated table schema in Metastore");
-      } catch (TException e) {
-        throw new TableLoadingException(e.getMessage());
-      }
     } finally {
       context.stop();
     }

Reply via email to