This is an automated email from the ASF dual-hosted git repository.

asherman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 06eb62d3e IMPALA-12197: Prevent assertion failures when isClusteringColumn() is called on a IcebergTimeTravelTable.
06eb62d3e is described below

commit 06eb62d3efa1c94810c4276f90896fa62205a49b
Author: Andrew Sherman <[email protected]>
AuthorDate: Thu Jun 8 14:27:00 2023 -0700

    IMPALA-12197: Prevent assertion failures when isClusteringColumn() is called on a IcebergTimeTravelTable.
    
    When using local catalog mode, if a runtime filter is being generated
    for a time travel Iceberg table, then a query may fail with "ERROR:
    IllegalArgumentException: null".
    
    In the planner an Iceberg table that is being accessed with Time Travel
    is represented by an IcebergTimeTravelTable object. This object
    represents a time-based variation on a base table. The
    IcebergTimeTravelTable may represent a different schema from the base
    table; it does this by tracking its own set of Columns. As part of
    generating a runtime filter the isClusteringColumn() method is called
    on the table. IcebergTimeTravelTable was delegating this call to the
    base object. In local catalog mode this method is implemented by
    LocalTable which has a Preconditions check (an assertion) that the
    column parameter matches the stored column. In this case the check
    fails because the base table and the time travel table each have
    their own distinct set of Column objects.
    
    The fix is to have IcebergTimeTravelTable provide its own
    isClusteringColumn() method. For Iceberg there are no clustering
    columns, so this method simply returns false.
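    
    As a rough illustration (not Impala code), the failure and the fix
    can be sketched in plain Java. Column, BaseTable,
    DelegatingTimeTravelTable, FixedTimeTravelTable and TimeTravelSketch
    are hypothetical stand-ins invented for this sketch, and a bare
    IllegalArgumentException stands in for Guava's
    Preconditions.checkArgument(), which throws that exception without a
    message (presumably the source of the "null" in the error text).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a catalog Column: a name plus a position.
final class Column {
  private final String name;
  private final int position;

  Column(String name, int position) {
    this.name = name;
    this.position = position;
  }

  String getName() { return name; }
  int getPosition() { return position; }
}

// Hypothetical stand-in for the base table (LocalTable in the text): it owns
// its Column objects and checks by identity that the argument is one of them.
class BaseTable {
  private final List<Column> colsByPos = new ArrayList<>();

  BaseTable(String... names) {
    for (int i = 0; i < names.length; i++) colsByPos.add(new Column(names[i], i));
  }

  boolean isClusteringColumn(Column c) {
    // Mimics a Preconditions.checkArgument() identity check.
    if (colsByPos.get(c.getPosition()) != c) throw new IllegalArgumentException();
    return false;
  }
}

// Before the fix: tracks its own Column objects but delegates to the base.
class DelegatingTimeTravelTable {
  protected final BaseTable base;
  protected final List<Column> colsByPos = new ArrayList<>();

  DelegatingTimeTravelTable(BaseTable base, String... names) {
    this.base = base;
    for (int i = 0; i < names.length; i++) colsByPos.add(new Column(names[i], i));
  }

  List<Column> getColumns() { return colsByPos; }

  boolean isClusteringColumn(Column c) { return base.isClusteringColumn(c); }
}

// After the fix: checks against its own columns and always returns false.
class FixedTimeTravelTable extends DelegatingTimeTravelTable {
  FixedTimeTravelTable(BaseTable base, String... names) { super(base, names); }

  @Override
  boolean isClusteringColumn(Column c) {
    if (colsByPos.get(c.getPosition()) != c) throw new IllegalArgumentException();
    return false;
  }
}

class TimeTravelSketch {
  // True if delegating to the base table trips the identity check.
  static boolean delegatedCallFails() {
    DelegatingTimeTravelTable t = new DelegatingTimeTravelTable(
        new BaseTable("uniquecarrier", "year"), "uniquecarrier", "year");
    try {
      t.isClusteringColumn(t.getColumns().get(0));
      return false;
    } catch (IllegalArgumentException e) {
      return e.getMessage() == null;
    }
  }

  // True if the overriding method accepts its own column and returns false.
  static boolean fixedCallReturnsFalse() {
    FixedTimeTravelTable t = new FixedTimeTravelTable(
        new BaseTable("uniquecarrier", "year"), "uniquecarrier", "year");
    return !t.isClusteringColumn(t.getColumns().get(0));
  }

  public static void main(String[] args) {
    System.out.println("delegated call fails: " + delegatedCallFails());
    System.out.println("fixed call returns false: " + fixedCallReturnsFalse());
  }
}
```

    The sketch compares columns with == rather than equals(), matching
    the identity check in the patch: two Column objects with the same
    name and position still count as different columns.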
    
    TESTING
    - Ran all end-to-end tests.
    - Added test case for query that failed.
    
    Change-Id: I51d04c8757fb48bd417248492d4615ac58085632
    Reviewed-on: http://gerrit.cloudera.org:8080/20034
    Reviewed-by: Impala Public Jenkins <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 .../org/apache/impala/catalog/IcebergTimeTravelTable.java |  9 +++++++--
 .../queries/QueryTest/iceberg-time-travel.test            | 15 +++++++++++++++
 tests/query_test/test_iceberg.py                          |  3 +++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java b/fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java
index 381c929b6..521ac75fa 100644
--- a/fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/IcebergTimeTravelTable.java
@@ -75,8 +75,7 @@ public class IcebergTimeTravelTable
   // The Time Travel parameters that control the schema for the table.
   private final TimeTravelSpec timeTravelSpec_;
 
-  // colsByPos[i] refers to the ith column in the table. The first numClusteringCols are
-  // the clustering columns.
+  // colsByPos[i] refers to the ith column in the table.
   protected final ArrayList<Column> colsByPos_ = new ArrayList<>();
 
   // map from lowercase column name to Column object.
@@ -156,6 +155,12 @@ public class IcebergTimeTravelTable
     return colsByPos_;
   }
 
+  @Override
+  public boolean isClusteringColumn(Column c) {
+    Preconditions.checkArgument(colsByPos_.get(c.getPosition()) == c);
+    return false;
+  }
+
   @Override
   public TTableDescriptor toThriftDescriptor(
       int tableId, Set<Long> referencedPartitions) {
diff --git a/testdata/workloads/functional-query/queries/QueryTest/iceberg-time-travel.test b/testdata/workloads/functional-query/queries/QueryTest/iceberg-time-travel.test
new file mode 100644
index 000000000..6518abf08
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/iceberg-time-travel.test
@@ -0,0 +1,15 @@
+====
+---- QUERY
+# Time travel query that tickles bug IMPALA-12197.
+create table iceberg_flights (uniquecarrier string) partitioned by (year int) stored as iceberg;
+create table iceberg_airlines (code string) stored as iceberg;
+insert into iceberg_flights(uniquecarrier, year) values('ba', 1966);
+insert into iceberg_airlines(code) values('ba');
+WITH dist_flights AS
+( SELECT DISTINCT f1.uniquecarrier AS carrier FROM iceberg_flights FOR SYSTEM_TIME AS OF '2040-12-31 00:00:00.000' f1)
+SELECT * FROM dist_flights JOIN iceberg_airlines a ON dist_flights.carrier = a.code;
+---- RESULTS
+'ba','ba'
+---- TYPES
+STRING,STRING
+====
diff --git a/tests/query_test/test_iceberg.py b/tests/query_test/test_iceberg.py
index e8c2ed522..b5b501965 100644
--- a/tests/query_test/test_iceberg.py
+++ b/tests/query_test/test_iceberg.py
@@ -641,6 +641,9 @@ class TestIcebergTable(IcebergTestSuite):
       except Exception as e:
         assert "Cannot find a snapshot older than" in str(e)
 
+  def test_time_travel_queries(self, vector, unique_database):
+    self.run_test_case('QueryTest/iceberg-time-travel', vector, use_db=unique_database)
+
   @SkipIf.not_dfs
   def test_strings_utf8(self, vector, unique_database):
     # Create table
