This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 892b33f35d1cab01d91d9b5eaac079ec8ba236bb
Author: Daniel Becker <[email protected]>
AuthorDate: Tue Jun 17 15:11:09 2025 +0200

    IMPALA-14154: IllegalStateException with Iceberg table with DELETE
    
    Planning a query on an Iceberg table fails with an IllegalStateException
    in the following case:
      - the table has delete files for each data file (i.e. no data file
        without deletes)
      AND
      - there is an anti-join on top of the Iceberg delete operation
        (IcebergDeleteNode or HashJoinNode).
    
    The exception is thrown by a Preconditions check in
    SingleNodePlanner.createJoinNode() because there is no null-matching EQ
    operator. This happens because the conjunct that should be there is
    discarded, ultimately in Analyzer.canEvalAntiJoinedConjunct(). The
    reason is that the tuple ids passed to that function include the delete
    part of the Iceberg scan, but the conjunct (correctly) only refers to
    the tuple of the data files.
    
    Note that this does not happen if we have two regular anti-joins on top
    of each other because in that case the getTblRefIds() method of the
    bottom anti-join returns a single TableRefId, the one corresponding to
    the (inline) view containing it, not the two TableRefIds corresponding
    to its two child nodes. The conjunct references this TupleId, so
    Analyzer.canEvalAntiJoinedConjunct() returns true. This is not the case
    with Iceberg delete operations because there is no (inline) view
    involved.
    
    This commit solves the issue by setting the TableRefIds of the node
    corresponding to the Iceberg delete operation (IcebergDeleteNode or
    HashJoinNode) to only the table ref that corresponds to the data files,
    not the delete files.
    
    Testing:
      - added a test in iceberg-v2-read-position-deletes.test that
        reproduces the issue.
    
    Change-Id: If2c03fe3da44dc0516ebdf32430416a1059d37b2
    Reviewed-on: http://gerrit.cloudera.org:8080/23051
    Reviewed-by: Impala Public Jenkins <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 .../apache/impala/planner/IcebergScanPlanner.java   |  8 ++++++++
 .../QueryTest/iceberg-v2-read-position-deletes.test | 21 +++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java b/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
index ee1e8de5e..b7ef9b6d6 100644
--- a/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
+++ b/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
@@ -340,6 +340,14 @@ public class IcebergScanPlanner {
     joinNode.setId(ctx_.getNextNodeId());
     joinNode.init(analyzer_);
     joinNode.setIsDeleteRowsJoin();
+
+    // The output of this node only contains the tuple corresponding to 'dataScanNode',
+    // not that of 'deleteScanNode'. Conjuncts above this node, e.g. in another join, will
+    // only reference that tuple, so we should only include the table ref of
+    // 'dataScanNode' here.
+    // See IMPALA-14154.
+    joinNode.setTblRefIds(dataScanNode.getTblRefIds());
+
     return joinNode;
   }
 
diff --git a/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test b/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
index a0f888d3d..067eddbaf 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
@@ -575,3 +575,24 @@ bigint
 aggregation(SUM, NumRowGroups): 0
 aggregation(SUM, NumFileMetadataRead): 1
 ====
+---- QUERY
+# Regression test for IMPALA-14154.
+# To reproduce the bug, we need a table where there is no data file without a delete file.
+# The metadata table query ensures that.
+select content from functional_parquet.iceberg_v2_delete_positional.`files`;
+---- RESULTS
+0
+1
+---- TYPES
+INT
+====
+---- QUERY
+# Regression test for IMPALA-14154.
+select `data`
+from functional_parquet.iceberg_v2_delete_positional
+where `data` not in (select min(`data`) from functional_parquet.iceberg_v2_delete_positional);
+---- RESULTS
+'c'
+---- TYPES
+STRING
+====
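
For readers who want to see the mismatch described in the commit message in isolation, here is a toy model in Java. It is a sketch under stated assumptions, not Impala code: the class TblRefIdSketch and the method canEval() are hypothetical, and the equality test stands in for the real, more involved logic in Analyzer.canEvalAntiJoinedConjunct(). It only shows that a conjunct referring to the data-file tuple alone stops matching once the node also reports the delete-file tuple, and matches again when the node reports only the data-file table ref.

```java
import java.util.Set;

/** Toy model only: the names here are hypothetical and are not Impala classes. */
public class TblRefIdSketch {
  /**
   * Simplified stand-in for the analyzer check (an assumption, not the real
   * logic): the conjunct is kept only if the ids reported by the node are
   * exactly the ids the conjunct refers to.
   */
  public static boolean canEval(Set<Integer> conjunctIds, Set<Integer> nodeTblRefIds) {
    return nodeTblRefIds.equals(conjunctIds);
  }

  public static void main(String[] args) {
    int dataTuple = 0;    // tuple of the data files
    int deleteTuple = 1;  // tuple of the delete files

    // The conjunct only refers to the tuple of the data files.
    Set<Integer> conjunctIds = Set.of(dataTuple);

    // Before the fix: the delete join reports both of its children.
    System.out.println(canEval(conjunctIds, Set.of(dataTuple, deleteTuple)));  // false

    // After the fix: the delete join reports only the data scan's table ref.
    System.out.println(canEval(conjunctIds, Set.of(dataTuple)));  // true
  }
}
```

In this model the first call corresponds to the conjunct being discarded, which in the real planner leads to the failed Preconditions check in SingleNodePlanner.createJoinNode().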
