This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 892b33f35d1cab01d91d9b5eaac079ec8ba236bb
Author: Daniel Becker <[email protected]>
AuthorDate: Tue Jun 17 15:11:09 2025 +0200

    IMPALA-14154: IllegalStateException with Iceberg table with DELETE

    Planning a query on an Iceberg table runs into an
    IllegalStateException in the following case:
     - the table has delete files for each data file (i.e. no data file
       without deletes) AND
     - there is an anti-join on top of the Iceberg delete operation
       (IcebergDeleteNode or HashJoinNode).

    The exception is thrown by a Preconditions check in
    SingleNodePlanner.createJoinNode() because there is no null-matching
    EQ operator. This happens because the conjunct that should be there
    is discarded, ultimately in Analyzer.canEvalAntiJoinedConjunct(). The
    reason is that the tuple ids passed to that function include the
    delete part of the Iceberg scan, but the conjunct (correctly) only
    refers to the tuple of the data files.

    Note that this does not happen if we have two regular anti-joins on
    top of each other because in that case the getTblRefIds() method of
    the bottom anti-join returns a single TableRefId, the one
    corresponding to the (inline) view containing it, not the two
    TableRefIds corresponding to its two child nodes. The conjunct
    references this TupleId, so Analyzer.canEvalAntiJoinedConjunct()
    returns true. This is not the case with Iceberg delete operations
    because there is no (inline) view involved.

    This commit solves the issue by setting the TableRefIds of the node
    corresponding to the Iceberg delete operation (IcebergDeleteNode or
    HashJoinNode) to only the table ref that corresponds to the data
    files, not the delete files.

    Testing:
     - added a test in iceberg-v2-read-position-deletes.test that
       reproduces the issue.

    Change-Id: If2c03fe3da44dc0516ebdf32430416a1059d37b2
    Reviewed-on: http://gerrit.cloudera.org:8080/23051
    Reviewed-by: Impala Public Jenkins <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
---
 .../apache/impala/planner/IcebergScanPlanner.java   |  8 ++++++++
 .../QueryTest/iceberg-v2-read-position-deletes.test | 21 +++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java b/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
index ee1e8de5e..b7ef9b6d6 100644
--- a/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
+++ b/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
@@ -340,6 +340,14 @@ public class IcebergScanPlanner {
     joinNode.setId(ctx_.getNextNodeId());
     joinNode.init(analyzer_);
     joinNode.setIsDeleteRowsJoin();
+
+    // The output of this node only contains the tuple corresponding to 'dataScanNode',
+    // not that of 'deleteScanNode'. Conjuncts above this node, e.g. in another join, will
+    // only reference that tuple, so we should only include the table ref of
+    // 'dataScanNode' here.
+    // See IMPALA-14154.
+    joinNode.setTblRefIds(dataScanNode.getTblRefIds());
+
     return joinNode;
   }
 
diff --git a/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test b/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
index a0f888d3d..067eddbaf 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
@@ -575,3 +575,24 @@ bigint
 aggregation(SUM, NumRowGroups): 0
 aggregation(SUM, NumFileMetadataRead): 1
 ====
+---- QUERY
+# Regression test for IMPALA-14154.
+# To reproduce the bug, we need a table where there is no data file without a delete file.
+# The metadata table query ensures that.
+select content from functional_parquet.iceberg_v2_delete_positional.`files`;
+---- RESULTS
+0
+1
+---- TYPES
+INT
+====
+---- QUERY
+# Regression test for IMPALA-14154.
+select `data`
+from functional_parquet.iceberg_v2_delete_positional
+where `data` not in (select min(`data`) from functional_parquet.iceberg_v2_delete_positional);
+---- RESULTS
+'c'
+---- TYPES
+STRING
+====
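
Illustrative note on the commit message: the mismatch between the ids a node reports and the ids a conjunct references can be shown with a toy model. The sketch below is not Impala code. The class `TblRefIdSketch`, the method `canEval`, the integer ids, and the matching rule itself (the conjunct is kept only if it references every id the child node reports) are assumptions made up for illustration; the real check in Analyzer.canEvalAntiJoinedConjunct() is more involved.

```java
import java.util.Set;

// Toy model only. The names and the matching rule are hypothetical and do not
// reproduce Impala's planner or analyzer classes.
public class TblRefIdSketch {

  // Assumed rule for illustration: a conjunct placed on a join above 'node' is
  // kept only if it references every table-ref id that 'node' reports.
  public static boolean canEval(Set<Integer> conjunctIds, Set<Integer> nodeTblRefIds) {
    return conjunctIds.containsAll(nodeTblRefIds);
  }

  public static void main(String[] args) {
    int dataId = 0;    // hypothetical id of the data-file scan
    int deleteId = 1;  // hypothetical id of the delete-file scan

    // The conjunct of the outer anti-join refers only to the data tuple.
    Set<Integer> conjunctIds = Set.of(dataId);

    // Before the fix: the delete node reports the ids of both children,
    // so the conjunct is discarded.
    System.out.println(canEval(conjunctIds, Set.of(dataId, deleteId)));  // prints false

    // After the fix: the node reports only the data scan's id,
    // so the conjunct is kept.
    System.out.println(canEval(conjunctIds, Set.of(dataId)));  // prints true
  }
}
```

In this model, narrowing the reported ids to those of the data scan is what lets the conjunct survive, which mirrors the intent of the `setTblRefIds(dataScanNode.getTblRefIds())` call in the patch.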
