This is an automated email from the ASF dual-hosted git repository.

englefly pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/master by this push:
     new a0907700fcb [fix](NestedColumnPruning): visit key/value/entry args in 
visitMapContainsKey/Value/Entry (#64440)
a0907700fcb is described below

commit a0907700fcbbdcd3210036d7bbf9114b99c39a84
Author: minghong <[email protected]>
AuthorDate: Fri Jun 19 11:49:27 2026 +0800

    [fix](NestedColumnPruning): visit key/value/entry args in 
visitMapContainsKey/Value/Entry (#64440)
    
    ### What problem does this PR solve?
    
    **Symptom**: with nested column pruning on
    (`enable_prune_nested_columns=true`, the default),
    `map_contains_key/value/entry` can return **wrong results** when its
    search argument references a nested sub-column that is *also* used in an
    `IS NULL` check in the same query.
    
    Repro:
    ```sql
    CREATE TABLE t (id INT, s STRUCT<a:STRING,b:INT>, m MAP<STRING,INT>)
    DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1
    PROPERTIES ("replication_allocation" = "tag.location.default: 1");
    INSERT INTO t VALUES (1, named_struct('a','hello','b',100), {'hello':1});
    
    SELECT element_at(s,'a') IS NULL,
           map_contains_key(m, element_at(s,'a'))
    FROM t;
    -- expected:       false, true
    -- before this PR: false, false      <-- map_contains_key is wrong
    ```
    
    **Root cause**:
    `AccessPathExpressionCollector.visitMapContainsKey/Value/Entry` only
    recursed into the map argument (`getArgument(0)`) and never visited the
    key/value search argument. So the *data* access path for the search
    sub-column (`s.a`) was never registered. The only path registered for
    `s.a` came from `element_at(s,'a') IS NULL`, which is the metadata-only
    path `[s, a, NULL]`. With that as the sole path, `NestedColumnPruning`
    pruned `s.a` to null-flag-only — at runtime the real value of `s.a` is
    not read from disk, so `element_at(s,'a')` no longer yields `'hello'`
    and `map_contains_key` matches against a wrong key, returning the wrong
    result.
    
    ### How
    
    1. In `visitMapContainsKey/Value/Entry`, after visiting the map
    argument, also visit the key/value/entry search argument(s) with a
    **fresh `CollectorContext`**, so their data access paths (e.g. `[s, a]`)
    get registered and override the null-only path.
    
    2. When the whole `map_contains_*(...)` is itself wrapped in `IS NULL`
    (the incoming path is exactly `[NULL]`), route the NULL suffix to the
    **map** argument (producing `[m, NULL]`) and skip the search args. This
    is safe because all three functions declare `nullable() ==
    child(0).nullable()` — the result is NULL iff the map is NULL (on BE the
    result null flag is set solely from the map's null map, and a NULL
    search key is treated as "look for a NULL element", never as
    SQL-unknown). It mirrors the existing `visitMapKeys`/`visitMapValues`
    handling and avoids emitting a malformed `[m, KEYS, NULL]` path.
    
    Issue Number: close #xxx
    
    ### Release note
    
    Fix wrong results of
    `map_contains_key`/`map_contains_value`/`map_contains_entry` when the
    search argument is a nested sub-column that is also used in an `IS NULL`
    check, with nested column pruning enabled.
    
    ### Check List (For Author)
    
    - Test
    - [x] Regression test:
    
`regression-test/suites/nereids_rules_p0/column_pruning/map_contains_arg_pruning.groovy`
        - [ ] Unit Test
        - [ ] Manual test (add detailed scripts or steps below)
        - [ ] No need to test or manual test. Explain why:
    - [ ] This is a refactor/code format and no logic has been changed.
            - [ ] Previous test can cover this change.
            - [ ] No code files have been changed.
            - [ ] Other reason
    
    - Behavior changed:
        - [ ] No.
    - [x] Yes. Fixes wrong results of `map_contains_key/value/entry`; this
    is a data-correctness fix and should be back-picked to active release
    branches.
    
    - Does this need documentation?
        - [x] No.
        - [ ] Yes.
    
    ### Check List (For Reviewer who merge this PR)
    
    - [ ] Confirm the release note
    - [ ] Confirm test cases
    - [ ] Confirm document
    - [ ] Add branch pick label
    
    ---------
    
    Co-authored-by: Claude Opus 4.8 <[email protected]>
---
 .../rewrite/AccessPathExpressionCollector.java     |  70 ++++++-
 .../column_pruning/map_contains_arg_pruning.out    |  55 ++++++
 .../column_pruning/map_contains_arg_pruning.groovy | 215 +++++++++++++++++++++
 3 files changed, 333 insertions(+), 7 deletions(-)

diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathExpressionCollector.java
 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathExpressionCollector.java
index e7dbc5ca05d..75dce5e3618 100644
--- 
a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathExpressionCollector.java
+++ 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathExpressionCollector.java
@@ -341,7 +341,7 @@ public class AccessPathExpressionCollector extends 
DefaultExpressionVisitor<Void
     @Override
     public Void visitMapKeys(MapKeys mapKeys, CollectorContext context) {
         LinkedList<String> suffixPath = context.accessPathBuilder.accessPath;
-        if (isFunctionNullCheckPath(suffixPath)) {
+        if (isUnderIsNull(suffixPath)) {
             // map_keys(nullable_map) returns a NULL array only when the 
parent map is NULL.
             // The NULL suffix therefore belongs to the map itself, not to the 
KEYS child.
             return continueCollectAccessPath(mapKeys.getArgument(0), context);
@@ -360,7 +360,7 @@ public class AccessPathExpressionCollector extends 
DefaultExpressionVisitor<Void
     @Override
     public Void visitMapValues(MapValues mapValues, CollectorContext context) {
         LinkedList<String> suffixPath = context.accessPathBuilder.accessPath;
-        if (isFunctionNullCheckPath(suffixPath)) {
+        if (isUnderIsNull(suffixPath)) {
             // map_values(nullable_map) returns a NULL array only when the 
parent map is NULL.
             // A map entry whose value is NULL still produces a non-NULL 
values array.
             return continueCollectAccessPath(mapValues.getArgument(0), 
context);
@@ -376,26 +376,82 @@ public class AccessPathExpressionCollector extends 
DefaultExpressionVisitor<Void
         return continueCollectAccessPath(mapValues.getArgument(0), context);
     }
 
-    private static boolean isFunctionNullCheckPath(List<String> suffixPath) {
+    private static boolean isUnderIsNull(List<String> suffixPath) {
         return suffixPath.size() == 1 && 
AccessPathInfo.ACCESS_NULL.equals(suffixPath.get(0));
     }
 
     @Override
     public Void visitMapContainsKey(MapContainsKey mapContainsKey, 
CollectorContext context) {
+        // MAP_CONTAINS_KEY(<map>, <key>)
+        //
+        // isUnderIsNull checks whether the parent of this expression is IS 
NULL,
+        // splitting queries into two shapes:
+        //
+        // Shape A (parent is IS NULL):
+        //   SQL: SELECT ... WHERE map_contains_key(m, k) IS NULL
+        //   map_contains_key(m, k) returns NULL only when m itself is NULL — 
so the path
+        //   should be m.NULL, not m.KEYS.NULL
+        //
+        // Shape B (regular predicate):
+        //   SQL: SELECT ... WHERE map_contains_key(m, element_at(s, 'city'))
+        //                     AND element_at(s, 'city') IS NULL
+        //   We add the KEYS prefix for the map column and visit key arg: 
`element_at(s, 'city')` with a
+        //   fresh context.
+        //   s collects two paths:
+        //     [s, city]        ← from key arg (fresh context → DATA path)
+        //     [s, city, NULL]  ← from IS NULL
+        //   NestedColumnPruning sees [s, city] and strips [s, city, NULL] in 
prune phase.
+        if (isUnderIsNull(context.accessPathBuilder.accessPath)) {
+            // Shape A: skip KEYS prefix, route NULL directly to the map 
column.
+            return continueCollectAccessPath(mapContainsKey.getArgument(0), 
context);
+        }
+        // Shape B: map argument — only the key sub-column is needed.
         context.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_MAP_KEYS);
-        return continueCollectAccessPath(mapContainsKey.getArgument(0), 
context);
+        continueCollectAccessPath(mapContainsKey.getArgument(0), context);
+        // Shape B: key argument — visit with a fresh context to register 
full-data paths.
+        Expression keyArg = mapContainsKey.getArgument(1);
+        CollectorContext keyCtx = new 
CollectorContext(context.statementContext, context.bottomFilter);
+        continueCollectAccessPath(keyArg, keyCtx);
+        return null;
     }
 
     @Override
     public Void visitMapContainsValue(MapContainsValue mapContainsValue, 
CollectorContext context) {
+        // MAP_CONTAINS_VALUE(<map>, <value>)
+        // Same two-shape logic as visitMapContainsKey; see that method for 
the full rationale.
+        //
+        // Shape A (parent is IS NULL): skip VALUES prefix, route NULL to m → 
m.NULL.
+        // Shape B (regular predicate): add VALUES prefix for m, visit value 
arg with fresh context.
+        if (isUnderIsNull(context.accessPathBuilder.accessPath)) {
+            return continueCollectAccessPath(mapContainsValue.getArgument(0), 
context);
+        }
         context.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_MAP_VALUES);
-        return continueCollectAccessPath(mapContainsValue.getArgument(0), 
context);
+        continueCollectAccessPath(mapContainsValue.getArgument(0), context);
+        Expression valueArg = mapContainsValue.getArgument(1);
+        CollectorContext valueCtx = new 
CollectorContext(context.statementContext, context.bottomFilter);
+        continueCollectAccessPath(valueArg, valueCtx);
+        return null;
     }
 
     @Override
     public Void visitMapContainsEntry(MapContainsEntry mapContainsEntry, 
CollectorContext context) {
+        // MAP_CONTAINS_ENTRY(<map>, <key>, <value>)
+        // Same two-shape logic as visitMapContainsKey; see that method for 
the full rationale.
+        //
+        // Shape A (parent is IS NULL): skip sub-column prefix, route NULL to 
m → m.NULL.
+        // Shape B (regular predicate): add ACCESS_ALL for m (needs both keys 
and values),
+        //     visit key/value args with fresh context.
+        if (isUnderIsNull(context.accessPathBuilder.accessPath)) {
+            return continueCollectAccessPath(mapContainsEntry.getArgument(0), 
context);
+        }
         context.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_ALL);
-        return continueCollectAccessPath(mapContainsEntry.getArgument(0), 
context);
+        continueCollectAccessPath(mapContainsEntry.getArgument(0), context);
+        for (int i = 1; i < mapContainsEntry.arity(); i++) {
+            Expression entryArg = mapContainsEntry.getArgument(i);
+            CollectorContext entryCtx = new 
CollectorContext(context.statementContext, context.bottomFilter);
+            continueCollectAccessPath(entryArg, entryCtx);
+        }
+        return null;
     }
 
     @Override
@@ -575,7 +631,7 @@ public class AccessPathExpressionCollector extends 
DefaultExpressionVisitor<Void
 
     @Override
     public Void visitIf(If ifExpr, CollectorContext context) {
-        if (isFunctionNullCheckPath(context.accessPathBuilder.accessPath)) {
+        if (isUnderIsNull(context.accessPathBuilder.accessPath)) {
             ifExpr.getCondition().accept(this, new 
CollectorContext(context.statementContext, context.bottomFilter));
             ifExpr.getTrueValue().accept(this, copyContext(context));
             ifExpr.getFalseValue().accept(this, copyContext(context));
diff --git 
a/regression-test/data/nereids_rules_p0/column_pruning/map_contains_arg_pruning.out
 
b/regression-test/data/nereids_rules_p0/column_pruning/map_contains_arg_pruning.out
new file mode 100644
index 00000000000..498c202752f
--- /dev/null
+++ 
b/regression-test/data/nereids_rules_p0/column_pruning/map_contains_arg_pruning.out
@@ -0,0 +1,55 @@
+-- This file is automatically generated. You should know what you did if you 
want to edit this
+-- !case1 --
+1      false   true
+2      false   true
+3      true    \N
+4      true    false
+
+-- !case2 --
+1      false   false
+2      false   false
+3      false   \N
+4      true    false
+
+-- !case3 --
+1      false   false   false
+2      false   false   false
+3      true    false   \N
+4      true    true    false
+
+-- !case4 --
+1      false
+2      false
+3      true
+4      false
+
+-- !case5 --
+1      false
+2      false
+3      true
+4      false
+
+-- !case6 --
+1      false
+2      false
+3      true
+4      false
+
+-- !case7a --
+1      true
+2      true
+3      false
+4      true
+
+-- !case7b --
+1      true
+2      true
+3      false
+4      true
+
+-- !case7c --
+1      true
+2      true
+3      false
+4      true
+
diff --git 
a/regression-test/suites/nereids_rules_p0/column_pruning/map_contains_arg_pruning.groovy
 
b/regression-test/suites/nereids_rules_p0/column_pruning/map_contains_arg_pruning.groovy
new file mode 100644
index 00000000000..ea2a51778db
--- /dev/null
+++ 
b/regression-test/suites/nereids_rules_p0/column_pruning/map_contains_arg_pruning.groovy
@@ -0,0 +1,215 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Regression tests for map_contains_key / map_contains_value / 
map_contains_entry
+// when the key/value/entry argument references nested sub-columns.
+//
+// Bug: visitMapContainsKey/Value/Entry only visited the map argument and 
skipped
+// the key/value/entry argument. When the key is a nested sub-column expression
+// (e.g. element_at(s, 'a')) whose data path was not registered, and the same
+// sub-column also appears in IS NULL, NestedColumnPruning would prune it to
+// null-only metadata access, causing wrong results.
+
+suite("map_contains_arg_pruning") {
+    sql """ DROP TABLE IF EXISTS map_contains_arg_pruning_tbl """
+    sql """
+        CREATE TABLE map_contains_arg_pruning_tbl (
+            id  INT,
+            s   STRUCT<a: STRING, b: INT> NULL,
+            m   MAP<STRING, INT> NULL,
+            v   VARIANT NULL
+        ) ENGINE = OLAP
+        DUPLICATE KEY(id)
+        DISTRIBUTED BY HASH(id) BUCKETS 1
+        PROPERTIES ("replication_allocation" = "tag.location.default: 1")
+    """
+    sql """
+        INSERT INTO map_contains_arg_pruning_tbl VALUES
+            (1, named_struct('a', 'hello', 'b', 100),  {'hello': 1, 'world': 
2},  '{"k": 42}'),
+            (2, named_struct('a', 'doris', 'b', 200),  {'doris': 3},           
   NULL),
+            (3, named_struct('a', null, 'b', 300),      NULL,                  
    '{"x": 1}'),
+            (4, NULL,                                   {},                    
    '{}')
+    """
+
+    // ================================================================
+    // Case 1: map_contains_key + element_at IS NULL (original bug)
+    // map_contains_key(m, element_at(s, 'a')) needs full access to s.a
+    // as the key lookup value. Without fix, only [s.a.NULL] from
+    // element_at(s, 'a') IS NULL is registered.
+    // ================================================================
+    explain {
+        sql """
+            SELECT id,
+                   element_at(s, 'a') IS NULL,
+                   map_contains_key(m, element_at(s, 'a'))
+            FROM map_contains_arg_pruning_tbl ORDER BY id
+        """
+        contains "nested columns"
+        contains "s.a"                       // s.a should appear in access 
paths
+        notContains "s.a.NULL"               // should NOT be null-only
+        contains "m.KEYS"                    // map_contains_key needs KEYS 
path
+    }
+
+    order_qt_case1 """
+        SELECT id,
+               element_at(s, 'a') IS NULL,
+               map_contains_key(m, element_at(s, 'a'))
+        FROM map_contains_arg_pruning_tbl ORDER BY id
+    """
+
+    // ================================================================
+    // Case 2: map_contains_value, value arg references sub-column
+    // map_contains_value(m, element_at(s, 'b')) needs s.b as value
+    // ================================================================
+    explain {
+        sql """
+            SELECT id,
+                   element_at(s, 'b') IS NULL,
+                   map_contains_value(m, element_at(s, 'b'))
+            FROM map_contains_arg_pruning_tbl ORDER BY id
+        """
+        contains "nested columns"
+        contains "s.b"                       // s.b should appear in access 
paths
+        notContains "s.b.NULL"               // should NOT be null-only
+    }
+
+    order_qt_case2 """
+        SELECT id,
+               element_at(s, 'b') IS NULL,
+               map_contains_value(m, element_at(s, 'b'))
+        FROM map_contains_arg_pruning_tbl ORDER BY id
+    """
+
+    // ================================================================
+    // Case 3: map_contains_entry(m, key, value) — ternary function.
+    // visitMapContainsEntry must visit BOTH search arguments (arg1=key,
+    // arg2=value) with fresh contexts. Without the fix, only arg1 was
+    // collected and arg2 was silently skipped, leaving s.b unregistered
+    // so that element_at(s, 'b') IS NULL would prune s.b to null-only.
+    // ================================================================
+    explain {
+        sql """
+            SELECT id,
+                   element_at(s, 'a') IS NULL,
+                   element_at(s, 'b') IS NULL,
+                   map_contains_entry(m, element_at(s, 'a'), element_at(s, 
'b'))
+            FROM map_contains_arg_pruning_tbl ORDER BY id
+        """
+        contains "nested columns"
+        contains "s.a"
+        contains "s.b"
+        notContains "s.a.NULL"
+        notContains "s.b.NULL"
+    }
+
+    order_qt_case3 """
+        SELECT id,
+               element_at(s, 'a') IS NULL,
+               element_at(s, 'b') IS NULL,
+               map_contains_entry(m, element_at(s, 'a'), element_at(s, 'b'))
+        FROM map_contains_arg_pruning_tbl ORDER BY id
+    """
+
+    // ================================================================
+    // Case 4: map_contains_key(...) IS NULL — the key argument must NOT
+    // be collected as a full-data path because map_contains_key returns
+    // NULL only when the map itself is NULL. The search key does not
+    // affect nullability.
+    //
+    // When map_contains_key(...) IS NULL is the only expression, the
+    // NULL-only context causes the key argument to be skipped entirely,
+    // so s.a is never registered and no nested column pruning is emitted.
+    // ================================================================
+    explain {
+        sql """
+            SELECT id,
+                   map_contains_key(m, element_at(s, 'a')) IS NULL
+            FROM map_contains_arg_pruning_tbl ORDER BY id
+        """
+        // s.a must NOT appear in access paths: the key argument inside
+        // map_contains_key(...) IS NULL is skipped in NULL-only context.
+        notContains "[s.a]"
+    }
+
+    order_qt_case4 """
+        SELECT id,
+               map_contains_key(m, element_at(s, 'a')) IS NULL
+        FROM map_contains_arg_pruning_tbl ORDER BY id
+    """
+
+    // ================================================================
+    // Case 5: map_contains_value(...) IS NULL — same behaviour as
+    // map_contains_key: the value argument does not affect nullability.
+    // ================================================================
+    explain {
+        sql """
+            SELECT id,
+                   map_contains_value(m, element_at(s, 'b')) IS NULL
+            FROM map_contains_arg_pruning_tbl ORDER BY id
+        """
+        notContains "[s.b]"
+    }
+
+    order_qt_case5 """
+        SELECT id,
+               map_contains_value(m, element_at(s, 'b')) IS NULL
+        FROM map_contains_arg_pruning_tbl ORDER BY id
+    """
+
+    // ================================================================
+    // Case 6: map_contains_entry(...) IS NULL — both key and value
+    // arguments must be skipped. No nested column pruning is expected
+    // because only the map argument is visited with NULL-only path.
+    // ================================================================
+    explain {
+        sql """
+            SELECT id,
+                   map_contains_entry(m, element_at(s, 'a'), element_at(s, 
'b')) IS NULL
+            FROM map_contains_arg_pruning_tbl ORDER BY id
+        """
+        notContains "s.a"
+        notContains "s.b"
+    }
+
+    order_qt_case6 """
+        SELECT id,
+               map_contains_entry(m, element_at(s, 'a'), element_at(s, 'b')) 
IS NULL
+        FROM map_contains_arg_pruning_tbl ORDER BY id
+    """
+
+    // ================================================================
+    // Case 7: IS NOT NULL variants — NOT(IS NULL) delegates to the
+    // same visitIsNull path, so the NULL-only routing logic applies.
+    // ================================================================
+    order_qt_case7a """
+        SELECT id,
+               map_contains_key(m, element_at(s, 'a')) IS NOT NULL
+        FROM map_contains_arg_pruning_tbl ORDER BY id
+    """
+
+    order_qt_case7b """
+        SELECT id,
+               map_contains_value(m, element_at(s, 'b')) IS NOT NULL
+        FROM map_contains_arg_pruning_tbl ORDER BY id
+    """
+
+    order_qt_case7c """
+        SELECT id,
+               map_contains_entry(m, element_at(s, 'a'), element_at(s, 'b')) 
IS NOT NULL
+        FROM map_contains_arg_pruning_tbl ORDER BY id
+    """
+}


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to