zclllyybb commented on issue #63824: URL: https://github.com/apache/doris/issues/63824#issuecomment-4562380444
I checked the live issue metadata and the Doris 4.1.1 source matching the reported Docker BE commit `b10073ad9ca17cd5685c4dd3b3ef650f256376d0`. There are no issue comments yet, and no labels are currently attached.

Initial judgment: this should be treated as a valid high-severity BE correctness and stability bug in nested high-order array lambda execution. The report is not just an `array_agg` crash: the non-crashing examples show wrong lexical binding for nested lambdas, and the `array_agg` case appears to turn the same binding defect into an out-of-range/invalid column access.

Code evidence from the affected tag:

- In Nereids, `array_count(lambda, ...)` is rewritten as `array_count(array_map(lambda, ...))` (`ArrayCount.java`), so the reported query reaches BE as nested `array_map` execution.
- `ArrayMapFunction::execute()` collects slot refs from `children[0]`, computes a `gap`, then recursively calls `_set_column_ref_column_id(children[0], gap)`.
- `_collect_slot_ref_column_id()` and `_set_column_ref_column_id()` both recurse through all children. I do not see a scope boundary check for nested `VLambdaFunctionExpr`/nested lambda bodies.
- `VColumnRef::set_gap()` only writes `_gap` when the existing value is zero. If an outer `array_map` traversal sets the gap on `VColumnRef` nodes that belong to an inner lambda, the inner lambda cannot reliably rebind them later.
- `VColumnRef::execute_column()` then reads `block->get_by_position(_column_id + _gap)`. In 4.1.1 the const overload of `Block::get_by_position()` is unchecked, while `safe_get_by_position()` exists separately. This makes a wrong lambda gap able to propagate into an invalid column dereference, matching the reported stack through `PreparedFunctionImpl::default_implementation_for_constant_arguments()`, `VectorizedUtils::all_arguments_are_constant()`, and `is_column_const()`.

So the likely fix should be in lambda scoping/binding, not only at the crash leaf.
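
To make the write-once gap concern concrete, here is a minimal toy model in Python. It is not Doris code: `ColumnRef`, `Lambda`, `bind_all`, `bind_scoped`, and the gap values 1 and 2 are all hypothetical names and numbers chosen for illustration. The only behaviors taken from the analysis above are that the gap is written only while it is still zero, and that the traversal recurses into nested lambdas.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnRef:
    """Hypothetical stand-in for a column reference inside a lambda body."""
    column_id: int
    gap: int = 0

    def set_gap(self, gap: int) -> None:
        # Write-once: a later caller cannot overwrite a non-zero gap.
        if self.gap == 0:
            self.gap = gap

    def position(self) -> int:
        # Block position that an execute step would read from.
        return self.column_id + self.gap


@dataclass
class Lambda:
    """Hypothetical lambda node: its own refs plus any nested lambdas."""
    refs: List[ColumnRef]
    nested: List["Lambda"] = field(default_factory=list)


def bind_all(node: Lambda, gap: int) -> None:
    """Scope-unaware traversal: also touches refs under nested lambdas."""
    for ref in node.refs:
        ref.set_gap(gap)
    for inner in node.nested:
        bind_all(inner, gap)


def bind_scoped(node: Lambda, gap: int) -> None:
    """Scope-aware traversal: stops at the nested lambda boundary."""
    for ref in node.refs:
        ref.set_gap(gap)


def build():
    """Build x -> (y -> ...) with one ref per lambda argument."""
    y = ColumnRef(column_id=0)
    x = ColumnRef(column_id=0)
    inner = Lambda(refs=[y])
    outer = Lambda(refs=[x], nested=[inner])
    return outer, inner, x, y


def run(bind):
    """Bind the outer lambda first, then the inner one, and report positions."""
    outer, inner, x, y = build()
    bind(outer, 1)  # outer scope binds with an illustrative gap of 1
    bind(inner, 2)  # inner scope later tries to bind with a gap of 2
    return x.position(), y.position()


print(run(bind_all))     # (1, 1): the inner ref kept the outer gap
print(run(bind_scoped))  # (1, 2): each ref is bound by its own scope
```

In this toy, the scope-unaware traversal leaves the inner reference pointing at the same position as the outer one, because the inner bind is silently ignored. With a different stale gap the computed position could just as easily fall outside the block, which is the shape of failure an unchecked positional read would not catch.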
A bounds guard in `VColumnRef` would be useful as a defensive safety net, but it would not fix the wrong-result case:

```sql
SELECT array_map(x -> array_count(y -> y = x, ['a']), ['b']);
```

That query should return `[0]`; returning `[1]` indicates the inner lambda is not resolving the captured outer variable according to lexical scope.

Suggested next steps:

1. Reproduce on current `master` and the relevant 4.1 maintenance branch, then decide whether this needs a 4.1 backport.
2. Make the lambda traversal in `ArrayMapFunction` scope-aware. The traversal for one lambda should not collect or mutate `VColumnRef` state under a nested lambda as if it belonged to the same lambda scope.
3. Consider removing or isolating mutable execution-specific gap state from shared `VColumnRef` nodes, or clone/rebind the lambda body per scope.
4. Add a defensive range check in `VColumnRef::execute_column()` / `execute_type()` or switch to `safe_get_by_position()` so an invalid expression binding returns a query error instead of terminating BE.
5. Add regression coverage for both cases: the `array_agg` crash query should return `[1]`, and the literal nested lambda query above should return `[0]`.

Missing information that would help close the loop, but is not required to classify this as a real bug: full BE log/core stack around the crash, `EXPLAIN VERBOSE` for the minimal query, and a confirmation on x86_64 or current master/branch-4.1.

Breakwater-GitHub-Analysis-Slot: slot_c4bd4cb13bca

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
