github-actions[bot] commented on code in PR #64624:
URL: https://github.com/apache/doris/pull/64624#discussion_r3461985590


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/Percentile.java:
##########
@@ -78,10 +78,16 @@ private Percentile(NullableAggregateFunctionParams 
functionParams) {
 
     @Override
     public void checkLegalityBeforeTypeCoercion() {
-        if (!getArgument(1).isConstant()) {
+        Expression quantile = getArgument(1);
+        if (!quantile.isConstant()) {
             throw new AnalysisException(
                     "percentile requires second parameter must be a constant : 
" + this.toSql());
         }
+        double value = 
((org.apache.doris.nereids.trees.expressions.literal.Literal) 
quantile).getDouble();

Review Comment:
   This range check is running before constant folding/type coercion, but it 
treats every accepted constant as a numeric `Literal`. `isConstant()` is 
broader than `Literal`, so `percentile(k, 0.1 + 0.2)` passes the guard and then 
this cast throws `ClassCastException` during analysis. The same pre-coercion 
read also breaks explicitly castable string literals: these functions use 
`ExplicitlyCastableSignature`, so a call like `percentile(k, '0.5')` can be 
matched by casting the second argument to `DOUBLE`, but 
`StringLikeLiteral.getDouble()` is not numeric parsing; it byte-encodes the 
string for ordering and makes `'0.5'` look far outside `[0, 1]` before coercion 
runs. The same pattern was added to `PercentileApprox` and 
`PercentileApproxWeighted`. Please validate after folding/coercion, or 
explicitly reject unsupported literal types with a controlled compatibility 
decision.



##########
be/src/exprs/aggregate/aggregate_function_percentile.h:
##########
@@ -156,6 +158,27 @@ class AggregateFunctionPercentileApproxBase
 
     String get_name() const override { return "percentile_approx"; }
 
+    const std::vector<size_t>& get_const_argument_indexes() const override {
+        static const std::vector<size_t> two_params {1};
+        static const std::vector<size_t> three_params {1, 2};
+        return this->argument_types.size() == 3 ? three_params : two_params;
+    }
+
+    Status set_const_arguments(const std::vector<ColumnWithTypeAndName>& 
arguments) override {
+        _quantile = assert_cast<const ColumnFloat64&, 
TypeCheckOnRelease::DISABLE>(
+                            assert_cast<const 
ColumnConst&>(*arguments[0].column).get_data_column())
+                            .get_data()[0];
+        _compression =
+                this->argument_types.size() == 3

Review Comment:
   The weighted variants cannot share this arity-based setter as-is. 
`AggregateFunctionPercentileApproxWeightedThreeParams::get_const_argument_indexes()`
 returns only `{2}`, so this method receives a single captured constant, but 
because `argument_types.size() == 3` it still reads `arguments[1]` for 
compression. That is out of range for the three-argument weighted form. 
Conversely, the four-argument weighted form returns `{2, 3}`, but because its 
arity is 4 this code takes the `: 10000` branch and ignores the user-supplied 
compression. Please make the setter use the captured argument count or override 
it for the weighted subclasses so `percentile_approx_weighted(v, w, q)` and 
`percentile_approx_weighted(v, w, q, compression)` both initialize correctly.



##########
be/src/exprs/aggregate/aggregate_function_ai_agg.h:
##########
@@ -274,18 +278,32 @@ class AggregateFunctionAIAgg final
 
     bool is_blockable() const override { return true; }
 
+    const std::vector<size_t>& get_const_argument_indexes() const override {
+        static const std::vector<size_t> indexes {0, 2};
+        return indexes;
+    }
+
+    Status set_const_arguments(const std::vector<ColumnWithTypeAndName>& 
arguments) override {
+        const auto& resource_name_column =
+                assert_cast<const 
ColumnConst&>(*arguments[0].column).get_data_column();
+        _resource_name =

Review Comment:
   These `StringRef`s point into the temporary columns built by 
`AggFnEvaluator::_init_const_arguments()`. That helper stores each 
`execute_const_expr()` result in a local `std::vector<ColumnWithTypeAndName>` 
and destroys it immediately after `set_const_arguments()` returns, while 
`ColumnString::get_data_at()` returns a pointer into the column's `chars` 
buffer. After `open()`, `_resource_name` and `_task` are dangling, so the first 
`add()` can read freed memory when it calls `prepare(_resource_name, _task)`. 
Please copy these constants into owned `std::string` storage on the aggregate 
function (or into the state) before keeping them past the setter call.



##########
be/src/exprs/aggregate/aggregate_function_percentile.h:
##########
@@ -693,25 +724,32 @@ class AggregateFunctionPercentileArray final
         return 
std::make_shared<DataTypeArray>(make_nullable(std::make_shared<DataTypeFloat64>()));
     }
 
+    const std::vector<size_t>& get_const_argument_indexes() const override {
+        static const std::vector<size_t> indexes {1};
+        return indexes;
+    }
+
+    Status set_const_arguments(const std::vector<ColumnWithTypeAndName>& 
arguments) override {
+        const auto& quantile_array = assert_cast<const ColumnArray&, 
TypeCheckOnRelease::DISABLE>(
+                assert_cast<const 
ColumnConst&>(*arguments[0].column).get_data_column());
+        const auto& offset_column_data = quantile_array.get_offsets();
+        const auto& quantile_data =

Review Comment:
   This loses the null-element check for constant arrays that are not 
`ArrayLiteral`, for example `percentile_array(x, array(0.5, NULL))`. The FE 
guard only iterates `ArrayLiteral`, while the scalar `array(...)` function is 
still a constant expression. On the BE side this setter unwraps the array's 
nullable nested column and copies only `ColumnFloat64` values, ignoring the 
nested null map; null elements therefore use their default nested payload, 
commonly `0.0`, and pass `check_quantile()`. The old row path explicitly threw 
when `null_maps[i]` was set. Please keep the null-map validation when 
extracting the cached quantile array.



##########
be/src/exprs/vectorized_agg_fn.cpp:
##########
@@ -349,14 +379,26 @@ Status AggFnEvaluator::_calc_argument_columns(Block* 
block) {
     SCOPED_TIMER(_expr_timer);
     _agg_columns.resize(_input_exprs_ctxs.size());
     std::vector<int> column_ids(_input_exprs_ctxs.size());
+    std::vector<int> materialize_column_ids;
+    materialize_column_ids.reserve(_input_exprs_ctxs.size());
     for (int i = 0; i < _input_exprs_ctxs.size(); ++i) {
+        if (!_const_argument_idx.empty() && _const_argument_idx[i]) {
+            continue;
+        }
         int column_id = -1;
         RETURN_IF_ERROR(_input_exprs_ctxs[i]->execute(block, &column_id));
         column_ids[i] = column_id;
+        materialize_column_ids.push_back(column_id);
+    }
+    if (!materialize_column_ids.empty()) {
+        materialize_block_inplace(*block, materialize_column_ids.data(),
+                                  materialize_column_ids.data() + 
materialize_column_ids.size());
     }
-    materialize_block_inplace(*block, column_ids.data(),
-                              column_ids.data() + _input_exprs_ctxs.size());
     for (int i = 0; i < _input_exprs_ctxs.size(); ++i) {

Review Comment:
   Dropping the column for required constants loses the nullable-wrapper path 
for constant NULLs. Existing tests still call `percentile(level + 0.1, NULL)` 
and `percentile_approx(level, NULL, 2048)`. Those aggregate functions are 
wrapped by the nullable aggregate wrapper, whose variadic `add()` expects 
`columns[i]` to be a `ColumnNullable` whenever `is_nullable[i]` is true and 
dereferences it before deciding to skip the row. With this change 
`_agg_columns[i]` is `nullptr` for the constant argument, and the new 
percentile setters also assume `ColumnConst(ColumnFloat64)` rather than 
`ColumnConst(ColumnNullable(...))`. Please preserve nullable const columns for 
the wrapper or explicitly encode constant-null semantics in 
`set_const_arguments()` so these supported NULL cases still produce NULL 
instead of crashing or using a default quantile.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/PercentileApprox.java:
##########
@@ -85,15 +85,28 @@ private PercentileApprox(NullableAggregateFunctionParams 
functionParams) {
 
     @Override
     public void checkLegalityBeforeTypeCoercion() {
-        if (!getArgument(1).isConstant()) {
+        Expression quantile = getArgument(1);
+        if (!quantile.isConstant()) {
             throw new AnalysisException(
                     "percentile_approx requires second parameter must be a 
constant : " + this.toSql());
         }
+        double value = 
((org.apache.doris.nereids.trees.expressions.literal.Literal) 
quantile).getDouble();
+        if (value < 0.0 || value > 1.0) {
+            throw new AnalysisException(
+                    "percentile_approx quantile must be in [0, 1], but got " + 
value + ": " + this.toSql());
+        }
         if (arity() == 3) {
-            if (!getArgument(2).isConstant()) {
+            Expression compressionArg = getArgument(2);
+            if (!compressionArg.isConstant()) {
                 throw new AnalysisException(
                         "percentile_approx requires the third parameter must 
be a constant : " + this.toSql());
             }
+            double compression =
+                    
((org.apache.doris.nereids.trees.expressions.literal.Literal) 
compressionArg).getDouble();
+            if (compression < 2048.0 || compression > 10000.0) {

Review Comment:
   This changes existing `percentile_approx` semantics. The BE state still 
documents and implements that when compression is not in `[2048, 10000]`, it 
falls back to the default `10000`; this PR now rejects those calls in FE before 
BE can apply that behavior. That makes previously accepted queries like 
`percentile_approx(x, 0.5, 10001)` fail analysis, and the weighted variant has 
the same new check. If the intent is only to cache const arguments, please keep 
the BE fallback semantics instead of turning out-of-range compression into an 
analysis error.



##########
be/src/exprs/vectorized_agg_fn.cpp:
##########
@@ -274,7 +275,36 @@ Status AggFnEvaluator::prepare(RuntimeState* state, const 
RowDescriptor& desc,
 }
 
 Status AggFnEvaluator::open(RuntimeState* state) {
-    return VExpr::open(_input_exprs_ctxs, state);
+    RETURN_IF_ERROR(VExpr::open(_input_exprs_ctxs, state));
+    return _init_const_arguments();
+}
+
+Status AggFnEvaluator::_init_const_arguments() {
+    const auto& const_argument_indexes = 
_function->get_const_argument_indexes();
+    if (const_argument_indexes.empty()) {
+        return Status::OK();
+    }
+
+    _const_argument_idx.assign(_input_exprs_ctxs.size(), false);
+    std::vector<ColumnWithTypeAndName> arguments;
+    arguments.reserve(const_argument_indexes.size());
+    for (const auto i : const_argument_indexes) {
+        if (i >= _input_exprs_ctxs.size()) [[unlikely]] {

Review Comment:
   This validation also runs on merge evaluators. In Nereids, 
`BUFFER_TO_RESULT`/`BUFFER_TO_BUFFER` aggregate expressions use the 
aggregate-buffer child as the current phase argument and set `is_merge_agg`; 
the BE side therefore opens a merge evaluator with only the serialized state 
slot. `_is_merge` is populated from thrift, but `_init_const_arguments()` still 
asks the function for original input const indexes and rejects index `1` for 
functions such as `percentile`. That makes the global merge phase fail in 
`open()` before it can deserialize/merge the partial state. Please skip 
const-argument extraction for `_is_merge` evaluators, or make the contract 
phase-aware so only input phases require the original const children.



##########
be/src/exprs/vectorized_agg_fn.cpp:
##########
@@ -274,7 +275,36 @@ Status AggFnEvaluator::prepare(RuntimeState* state, const 
RowDescriptor& desc,
 }
 
 Status AggFnEvaluator::open(RuntimeState* state) {
-    return VExpr::open(_input_exprs_ctxs, state);
+    RETURN_IF_ERROR(VExpr::open(_input_exprs_ctxs, state));
+    return _init_const_arguments();
+}
+
+Status AggFnEvaluator::_init_const_arguments() {
+    const auto& const_argument_indexes = 
_function->get_const_argument_indexes();
+    if (const_argument_indexes.empty()) {
+        return Status::OK();
+    }
+
+    _const_argument_idx.assign(_input_exprs_ctxs.size(), false);
+    std::vector<ColumnWithTypeAndName> arguments;
+    arguments.reserve(const_argument_indexes.size());
+    for (const auto i : const_argument_indexes) {
+        if (i >= _input_exprs_ctxs.size()) [[unlikely]] {
+            return Status::InternalError("Aggregate function {} requires 
invalid const argument {}",
+                                         _function->get_name(), i);
+        }
+        _const_argument_idx[i] = true;
+        if (!_input_exprs_ctxs[i]->root()->is_constant()) {
+            return Status::InternalError(
+                    "Aggregate function {} requires argument {} to be 
constant",
+                    _function->get_name(), i);
+        }
+
+        ColumnWithTypeAndName argument;
+        RETURN_IF_ERROR(_input_exprs_ctxs[i]->execute_const_expr(argument));
+        arguments.emplace_back(std::move(argument));
+    }
+    return _function->set_const_arguments(arguments);

Review Comment:
   `set_const_arguments()` is mutating the aggregate function object, but 
evaluator clones share that object. `AggFnEvaluator::clone()` copies 
`_function` as the same `shared_ptr`, and bucketed aggregation creates 
per-instance evaluator clones specifically because evaluator state is mutated; 
those local states can be initialized/opened concurrently. With this change 
each clone calls `open()` and writes cached constants such as `_quantile`, 
`_compression`, `_half_decay`, `_level`, or the `ai_agg` string refs into the 
same shared function object. Even when every clone writes the same value, that 
is an unsynchronized data race and it also means one evaluator's cached state 
is not actually evaluator-local. Please make the cached constants immutable 
before cloning, deep-clone the aggregate function, or keep per-evaluator owned 
const-argument storage.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/PercentileArray.java:
##########
@@ -82,10 +83,25 @@ private PercentileArray(AggregateFunctionParams 
functionParams) {
 
     @Override
     public void checkLegalityBeforeTypeCoercion() {
-        if (!getArgument(1).isConstant()) {
+        Expression quantiles = getArgument(1);
+        if (!quantiles.isConstant()) {
             throw new AnalysisException(
                     "percentile_array requires second parameter must be a 
constant : " + this.toSql());
         }
+        if (quantiles instanceof ArrayLiteral) {
+            ArrayLiteral arrayLiteral = (ArrayLiteral) quantiles;
+            for (org.apache.doris.nereids.trees.expressions.literal.Literal 
item : arrayLiteral.getValue()) {
+                if (item instanceof NullLiteral) {
+                    throw new AnalysisException(
+                            "percentile_array quantile should not be null : " 
+ this.toSql());
+                }
+                double value = item.getDouble();

Review Comment:
   This has the same pre-coercion `getDouble()` problem as the scalar 
percentile checks, but on the `ARRAY<DOUBLE>` argument. `PercentileArray` 
implements `ExplicitlyCastableSignature`, so an array literal such as `['0.5', 
'0.8']` can be matched by casting its elements to `DOUBLE`. This loop runs 
before that coercion and calls `getDouble()` on the original 
`StringLikeLiteral` elements; that method byte-encodes the string for ordering 
instead of parsing it numerically, so valid numeric strings can be rejected as 
out of range. Please validate the array values after element coercion/folding, 
or explicitly reject non-numeric element literal types with a controlled 
compatibility decision.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to