Copilot commented on code in PR #54909:
URL: https://github.com/apache/doris/pull/54909#discussion_r2281262257
##########
.github/instructions/function-instructions.md:
##########
@@ -0,0 +1,202 @@
+---
+applyTo:
+ - "be/src/vec/functions/**/*.h"
+ - "be/src/vec/functions/**/*.hpp"
+ - "be/src/vec/functions/**/*.cc"
+ - "be/src/vec/functions/**/*.cpp"
+ -
"fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/**/*.java"
+ -
"fe/fe-core/src/main/java/org/apache/doris/catalog/BuiltinScalarFunctions.java"
+---
+
+# Apache Doris Function Instructions
+
+## Scope
+
+- BE vectorized function implementation and registration:
`be/src/vec/functions/`
+- FE function signatures/type inference/nullability/visitors:
`fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/`
+- FE constant folding: `.../functions/executable/`
+
+Goal: provide unified guidance for implementing/registering SQL functions,
nullability contracts, constant-column handling, and testing, so reviewers can
check for consistency and quality.
+
+## FE-side Specification (Signatures, Nullability, Constant Folding)
+
+- Place signature classes under:
`org.apache.doris.nereids.trees.expressions.functions.scalar.*` (or
`table`/`agg` subpackages).
+- Common nullability/folding traits (combine as needed):
+ - `AlwaysNullable`: return type is always `Nullable`, even for not-null
inputs (e.g., parse failure/out-of-range → NULL).
+ - `AlwaysNotNullable`: return type is never `Nullable` (e.g., normalize NULL
to a non-null constant).
+ - `PropagateNullable`: if any input is `Nullable`, output is `Nullable`;
otherwise not (default preference).
+ - `PropagateNullLiteral`: during constant folding, if any input is a
`NullLiteral`, output folds to a `NullLiteral` (e.g., `Acos(NULL)` → `NULL`).
+ - `PropagateNullableOnDateOrTimeLikeV2Args`: if any argument is
DateV2/DateTimeV2/Time-like, behave like `PropagateNullable`; otherwise behave
like `AlwaysNullable`.
+- Nullability tip: Prefer these traits; if custom semantics are needed, don’t
implement the three standard traits and override `nullable()` directly (see
`ComputeNullable`).
+- Register the class and aliases in `BuiltinScalarFunctions` to ensure parsing
and dispatching.
+- Constant folding (optional): For deterministic functions, consider adding
constant folding under `functions/executable/`; for NULL literal pass-through
folding, use `PropagateNullLiteral`.
+- Note: Legacy `doris_builtin_functions.py` is deprecated; do not modify.
Review Comment:
The documentation mentions a deprecated file `doris_builtin_functions.py`
but doesn't provide its location or what developers should use instead.
Consider adding the file path and pointing to the replacement mechanism
(BuiltinScalarFunctions).
```suggestion
- Note: Legacy `doris_builtin_functions.py` (located at
`fe/fe-core/scripts/doris_builtin_functions.py`) is deprecated; do not modify.
Use `BuiltinScalarFunctions`
(`fe/fe-core/src/main/java/org/apache/doris/catalog/BuiltinScalarFunctions.java`)
for function registration and management instead.
```
##########
.github/instructions/function-instructions.md:
##########
@@ -0,0 +1,202 @@
+---
+applyTo:
+ - "be/src/vec/functions/**/*.h"
+ - "be/src/vec/functions/**/*.hpp"
+ - "be/src/vec/functions/**/*.cc"
+ - "be/src/vec/functions/**/*.cpp"
+ -
"fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/**/*.java"
+ -
"fe/fe-core/src/main/java/org/apache/doris/catalog/BuiltinScalarFunctions.java"
+---
+
+# Apache Doris Function Instructions
+
+## Scope
+
+- BE vectorized function implementation and registration:
`be/src/vec/functions/`
+- FE function signatures/type inference/nullability/visitors:
`fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/`
+- FE constant folding: `.../functions/executable/`
+
+Goal: provide unified guidance for implementing/registering SQL functions,
nullability contracts, constant-column handling, and testing, so reviewers can
check for consistency and quality.
+
+## FE-side Specification (Signatures, Nullability, Constant Folding)
+
+- Place signature classes under:
`org.apache.doris.nereids.trees.expressions.functions.scalar.*` (or
`table`/`agg` subpackages).
+- Common nullability/folding traits (combine as needed):
+ - `AlwaysNullable`: return type is always `Nullable`, even for not-null
inputs (e.g., parse failure/out-of-range → NULL).
+ - `AlwaysNotNullable`: return type is never `Nullable` (e.g., normalize NULL
to a non-null constant).
+ - `PropagateNullable`: if any input is `Nullable`, output is `Nullable`;
otherwise not (default preference).
+ - `PropagateNullLiteral`: during constant folding, if any input is a
`NullLiteral`, output folds to a `NullLiteral` (e.g., `Acos(NULL)` → `NULL`).
+ - `PropagateNullableOnDateOrTimeLikeV2Args`: if any argument is
DateV2/DateTimeV2/Time-like, behave like `PropagateNullable`; otherwise behave
like `AlwaysNullable`.
+- Nullability tip: Prefer these traits; if custom semantics are needed, don’t
implement the three standard traits and override `nullable()` directly (see
`ComputeNullable`).
+- Register the class and aliases in `BuiltinScalarFunctions` to ensure parsing
and dispatching.
+- Constant folding (optional): For deterministic functions, consider adding
constant folding under `functions/executable/`; for NULL literal pass-through
folding, use `PropagateNullLiteral`.
+- Note: Legacy `doris_builtin_functions.py` is deprecated; do not modify.
+- Note: Some expressions should not blindly use generic traits (e.g.,
`InPredicate` should not use `PropagateNullable`) to avoid incorrect folding or
nullability inference.
+
+## BE-side Specification (Implementation, Performance, Registration)
+
+- Typical skeleton (inherit from `IFunction`):
+ - `static constexpr auto name` / `get_name()`
+ - `get_number_of_arguments()` or support variadic
+ - `get_return_type_impl(const DataTypes&)` (if
`use_default_implementation_for_nulls()==true`, return the bare type; the
framework wraps Nullable. Only manually `make_nullable(...)` if semantically
“always nullable” or you handle NULLs yourself.)
+ - `Status execute_impl(FunctionContext*, Block&, const ColumnNumbers&,
uint32_t, size_t)`
+- Registration steps:
+ 1) Add the implementation under `be/src/vec/functions/`.
+ 2) Register it in `register_function_xxx(SimpleFunctionFactory&)` (or create
a new one).
+ 3) In `simple_function_factory.h`, call `register_function_xxx(instance);`
inside `SimpleFunctionFactory::instance()` initialization.
+
+### New Paradigm for Constant Column (ColumnConst) Handling
+
+- Avoid blindly expanding constants via `convert_to_full_column_if_const`.
+- Use:
+ - `unpack_if_const(col)` → `(ColumnPtr, bool is_const)` to get the real
column and constancy.
+ - `default_preprocess_parameter_columns(...)` to expand “parameter columns”
(non-data columns) when needed.
+ - If some arguments must be constants, override
`get_arguments_that_are_always_constant()` for automatic validation.
+ - Usually keep `use_default_implementation_for_constants() == true`;
implement only `vector_vector`/`vector_scalar`; the framework treats
`const_const` equivalently.
+ - For non-deterministic functions (e.g., `random`), set
`use_default_implementation_for_constants()` to `false` to disable folding.
+
+Example (two arguments):
+
+```cpp
+const auto& [lcol, lconst] =
unpack_if_const(block.get_by_position(arguments[0]).column);
+const auto& [rcol, rconst] =
unpack_if_const(block.get_by_position(arguments[1]).column);
+// Choose vector_vector or vector_scalar based on lconst/rconst
+```
+
+Example (multi/variadic arguments):
+
+```cpp
+Columns cols(arguments.size());
+std::unique_ptr<bool[]> col_const = std::make_unique<bool[]>(arguments.size());
+for (int i = 0; i < arguments.size(); ++i) {
+ std::tie(cols[i], col_const[i]) =
unpack_if_const(block.get_by_position(arguments[i]).column);
+}
+// Use col_const[i] to decide vector_scalar vs vector_vector and avoid
unnecessary expansion
+```
+
+Example (with parameter-column optimization):
+
+```cpp
+bool col_const[3];
+ColumnPtr argument_columns[3];
+for (int i = 0; i < 3; ++i) {
+ col_const[i] =
is_column_const(*block.get_by_position(arguments[i]).column);
+}
+argument_columns[0] = col_const[0]
+ ? static_cast<const
ColumnConst&>(*block.get_by_position(arguments[0]).column).convert_to_full_column()
+ : block.get_by_position(arguments[0]).column;
+
+default_preprocess_parameter_columns(argument_columns, col_const, {1, 2},
block, arguments);
+if (col_const[1] && col_const[2]) {
+ execute_impl_scalar_args(...);
+} else {
+ execute_impl(...);
+}
+```
+
+#### Using index_check_const
+
+- Purpose: When accessing inputs that may be constant columns, normalize the
row index to 0 for those constant inputs to avoid branching/expansion.
+- Definition: `be/src/vec/columns/column_const.h`
+- Usage forms:
+ - Runtime-boolean form:
+
+ ```cpp
+ // is_const is a runtime boolean (e.g., from unpack_if_const or
col_const[i])
+ size_t idx = index_check_const(i, is_const);
+ auto ref = col->get_data_at(idx);
+ ```
+
+ - Compile-time boolean form (template):
+
+ ```cpp
+ // str_const/len_const must be compile-time constant template params
+ const auto idx = index_check_const<str_const>(i);
+ StringRef v = str_col->get_data_at(idx);
+ ```
+
+- Note: The template works for integral indices; source comments note possible
performance impact on some GCC targets. For extremely hot paths, consider
branching once before a tight loop.
+
+### Nullability & Error Handling
+
+- Nullability follows the return type and framework decisions; commonly use
`PropagateNullable`; when needed, use `DataTypeNullable`/`ColumnNullable`.
+- Common helpers: `make_nullable`, `remove_nullable`, `is_column_nullable`,
`is_column_const`.
+- Error handling uses the `Status` system and macros: `RETURN_IF_ERROR`,
`DORIS_TRY`, `WARN_IF_ERROR` (see `be/src/common/status.h`).
+
+- BE default NULL handling (important):
`use_default_implementation_for_nulls()` defaults to `true`:
+ - If any input is a constant NULL, return a constant NULL.
+ - If any inputs are `Nullable`, execute on nested columns; the framework
wraps the result with the union null-map as `Nullable`.
+ - In this mode, `get_return_type_impl` must return the bare type; the
framework decides whether to wrap Nullable.
+- When to set `use_default_implementation_for_nulls()` to `false`: when custom
NULL semantics are required or you must read/write the null map explicitly
(e.g., `CASE`/`IN`/`IS NULL`/some `JSON`/`CAST`).
+- `need_replace_null_data_to_default()`: under default NULL handling, sanitize
nested data on NULL rows to type defaults after execution (avoid issues like
arithmetic overflow).
+
+### Performance Notes (Hot Path)
+
+- Write directly to output column memory; avoid temporary copies; use
`reserve/resize` appropriately.
+- Avoid virtual calls and repeated dispatch in loops; use templates to
eliminate type checks.
+- Abstract/reuse existing implementations; avoid duplication and branching
complexity.
+
+## Testing & Quality Assurance
+
+- Regression tests: test types/volume not less than templates (e.g.,
`test_template_{X}_arg(s).groovy`).
+- Input coverage: datatype boundaries, null/not-null combinations,
exceptions/out-of-range, encoding/case, etc.
+- FE constant folding: cases with `NullLiteral` must match
`PropagateNullLiteral` semantics (e.g., `Acos(NULL)` → `NULL`).
+- BE unit tests (UT): use `check_function_all_arg_comb` to cover const
combinations and column layout variants; cover normal/exception/boundary inputs.
+- Scripts: `run-regression-test.sh`, `run-be-ut.sh` (see details inside
scripts).
Review Comment:
The documentation references specific script files but doesn't provide their
locations. Consider adding the full paths to these scripts for better guidance.
```suggestion
- Scripts: `regression-test/run-regression-test.sh`, `be/run-be-ut.sh` (see
details inside scripts).
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]