github-actions[bot] commented on PR #61352:
URL: https://github.com/apache/doris/pull/61352#issuecomment-4064558277
## Code Review Summary — PR #61352
**Scope**: This PR adds 4 new REGR aggregate functions (`REGR_AVGX`,
`REGR_AVGY`, `REGR_COUNT`, `REGR_R2`) and refactors the existing 5
(`REGR_SLOPE`, `REGR_INTERCEPT`, `REGR_SXX`, `REGR_SYY`, `REGR_SXY`) into a
unified template framework using `RegrFunctionKind` enum + `RegrTraits`
specializations.
**Minor note**: PR title contains a typo — "REGR_ARGX, REGR_ARGY" should be
"REGR_AVGX, REGR_AVGY".
### Critical Checkpoint Conclusions
1. **Goal achieved?** Yes. The PR adds 4 new statistical regression
functions and consolidates all 9 regr functions into a single unified template.
Regression tests cover all new functions.
2. **Modification minimal and focused?** Yes. The refactoring of existing
functions into the new template is well-motivated — it avoids duplicating the
add/merge/serialize logic for each function.
3. **Concurrency?** Not applicable — aggregate functions are per-partition
state with no shared mutable data.
4. **Lifecycle / static initialization?** No concerns — no static state, no
cross-TU dependencies.
5. **Configuration items?** None added.
6. **Incompatible changes / serialization compatibility?** **Verified
safe.** The serialization format (field order: Sx, Sy, Sxx, Syy, Sxy, N, each
conditional) is identical between old `aggregate_function_regr_union.h` and new
`aggregate_function_regr.h` for all 5 existing functions. The `regr_intercept`
intentionally keeps `sy_level=2` to preserve its historical serialized state
layout (documented in comment at line 91). Rolling upgrades are safe.
7. **Parallel code paths?** The old `aggregate_function_regr_union.h/.cpp`
is fully deleted and replaced. No stale parallel paths remain.
8. **Special conditional checks?** The `regr_r2` `get_result()` has
three-branch logic matching the SQL standard: NULL when N=0 or VAR_POP(x)=0,
1.0 when VAR_POP(y)=0, otherwise Sxy²/(Sxx·Syy). This is mathematically correct
and matches PostgreSQL behavior. The `regr_count` state (`k_num_moments=0`)
correctly compiles down to only tracking `n`.
9. **Test coverage?** Good. Each of the 4 new functions has a dedicated
regression test with:
- Empty table (NULL/0), group by on empty table
- Integer and double inputs
- Group by with multiple groups
- Non-nullable path (`non_nullable()`)
- Literal arguments
- All-null groups (NULL/0)
- Error cases (string params, array params) using `test{exception}`
pattern
- `order_qt_` prefixed queries and `ORDER BY` for deterministic output
- Tables dropped before use
10. **FE-BE consistency?** Verified:
- FE: `RegrAvgx/RegrAvgy/RegrR2` → `AlwaysNullable`, return `DoubleType`
→ BE returns nullable Float64. Consistent.
- FE: `RegrCount` → `NotNullableAggregateFunction` with
`resultForEmptyInput()=0`, return `BigIntType` → BE returns non-nullable Int64
via `regr_count` specialization. Consistent.
- All 4 new functions registered in `BuiltinAggregateFunctions.java`,
`FunctionSet.java`, and BE's `register_aggregate_function_regr()`.
- All 4 visitor methods added to `AggregateFunctionVisitor.java`,
delegating to `visitAggregateFunction()` — consistent with existing regr
functions.
11. **Observability?** Not needed — aggregate functions don't require
additional metrics or logging.
12. **Transaction/persistence?** Not applicable.
13. **Performance?** The template design with `if constexpr` compiles away
unused moment tracking (e.g., `regr_count` only tracks N with zero overhead).
The online Welford/Youngs-Cramer algorithms are numerically stable and O(1) per
row. No redundant copies or allocations. The use of `std::tuple<UInt64,
std::array<Float64, k_num_moments>>` vs `std::tuple<UInt64>` for `regr_count`
is a nice touch that minimizes state size.
14. **Other issues?** None found. The code follows existing patterns, the FE
Java classes follow the newer style (consistent with RegrSxx/RegrSyy/RegrSxy
from the related PR #55940), and all mathematical formulas are correct.
### Verdict: **No issues found.** Clean, well-structured addition with good
test coverage and backward-compatible serialization.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]