github-actions[bot] commented on code in PR #64820:
URL: https://github.com/apache/doris/pull/64820#discussion_r3471798543
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/eageraggregation/PushDownAggregation.java:
##########
@@ -143,14 +141,16 @@ public Plan visitLogicalAggregate(LogicalAggregate<?
extends Plan> agg, JobConte
if (aggFunction.containsVolatileExpression()) {
return agg;
}
- // CaseWhen and If (which CASE WHEN is normalized into)
must both be checked.
- // When an agg function contains an If/CaseWhen whose
condition tests IS NULL
- // (e.g. count(if(col IS NULL, value, NULL))), pushing it
to the nullable side
- // of an outer join produces wrong results: null-extended
rows make "col IS NULL"
- // TRUE at the top level, but the pre-aggregated count
slot becomes NULL after
- // null-extension, and ifnull(sum(NULL), 0) = 0 instead of
the correct 1.
- if (!hasCaseWhen && aggFunction.anyMatch(e -> e instanceof
CaseWhen || e instanceof If)) {
- hasCaseWhen = true;
+ // NullToNonNullFunction: expressions that can convert
NULL input to non-NULL output
+ // (e.g. COALESCE, NVL, IF, CASE WHEN, NULL_OR_EMPTY,
NOT_NULL_OR_EMPTY).
+ // When an agg function contains such an expression
wrapping a column from the
+ // nullable side of an outer join, null-extended rows
would produce non-NULL values
+ // that get counted by the aggregation. But the
pre-aggregation on the base table
Review Comment:
The marker check still misses null-safe predicates that have the same
outer-join problem. For example:
```text
Aggregate(group by t1.id1, output=count(t2.name IS NULL))
LeftOuterJoin(t1.id1 = t2.id2)
t1
t2
```
`IsNull` is `AlwaysNotNullable` but does not implement
`NullToNonNullFunction`, so `containsNullToNonNull` stays false here. Because
`count(t2.name IS NULL)` has a right-side input slot,
`adjustPushSideForNullable` keeps pushing to the right side of the left join.
The lower aggregate on `t2` cannot see unmatched left rows, so after null
extension the upper `sum0(count_slot)` contributes 0, while the original
expression evaluates `t2.name IS NULL` to TRUE and `count(TRUE)` contributes 1.
`NullSafeEqual` has the same shape for expressions like `t2.name <=> NULL`.
Please include these null-safe predicates in the guard (or use a conservative
helper for null-to-non-null expressions) before allowing nullable-side pushdown.
##########
fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/eageraggregation/EagerAggRewriterTest.java:
##########
@@ -272,10 +272,10 @@ void testNotPushCountIfIsNullToNullableSideOfOuterJoin() {
void testNotPushCountCaseWhenWithElseToNullableSideViaProject() {
// When CASE WHEN has an ELSE clause (e.g. CASE WHEN cond THEN -121
ELSE 2 END),
// NormalizeAggregate extracts if(cond, -121, 2) into a Project as a
slot,
- // so the aggregate becomes count(#slot). At the agg level,
hasCaseWhen=false
+ // so the aggregate becomes count(#slot). At the agg level,
containsNullToNonNull=false
Review Comment:
This PR broadens the guard from the previous If/CaseWhen cases to several
newly marked `NullToNonNullFunction` implementations, but the test change here
only renames comments. There is still no eager-aggregation outer-join test
proving that `count(coalesce(nullable_side_col, ...))`, `count(ifnull(...))`,
`count(null_or_empty(...))`, etc. are not pushed below the nullable side,
including the project-substitution path that this test is describing. Please
add at least one plan-shape regression for a newly marked function so this
contract does not silently regress while future functions are added to the
marker.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]