github-actions[bot] commented on code in PR #64820:
URL: https://github.com/apache/doris/pull/64820#discussion_r3471798543


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/eageraggregation/PushDownAggregation.java:
##########
@@ -143,14 +141,16 @@ public Plan visitLogicalAggregate(LogicalAggregate<? 
extends Plan> agg, JobConte
                     if (aggFunction.containsVolatileExpression()) {
                         return agg;
                     }
-                    // CaseWhen and If (which CASE WHEN is normalized into) 
must both be checked.
-                    // When an agg function contains an If/CaseWhen whose 
condition tests IS NULL
-                    // (e.g. count(if(col IS NULL, value, NULL))), pushing it 
to the nullable side
-                    // of an outer join produces wrong results: null-extended 
rows make "col IS NULL"
-                    // TRUE at the top level, but the pre-aggregated count 
slot becomes NULL after
-                    // null-extension, and ifnull(sum(NULL), 0) = 0 instead of 
the correct 1.
-                    if (!hasCaseWhen && aggFunction.anyMatch(e -> e instanceof 
CaseWhen || e instanceof If)) {
-                        hasCaseWhen = true;
+                    // NullToNonNullFunction: expressions that can convert 
NULL input to non-NULL output
+                    // (e.g. COALESCE, NVL, IF, CASE WHEN, NULL_OR_EMPTY, 
NOT_NULL_OR_EMPTY).
+                    // When an agg function contains such an expression 
wrapping a column from the
+                    // nullable side of an outer join, null-extended rows 
would produce non-NULL values
+                    // that get counted by the aggregation. But the 
pre-aggregation on the base table

Review Comment:
   The marker check still misses null-safe predicates that have the same 
outer-join problem. For example:
   
   ```text
   Aggregate(group by t1.id1, output=count(t2.name IS NULL))
     LeftOuterJoin(t1.id1 = t2.id2)
       t1
       t2
   ```
   
   `IsNull` is `AlwaysNotNullable` but does not implement 
`NullToNonNullFunction`, so `containsNullToNonNull` stays false here. Because 
`count(t2.name IS NULL)` has a right-side input slot, 
`adjustPushSideForNullable` keeps pushing to the right side of the left join. 
The lower aggregate on `t2` cannot see unmatched left rows, so after null 
extension the upper `sum0(count_slot)` contributes 0, while the original 
expression evaluates `t2.name IS NULL` to TRUE and `count(TRUE)` contributes 1. 
`NullSafeEqual` has the same shape for expressions like `t2.name <=> NULL`. 
Please include these null-safe predicates in the guard (or use a conservative 
helper for null-to-non-null expressions) before allowing nullable-side pushdown.



##########
fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/eageraggregation/EagerAggRewriterTest.java:
##########
@@ -272,10 +272,10 @@ void testNotPushCountIfIsNullToNullableSideOfOuterJoin() {
     void testNotPushCountCaseWhenWithElseToNullableSideViaProject() {
         // When CASE WHEN has an ELSE clause (e.g. CASE WHEN cond THEN -121 
ELSE 2 END),
         // NormalizeAggregate extracts if(cond, -121, 2) into a Project as a 
slot,
-        // so the aggregate becomes count(#slot). At the agg level, 
hasCaseWhen=false
+        // so the aggregate becomes count(#slot). At the agg level, 
containsNullToNonNull=false

Review Comment:
   This PR broadens the guard from the previous If/CaseWhen cases to several 
newly marked `NullToNonNullFunction` implementations, but the test change here 
only renames comments. There is still no eager-aggregation outer-join test 
proving that `count(coalesce(nullable_side_col, ...))`, `count(ifnull(...))`, 
`count(null_or_empty(...))`, etc. are not pushed below the nullable side, 
including the project-substitution path that this test is describing. Please 
add at least one plan-shape regression for a newly marked function so this 
contract does not silently regress while future functions are added to the 
marker.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to