github-actions[bot] commented on code in PR #64820:
URL: https://github.com/apache/doris/pull/64820#discussion_r3478881990


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/eageraggregation/PushDownAggregation.java:
##########
@@ -143,14 +141,16 @@ public Plan visitLogicalAggregate(LogicalAggregate<? extends Plan> agg, JobConte
                     if (aggFunction.containsVolatileExpression()) {
                         return agg;
                     }
-                    // CaseWhen and If (which CASE WHEN is normalized into) must both be checked.
-                    // When an agg function contains an If/CaseWhen whose condition tests IS NULL
-                    // (e.g. count(if(col IS NULL, value, NULL))), pushing it to the nullable side
-                    // of an outer join produces wrong results: null-extended rows make "col IS NULL"
-                    // TRUE at the top level, but the pre-aggregated count slot becomes NULL after
-                    // null-extension, and ifnull(sum(NULL), 0) = 0 instead of the correct 1.
-                    if (!hasCaseWhen && aggFunction.anyMatch(e -> e instanceof CaseWhen || e instanceof If)) {
-                        hasCaseWhen = true;
+                    // NullToNonNullFunction: expressions that can convert NULL input to non-NULL output
+                    // (e.g. COALESCE, NVL, IF, CASE WHEN, NULL_OR_EMPTY, NOT_NULL_OR_EMPTY).
+                    // When an agg function contains such an expression wrapping a column from the
+                    // nullable side of an outer join, null-extended rows would produce non-NULL values
+                    // that get counted by the aggregation. But the pre-aggregation on the base table
+                    // cannot see null-extended rows (they are produced by the join), so the push-down
+                    // would lose those contributions — producing wrong results.
+                    if (!containsNullToNonNull
+                            && aggFunction.anyMatch(e -> e instanceof NullToNonNullFunction)) {

Review Comment:
   This marker-only check is still too narrow. Although the IP/default and 
null-safe examples are now marked, existing expressions such as 
`array(nullable_side_col)` can still turn a nullable-side NULL into a non-null 
result without implementing `NullToNonNullFunction`: Nereids marks `Array` as 
`AlwaysNotNullable`, and the BE array constructor opts out of default null 
propagation while returning an array with nullable nested elements, so 
`array(NULL)` is a non-null array containing a NULL element.
   
   For example, with a left outer join:
   
   ```text
   Aggregate(group by t1.id1, output=count(array(t2.name)))
     LeftOuterJoin(t1.id1 = t2.id2)
       t1
       t2
   ```
   
   `count(array(t2.name))` has a right-side input slot, so 
`adjustPushSideForNullable` still allows the push when this flag remains false. 
The lower aggregate over `t2` cannot see unmatched left rows; after null 
extension the rollup contributes `sum0(NULL)=0`, while the original expression 
evaluates `array(NULL)` and should count 1. Please make the guard cover this 
broader "nullable-side NULL becomes non-null expression" contract, for example 
by handling these always-not-null constructors or by using a shared helper 
instead of relying on each case being manually marked.
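   
   The mismatch can be reproduced outside the planner with a small simulation. The sketch below is only an illustration under stated assumptions: it does not use any Doris classes, the class and method names (`ArrayCountPushDownDemo`, `countAfterJoin`, `countWithPushDown`) are hypothetical, `t2` is modelled as a map from `id2` to its `name` values, and `array(x)` is assumed to be non-null for every input, including NULL.
   
   ```java
   import java.util.Arrays;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   public class ArrayCountPushDownDemo {

       // Original plan: left outer join first, then count(array(t2.name)) per t1.id1.
       static Map<Integer, Long> countAfterJoin(List<Integer> t1Ids,
                                                Map<Integer, List<String>> t2NamesById) {
           Map<Integer, Long> result = new LinkedHashMap<>();
           for (Integer id1 : t1Ids) {
               List<String> matches = t2NamesById.get(id1);
               // An unmatched left row produces one null-extended row. array(NULL) is
               // assumed to be a non-null array, so count() still counts that row.
               long rows = (matches == null || matches.isEmpty()) ? 1L : matches.size();
               result.merge(id1, rows, Long::sum);
           }
           return result;
       }

       // Pushed-down plan: pre-aggregate count(array(name)) on t2, join, then roll up.
       static Map<Integer, Long> countWithPushDown(List<Integer> t1Ids,
                                                   Map<Integer, List<String>> t2NamesById) {
           // Lower aggregate over t2 only: it never sees unmatched left rows.
           Map<Integer, Long> partial = new LinkedHashMap<>();
           t2NamesById.forEach((id2, names) -> {
               if (!names.isEmpty()) {
                   partial.put(id2, (long) names.size());
               }
           });
           Map<Integer, Long> result = new LinkedHashMap<>();
           for (Integer id1 : t1Ids) {
               // A missing entry models the partial count slot being NULL after null
               // extension; the rollup treats NULL as 0, as sum0 does.
               Long partialCount = partial.get(id1);
               result.merge(id1, partialCount == null ? 0L : partialCount, Long::sum);
           }
           return result;
       }

       public static void main(String[] args) {
           List<Integer> t1 = Arrays.asList(1, 2);            // id1 = 2 has no match in t2
           Map<Integer, List<String>> t2 = new LinkedHashMap<>();
           t2.put(1, Arrays.asList("a", null));               // two t2 rows with id2 = 1
           System.out.println("after join: " + countAfterJoin(t1, t2));
           System.out.println("push-down:  " + countWithPushDown(t1, t2));
       }
   }
   ```
   
   For the unmatched key `2`, the join-then-aggregate path returns 1 and the pushed-down path returns 0, which is the discrepancy described above. Matched keys agree in both paths, so the error only shows up when unmatched left rows exist.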



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

