github-actions[bot] commented on code in PR #63763:
URL: https://github.com/apache/doris/pull/63763#discussion_r3311121242
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AggScalarSubQueryToWindowFunction.java:
##########
@@ -278,6 +289,12 @@ private boolean checkRelation(List<Slot> correlatedSlots) {
.filter(node -> outerIds.contains(node.getTable().getId()))
.map(LogicalRelation.class::cast)
.map(LogicalRelation::getOutputExprIdSet).flatMap(Collection::stream).collect(Collectors.toSet());
+ partitionBySlots.addAll(apply.left().getOutput().stream()
Review Comment:
This still does not distinguish duplicate outer-only rows when their
distinguishing columns are not present in `apply.left().getOutput()`. For
example, if `dim` has two rows with the same `k` and the outer query only
outputs/uses `d.k` (no `d.did` or other unique column), the original scalar
subquery is evaluated once per `dim` row, but this code partitions the window
only by the visible `d.k`. The joined inner rows for both duplicate `dim` rows
then land in the same window partition, so the aggregate is multiplied by the
number of duplicates, and predicates such as
`f.v * 2 > (select sum(f2.v) ... where f2.k = d.k)` can incorrectly filter out
rows. The new regression test includes `d.did` in the select list, which makes
this code include a distinguishing slot, so the test does not cover this case.
Please either carry and partition by all slots from the outer-only relation that
are needed to preserve row identity, or make the rule return false when
`apply.left().getOutput()` does not contain the full output of the outer-only relation.
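A minimal, hypothetical Java sketch of the counterexample, using plain in-memory lists rather than Nereids plans (`Dim`, `Fact`, `Joined`, and all helper methods are made up for illustration). Two `dim` rows share `k = 10`, and the predicate `f.v * 2 > sum` is evaluated three ways: with per-row scalar-subquery semantics, with a window partitioned only by `d.k`, and with a window partitioned by `d.did, d.k`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory model of the query; not Doris code.
public class WindowPartitionDemo {
    record Dim(int did, int k) {}
    record Fact(int fid, int k, int v) {}
    record Joined(Fact f, Dim d) {}

    // Two dim rows share k = 10; only did tells them apart.
    static final List<Dim> DIM = List.of(new Dim(1, 10), new Dim(2, 10));
    static final List<Fact> FACT = List.of(new Fact(1, 10, 3), new Fact(2, 10, 4));

    // Outer query: fact f, dim d where f.k = d.k (each fact row appears once per matching dim row).
    static List<Joined> join() {
        List<Joined> out = new ArrayList<>();
        for (Dim d : DIM) {
            for (Fact f : FACT) {
                if (f.k() == d.k()) {
                    out.add(new Joined(f, d));
                }
            }
        }
        return out;
    }

    // Original semantics: (select sum(f2.v) from fact f2 where f2.k = d.k), evaluated per outer row.
    static long scalarSubquery(Dim d) {
        long sum = 0;
        for (Fact f2 : FACT) {
            if (f2.k() == d.k()) {
                sum += f2.v();
            }
        }
        return sum;
    }

    // Window rewrite: sum(f.v) over (partition by partitionKey) computed over the joined rows.
    static Map<Object, Long> windowSums(List<Joined> rows, Function<Joined, Object> partitionKey) {
        Map<Object, Long> sums = new HashMap<>();
        for (Joined r : rows) {
            sums.merge(partitionKey.apply(r), (long) r.f().v(), Long::sum);
        }
        return sums;
    }

    // Number of joined rows passing f.v * 2 > subquery result.
    static int passingOriginal() {
        int n = 0;
        for (Joined r : join()) {
            if (r.f().v() * 2L > scalarSubquery(r.d())) {
                n++;
            }
        }
        return n;
    }

    // Number of joined rows passing f.v * 2 > window sum for the given partition key.
    static int passingWindow(Function<Joined, Object> partitionKey) {
        List<Joined> rows = join();
        Map<Object, Long> sums = windowSums(rows, partitionKey);
        int n = 0;
        for (Joined r : rows) {
            if (r.f().v() * 2L > sums.get(partitionKey.apply(r))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("original scalar subquery: " + passingOriginal());
        System.out.println("partition by d.k only:    " + passingWindow(r -> r.d().k()));
        System.out.println("partition by d.did, d.k:  "
                + passingWindow(r -> List.of(r.d().did(), r.d().k())));
    }
}
```

This prints 2, 0, and 2. Partitioning by `d.k` alone sums each fact row twice (14 instead of 7), so both rows with `v = 4` are wrongly filtered out. Adding the outer-only relation's identifying slot (`d.did` here) restores the per-row result.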
--
This is an automated message from the Apache Git Service.