github-actions[bot] commented on code in PR #64633:
URL: https://github.com/apache/doris/pull/64633#discussion_r3434307874
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/util/Utils.java:
##########
@@ -130,6 +130,25 @@ public static <T> boolean isIntersecting(Set<T> lhs,
Collection<T> rhs) {
return false;
}
+ /**
+ * Saturating addition that clamps on overflow instead of wrapping around.
+ *
+ * <p>This is mainly used when combining a {@code limit} and an {@code
offset} into the number
+ * of rows a child operator must produce (e.g. pushing a TopN/Limit down
through Union/Join/
+ * Window). When both values are close to {@code BIGINT_MAX} their sum
overflows the long range
+ * and wraps to a negative number, which is an illegal limit and produces
a broken plan. Since
+ * no relation can hold more than {@code Long.MAX_VALUE} rows, clamping to
{@code Long.MAX_VALUE}
+ * ("all rows") is the semantically correct upper bound and never drops
rows the parent may need.
+ */
+ public static long saturatedAdd(long a, long b) {
Review Comment:
This helper is the right primitive, but the PR does not route all equivalent
`limit + offset` row-count derivations through it. A plain TopN still follows
this path:
```text
LogicalTopN(limit=Long.MAX_VALUE, offset=Long.MAX_VALUE, order by id)
Scan/Union
```
`LogicalTopNToPhysicalTopN.twoPhaseSort` then builds a local child with
`logicalTopN.getLimit() + logicalTopN.getOffset()` in
`fe/fe-core/src/main/java/org/apache/doris/nereids/rules/implementation/LogicalTopNToPhysicalTopN.java:50`,
so the physical tree can still contain:
```text
PhysicalTopN(MERGE_SORT limit=MAX offset=MAX)
PhysicalTopN(LOCAL_SORT limit=-2 offset=0)
```
With `SET sort_phase_num = 2` this is not just an unused alternative. The
same raw computation also remains in `PushDownLimit.java`,
`PushDownLimitDistinctThroughJoin.java`,
`PushDownLimitDistinctThroughUnion.java`, `LimitAggToTopNAgg.java`,
`PushTopnToAgg.java`, the score/vector TopN scan pushdowns, and
`PhysicalPlanTranslator#setSortLimit`. Please replace these parallel derived
child/scan/agg limits, or centralize the operation on `Limit`/`TopN`, otherwise
the same negative limit can still reach BE outside the specific Union rewrite
covered here.
##########
regression-test/suites/nereids_rules_p0/push_down_top_n/push_down_top_n_through_union.groovy:
##########
@@ -312,4 +312,19 @@ suite("push_down_top_n_through_union") {
ORDER BY logTimestamp desc
LIMIT 8;
"""
+
+ // When pushing TopN through UNION ALL, the child TopN's limit is computed
as limit + offset.
+ // With LIMIT and OFFSET both at BIGINT_MAX the unguarded addition
overflowed the long range and
+ // wrapped to a negative limit, producing a broken plan that timed out.
The offset is far larger
+ // than the 3 input rows, so the correct result is an empty set. This must
return 0 quickly.
+ def topnUnionOverflowRes = sql """
+ select count(*) as c from (
+ select id from (
+ select 1 as id union all select 2 as id union all select 3 as
id
+ ) t
+ order by id limit 9223372036854775807 offset 9223372036854775807
+ ) s;
+ """
+ assertEquals(1, topnUnionOverflowRes.size())
+ assertEquals(0L, topnUnionOverflowRes[0][0])
Review Comment:
This deterministic result should be recorded with a `qt_...` query and the
generated `.out` file, not Groovy assertions. The local regression-test rules
require determined expected results to be generated through `qt_sql`/`order_qt`
style output files; here `count(*)` always returns one row with `0`, but that
expected result is invisible in
`regression-test/data/nereids_rules_p0/push_down_top_n/push_down_top_n_through_union.out`.
Please convert this to a `qt_topn_union_overflow` block and regenerate the
`.out` file.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]