github-actions[bot] commented on code in PR #64633:
URL: https://github.com/apache/doris/pull/64633#discussion_r3466290098
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/SplitLimit.java:
##########
@@ -38,8 +40,13 @@ public Rule build() {
.then(limit -> {
long l = limit.getLimit();
long o = limit.getOffset();
+ if (Utils.addOverflows(l, o)) {
Review Comment:
Throwing here changes valid SQL into an analysis error. For a plan like
`LogicalLimit(limit=Long.MAX_VALUE, offset=Long.MAX_VALUE) -> UnionAll(1,2,3)`,
the offset just skips every available row, so `count(*)` should return `0`; the
query is not invalid. `SplitLimit` is only a rewrite, so on overflow it should
either leave the limit unsplit or use an unbounded local phase such as
`Long.MAX_VALUE` and let the global limit/offset apply. The new regression test
currently expects this exception, so it codifies the regression instead of
preserving the empty-result semantics.
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/implementation/LogicalTopNToPhysicalTopN.java:
##########
@@ -46,13 +48,30 @@ public Rule build() {
* mergeTopN(limit, off, require gather) -> localTopN(off+limit, 0,
require any)
*/
private List<PhysicalTopN<? extends Plan>> twoPhaseSort(LogicalTopN<?
extends Plan> logicalTopN) {
- PhysicalTopN<Plan> localSort = new
PhysicalTopN<>(logicalTopN.getOrderKeys(),
- logicalTopN.getLimit() + logicalTopN.getOffset(), 0,
SortPhase.LOCAL_SORT,
- logicalTopN.getLogicalProperties(), logicalTopN.child(0));
int sortPhaseNum = 0;
if (ConnectContext.get() != null) {
sortPhaseNum =
ConnectContext.get().getSessionVariable().sortPhaseNum;
}
+ // The two-phase local sort keeps limit + offset rows. When that
overflows the long range
+ // (e.g. LIMIT and OFFSET both BIGINT_MAX), only the single-phase
gather sort is safe: it
+ // applies limit and offset separately and cannot overflow. The
two-phase MERGE_SORT cannot be
+ // used because ChildrenPropertiesRegulator forbids inserting a gather
Exchange under
+ // GATHER_SORT, so the optimizer cannot produce a valid distributed
plan with only one-phase.
+ // If sort_phase_num != 1 (i.e. the user did not explicitly request
single-phase), report the
+ // error so the user knows to set sort_phase_num = 1 for this extreme
boundary.
+ if (Utils.addOverflows(logicalTopN.getLimit(),
logicalTopN.getOffset())) {
+ if (sortPhaseNum != 1) {
Review Comment:
This still turns a valid query into a user-visible error under the default
session. With `sort_phase_num` unset, `sortPhaseNum` is `0`, so `ORDER BY k
LIMIT Long.MAX_VALUE OFFSET Long.MAX_VALUE` reaches this branch and throws even
though the same method can build the semantic single-phase `GATHER_SORT` below.
The default planner should keep a legal physical alternative, for example by
returning the single-phase gather sort on overflow or by using a legal
two-phase shape whose local phase keeps all rows and whose global phase applies
the original limit/offset, rather than requiring users to know to set
`sort_phase_num = 1`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]