milanisvet commented on code in PR #49571: URL: https://github.com/apache/spark/pull/49571#discussion_r1932566218
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
```diff
@@ -848,6 +848,15 @@ object LimitPushDown extends Rule[LogicalPlan] {
     case LocalLimit(exp, u: Union) =>
       LocalLimit(exp, u.copy(children = u.children.map(maybePushLocalLimit(exp, _))))
+    // If limit node is present, we should propagate it down to UnionLoop, so that it is later
+    // propagated to UnionLoopExec.
+    // Limit node is constructed by placing GlobalLimit over LocalLimit (look at Limit apply method)
+    // that is the reason why we match it this way.
+    case g @ GlobalLimit(IntegerLiteral(limit), l @ LocalLimit(_, p @ Project(_, ul: UnionLoop))) =>
```

Review Comment:
   Playing around with this now, I ran into the following problem. This query:
   ```sql
   WITH RECURSIVE t(n) AS (
     VALUES (1)
     UNION ALL
     SELECT n+1 FROM t)
   SELECT * FROM t LIMIT 10
   ```
   returns the numbers in the following order: 1, 10, 2, 3, 4, 5, 6, 7, 8, 9. Somehow, the order is not 1, 2, 3, 4, ....
   
   The correct result with an OFFSET 5 should be 6, 7, ..., 15 (as in Postgres). So I think that to allow OFFSET, it is not enough to retain it in the outer query and apply it from there; it would require an approach similar to pushing the limit down to UnionLoopExec, plus some additional logic there. So I would rather throw an error for now if somebody tries to use it, and leave the implementation for a follow-up PR.
   
   Currently, if OFFSET is used, the query just performs an infinite loop and throws `[RECURSION_LEVEL_LIMIT_EXCEEDED]`, but I will try to add a specific error for this case.
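   For reference, a minimal sketch (not part of the PR) of what such a guard could look like, assuming the `UnionLoop` node introduced in this PR, a hypothetical rule name `FailOffsetOverUnionLoop`, and a plain `UnsupportedOperationException` standing in for the dedicated error class mentioned above:
   ```scala
   import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Offset, Project, UnionLoop}
   import org.apache.spark.sql.catalyst.rules.Rule
   
   // Hypothetical rule: fail fast when OFFSET sits on top of a recursive
   // UnionLoop, instead of letting the query loop until it hits
   // [RECURSION_LEVEL_LIMIT_EXCEEDED].
   object FailOffsetOverUnionLoop extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan.transformDown {
       // Match OFFSET directly above the recursion, possibly through the same
       // Project that the LimitPushDown case above matches.
       case Offset(_, Project(_, _: UnionLoop)) | Offset(_, _: UnionLoop) =>
         // The real change would raise a dedicated error class here.
         throw new UnsupportedOperationException(
           "OFFSET is not supported on recursive CTEs yet")
     }
   }
   ```
   The exact match shape would depend on where the parser places the Offset node relative to GlobalLimit/LocalLimit, so this is only an illustration of the fail-fast idea.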