milanisvet commented on code in PR #49571:
URL: https://github.com/apache/spark/pull/49571#discussion_r1932566218


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -848,6 +848,15 @@ object LimitPushDown extends Rule[LogicalPlan] {
     case LocalLimit(exp, u: Union) =>
      LocalLimit(exp, u.copy(children = u.children.map(maybePushLocalLimit(exp, _))))
 
+    // If limit node is present, we should propagate it down to UnionLoop, so that it is later
+    // propagated to UnionLoopExec.
+    // Limit node is constructed by placing GlobalLimit over LocalLimit (look at Limit apply method)
+    // that is the reason why we match it this way.
+    case g @ GlobalLimit(IntegerLiteral(limit), l @ LocalLimit(_, p @ Project(_, ul: UnionLoop))) =>

Review Comment:
   Playing around with this now, I saw the following problem:
   
   This query:
   ```
   WITH RECURSIVE t(n) AS (
       VALUES (1)
   UNION ALL
       SELECT n+1 FROM t)
   SELECT * FROM t LIMIT 10
   ```
   will return numbers in the following order: 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
   
   Somehow, the order is not 1, 2, 3, 4, ...
   This matters for OFFSET, since the row order determines which rows get skipped: the correct result with an OFFSET 5 should be 6, 7, ..., 15 (that is what Postgres returns).
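   For concreteness, the OFFSET variant I have in mind would look something like this (assuming the same LIMIT 10 together with OFFSET 5; the expected output is based on Postgres semantics):
   ```
   WITH RECURSIVE t(n) AS (
       VALUES (1)
   UNION ALL
       SELECT n+1 FROM t)
   SELECT * FROM t LIMIT 10 OFFSET 5
   -- expected (Postgres): 6, 7, ..., 15
   ```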
   
   So I think that to allow OFFSET, it is not enough to just retain it in the outer query and apply it from there; it would require a similar approach to passing the limit down to UnionLoopExec, plus some additional logic there.
   
   So maybe I would rather throw an error now if somebody tries to use it, and leave the implementation for a follow-up PR.
   For now, if OFFSET is used, the query just runs into an infinite loop and throws [RECURSION_LEVEL_LIMIT_EXCEEDED], but I will try to add a specific error for this case.




