milanisvet commented on code in PR #49571:
URL: https://github.com/apache/spark/pull/49571#discussion_r1930549621


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -848,6 +848,13 @@ object LimitPushDown extends Rule[LogicalPlan] {
     case LocalLimit(exp, u: Union) =>
       LocalLimit(exp, u.copy(children = 
u.children.map(maybePushLocalLimit(exp, _))))
 
+    // If limit node is present, we should propagate it down to UnionLoop, so 
that it is later
+    // propagated to UnionLoopExec.
+    case l @ LocalLimit(IntegerLiteral(limit), p @ Project(_, u: UnionLoop)) =>

Review Comment:
   Changed in the following way:
   1. `GlobalLimit` is matched in optimizer and pushed down
   2. Global limit (`Limit`) node is prepended to the plan in `UnionLoopExec` 
in case `limit` is passed down and not `None`
   3. `Limit` is not prepended to the plan in case no limit is passed down 
(`limit` is `None`) to prevent unnecessary shuffle due to `GlobalLimit`
   
   Just a side note here: `Limit` node always creates `GlobalLimit` on top of 
`LocalLimit`



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -848,6 +848,13 @@ object LimitPushDown extends Rule[LogicalPlan] {
     case LocalLimit(exp, u: Union) =>
       LocalLimit(exp, u.copy(children = 
u.children.map(maybePushLocalLimit(exp, _))))
 
+    // If limit node is present, we should propagate it down to UnionLoop, so 
that it is later
+    // propagated to UnionLoopExec.
+    case l @ LocalLimit(IntegerLiteral(limit), p @ Project(_, u: UnionLoop)) =>

Review Comment:
   Changed now in the following way after rethinking again:
   1. `GlobalLimit` is matched in optimizer and pushed down
   2. Global limit (`Limit`) node is prepended to the plan in `UnionLoopExec` 
in case `limit` is passed down and not `None`
   3. `Limit` is not prepended to the plan in case no limit is passed down 
(`limit` is `None`) to prevent unnecessary shuffle due to `GlobalLimit`
   
   Just a side note here: `Limit` node always creates `GlobalLimit` on top of 
`LocalLimit`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to