peter-toth commented on PR #49955: URL: https://github.com/apache/spark/pull/49955#issuecomment-2706641950

> However, we may push down local limit without global limit and at the end they can be very far away.

I think we disagree a bit here. While the above is true, a `LocalLimit(n)` node's purpose remains the same: to return at least `n` rows if `n` or more rows are available in its input, while it is not required to return more. That it caps each partition at `n` rows is just an implementation detail that keeps the node cheap and makes it possible to push it down at planning time. Since `UnionLoop` can consult the limit at runtime, I think it is safe to use the limit for stopping the loop. Could you share a counterexample?

Moreover, since a `LocalLimit` is more likely to end up close to a `UnionLoop` node than a `GlobalLimit` (`LocalLimit` can be pushed down), the chance that we can stop an infinite loop is higher and the `spark.sql.cteRecursionRowLimit` suggested above is less likely to kick in...
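For context, a minimal sketch of the kind of query being discussed (the `nums` CTE, the object name, and the `local[*]` session setup are illustrative only; whether the `LIMIT` actually reaches the `UnionLoop` as a `LocalLimit`, and whether `WITH RECURSIVE` is accepted, depends on the recursive CTE support and pushdown rules this PR series is adding):

```scala
import org.apache.spark.sql.SparkSession

object RecursiveCteLimitSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("recursive-cte-limit-sketch")
      .master("local[*]")
      .getOrCreate()

    // The recursive member has no terminating condition, so the recursion is
    // unbounded on its own. The outer LIMIT 10 can be pushed down towards the
    // UnionLoop as a LocalLimit, which would let the loop stop once enough rows
    // have been produced instead of relying on a recursion row/level safeguard.
    val df = spark.sql(
      """WITH RECURSIVE nums(n) AS (
        |  SELECT 1
        |  UNION ALL
        |  SELECT n + 1 FROM nums
        |)
        |SELECT n FROM nums
        |LIMIT 10
        |""".stripMargin)

    df.show()
    spark.stop()
  }
}
```

If that pushdown happens, the loop can terminate after producing the limited number of rows; otherwise a safeguard such as the suggested `spark.sql.cteRecursionRowLimit` would be what stops it.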
> However, we may push down local limit without global limit and at the end they can be very far away. I think we disagree a bit here. While the above is true, a `LocalLimit(n)` node's purpose remains the same, to get at least `n` number of rows if there are `n` or more rows available in the input, but it doesn't need to return more. It is just an implementation detail that it limits each partition to max `n` rows to keep the node cheap and make it possible to push it down planning time. As `UnionLoop` can use the limit runtime, I think it is safe to use the limit for stopping the loop. Do you think you can share a counter example? Moreover, as locallimit is more likely to be closer to an UnionLoop node than globallimit (locallimit can be pushed down), so the chance that we can stop an infinite loop is higher and the above suggested `spark.sql.cteRecursionRowLimit` is less likely to kick in... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org