peter-toth commented on PR #49955: URL: https://github.com/apache/spark/pull/49955#issuecomment-2706641950

> However, we may push down local limit without global limit and at the end they can be very far away.

I think we disagree a bit here. While the above is true, a `LocalLimit(n)` node's purpose remains the same: to return at least `n` rows if `n` or more rows are available in its input, while it is not required to return more. That it caps each partition at `n` rows is just an implementation detail that keeps the node cheap and makes it possible to push it down at planning time. Since `UnionLoop` can consult the limit at runtime, I think it is safe to use the limit for stopping the loop. Could you share a counterexample?

Moreover, since a `LocalLimit` is more likely to end up close to a `UnionLoop` node than a `GlobalLimit` (`LocalLimit` can be pushed down), the chance that we can stop an infinite loop is higher and the `spark.sql.cteRecursionRowLimit` suggested above is less likely to kick in...
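For context, a minimal sketch of the kind of query being discussed (the `nums` CTE, the object name, and the `local[*]` session setup are illustrative only; whether the `LIMIT` actually reaches the `UnionLoop` as a `LocalLimit`, and whether `WITH RECURSIVE` is accepted, depends on the recursive CTE support and pushdown rules this PR series is adding):

```scala
import org.apache.spark.sql.SparkSession

object RecursiveCteLimitSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("recursive-cte-limit-sketch")
      .master("local[*]")
      .getOrCreate()

    // The recursive member has no terminating condition, so the recursion is
    // unbounded on its own. The outer LIMIT 10 can be pushed down towards the
    // UnionLoop as a LocalLimit, which would let the loop stop once enough rows
    // have been produced instead of relying on a recursion row/level safeguard.
    val df = spark.sql(
      """WITH RECURSIVE nums(n) AS (
        |  SELECT 1
        |  UNION ALL
        |  SELECT n + 1 FROM nums
        |)
        |SELECT n FROM nums
        |LIMIT 10
        |""".stripMargin)

    df.show()
    spark.stop()
  }
}
```

If that pushdown happens, the loop can terminate after producing the limited number of rows; otherwise a safeguard such as the suggested `spark.sql.cteRecursionRowLimit` would be what stops it.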
> However, we may push down local limit without global limit and at the end they can be very far away. I think we disagree a bit here. While the above is true, a `LocalLimit(n)` node's purpose remains the same, to get at least `n` number of rows if there are `n` or more rows available in the input, but it doesn't need to return more. It is just an implementation detail that it limits each partition to max `n` rows to keep the node cheap and make it possible to push it down planning time. As `UnionLoop` can use the limit runtime, I think it is safe to use the limit for stopping the loop. Do you think you can share a counter example? Moreover, as locallimit is more likely to be closer to an UnionLoop node than globallimit (locallimit can be pushed down), so the chance that we can stop an infinite loop is higher and the above suggested `spark.sql.cteRecursionRowLimit` is less likely to kick in... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org