cloud-fan commented on PR #49955: URL: https://github.com/apache/spark/pull/49955#issuecomment-2706427848

> I think LocalLimit(n)'s purpose is to provide a cheap max n row limiter.

We don't have a user-facing API for local limit, and a local limit is always generated from a global limit, so this assumption might hold. However, we may push down a local limit without its global limit, and in the end the two can be very far apart. To guarantee correctness, I think it's better to respect what the local limit node does: apply a limit within each RDD partition. Given this behavior, I don't think we can stop the recursive CTE loop early, which would mean generating fewer RDD partitions in the final result.
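To make the per-partition semantics concrete, here is a minimal sketch of the difference between a local and a global limit. This is not Spark's actual physical operators, just an RDD-level illustration; the object and variable names are made up for the example.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch only: shows why a local limit of n rows per partition
// is not the same as a global limit of n rows overall.
object LimitSemanticsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[4]").appName("limit-sketch").getOrCreate()
    val sc = spark.sparkContext

    val rdd = sc.parallelize(1 to 100, numSlices = 4)
    val n = 5

    // "Local limit": cap each partition at n rows. The result can have up to
    // n * numPartitions rows, and the number of partitions is unchanged.
    val localLimited = rdd.mapPartitions(_.take(n))
    println(s"local limit rows  = ${localLimited.count()}") // 20 here (5 per partition)

    // "Global limit": at most n rows in total, regardless of partitioning.
    val globalLimited = sc.parallelize(rdd.take(n))
    println(s"global limit rows = ${globalLimited.count()}") // 5

    spark.stop()
  }
}
```

Under this reading, a pushed-down local limit on its own only bounds rows per partition, so cutting the recursive CTE loop short (and thus producing fewer partitions) would change the result it can return.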
> I think LocalLimit(n)'s purpose is to provide a cheap max n row limiter. We don't have a user-facing API for local limit and local limit is always generated from a global limit, so this assumption might be true. However, we may push down local limit without global limit and at the end they can be very far away. To guarantee correctness I think it's better to respect what the local limit node does: doing a local limit on each RDD partition. Given this behavior, I don't think we can early stop the recursive CTE loop which means generating less RDD partitions in the final result. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org