cloud-fan commented on PR #49955:
URL: https://github.com/apache/spark/pull/49955#issuecomment-2706427848

   > I think LocalLimit(n)'s purpose is to provide a cheap max n row limiter.
   
   We don't have a user-facing API for local limit, and a local limit is always 
generated from a global limit, so this assumption might be true. However, we may 
push down the local limit without its global limit, and by the end of optimization 
the two can end up very far apart in the plan. To guarantee correctness, I think 
it's better to respect what the local limit node actually does: apply the limit to 
each RDD partition. Given this behavior, I don't think we can stop the recursive 
CTE loop early, as that would mean producing fewer RDD partitions in the final 
result.
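   To make the per-partition semantics concrete, here is a minimal sketch (not 
from this PR; it approximates the LocalLimit operator with `mapPartitions` on a 
toy RDD, and the partition count and values are made up) showing why the number 
of partitions determines the output size under a local limit:

```scala
import org.apache.spark.sql.SparkSession

object LocalLimitSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("local-limit-sketch")
      .getOrCreate()

    val n = 2
    // 4 partitions of 10 rows each.
    val rdd = spark.sparkContext.parallelize(1 to 40, numSlices = 4)

    // LocalLimit-like behavior: each partition independently keeps at most n rows,
    // so the total output grows with the number of partitions (here 4 * 2 = 8 rows).
    val localLimited = rdd.mapPartitions(_.take(n)).collect()
    println(s"local limit: ${localLimited.length} rows")

    // GlobalLimit-like behavior: at most n rows total, regardless of partitioning.
    val globalLimited = rdd.take(n)
    println(s"global limit: ${globalLimited.length} rows")

    spark.stop()
  }
}
```

   If the recursive CTE loop stopped early and emitted fewer partitions, a 
pushed-down local limit sitting far from its global limit could observe a 
different partition layout and therefore produce a different row count.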



