cloud-fan commented on PR #49955:
URL: https://github.com/apache/spark/pull/49955#issuecomment-2705416036

   @peter-toth Ideally, a recursive CTE should stop when the last iteration 
generates no data. Pushing down the LIMIT and applying an early stop is an 
optimization and should not change the query result.
   
   With that principle in mind, let's look at a concrete example: a 
recursive CTE that generates a one-row RDD (single partition) at each 
iteration, where the 100th iteration generates no data and so ends the loop. 
With a global limit of 10, we can stop at the 10th iteration, as we have 
generated enough records. With a local limit of 10, we can't stop early, 
because a local limit only caps the rows per partition and no partition ever 
holds more than one row. We still need to wait until the 100th iteration, 
which in the end returns a union RDD with 100 partitions, each holding one row.
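
   To make the difference concrete, here is a minimal sketch in plain Python 
(not Spark code). It models each iteration's output as a list of partitions. 
The names `step` and `run_recursive_cte` are hypothetical and exist only for 
this illustration; they are not part of the PR or of any Spark API.

```python
from typing import Callable, List, Optional, Tuple

Partition = List[int]


def step(i: int) -> List[Partition]:
    """Hypothetical recursion step: one single-row partition per iteration,
    and no data at all on the 100th iteration."""
    return [[i]] if i < 100 else []


def run_recursive_cte(
    step_fn: Callable[[int], List[Partition]],
    global_limit: Optional[int] = None,
    local_limit: Optional[int] = None,
) -> Tuple[int, List[Partition]]:
    """Run the loop and return (iterations executed, unioned partitions)."""
    partitions: List[Partition] = []
    total_rows = 0
    iteration = 0
    while True:
        iteration += 1
        new_parts = step_fn(iteration)
        if local_limit is not None:
            # A local limit only caps the rows kept in each partition.
            new_parts = [p[:local_limit] for p in new_parts]
        rows = sum(len(p) for p in new_parts)
        if rows == 0:
            # Normal termination: the last iteration generated no data.
            break
        partitions.extend(new_parts)
        total_rows += rows
        if global_limit is not None and total_rows >= global_limit:
            # Early stop: enough rows overall, so the result is unchanged.
            break
    return iteration, partitions


global_iters, _ = run_recursive_cte(step, global_limit=10)
local_iters, _ = run_recursive_cte(step, local_limit=10)
print(global_iters, local_iters)
```

   In this simplified model the global-limit run executes 10 iterations, 
while the local-limit run executes all 100, since trimming each one-row 
partition to at most 10 rows never removes anything and gives the loop no 
signal to stop.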


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

