cloud-fan commented on PR #49955: URL: https://github.com/apache/spark/pull/49955#issuecomment-2705416036
@peter-toth Ideally, a recursive CTE should stop when the last iteration generates no data. Pushing down the LIMIT and stopping early is an optimization, so it should not change the query result. With that principle in mind, consider a concrete example: a recursive CTE that generates a one-row RDD (single partition) at each iteration, where the 100th iteration generates no data and so ends the loop. With a global limit of 10, we can stop at the 10th iteration, because by then we have generated enough records in total. With a local limit of 10, we can't stop early: a local limit only caps each partition independently, so it never tells us that the total row count has been reached. We still have to run until the 100th iteration, which in the end returns a union RDD with 100 partitions, each holding at most one row.
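The example above can be simulated with a small sketch. This is not Spark's recursive CTE implementation. It assumes a hypothetical `run_recursive_cte` loop that unions each iteration's output, modeled as a list of partitions, and a hypothetical `make_step` that yields a one-row, single-partition result per iteration until iteration `stop_at`, which yields nothing. The sketch only contrasts a global cap (total rows) with a local cap (rows per partition).

```python
from typing import Callable, List, Optional, Tuple

Partition = List[int]
Iteration = List[Partition]  # one iteration's output, as a list of partitions


def make_step(stop_at: int) -> Callable[[int], Iteration]:
    """Hypothetical recursive step: iteration i yields one partition with one
    row, except iteration `stop_at`, which yields a single empty partition."""
    def step(i: int) -> Iteration:
        return [[]] if i == stop_at else [[i]]
    return step


def run_recursive_cte(step: Callable[[int], Iteration],
                      global_limit: Optional[int] = None,
                      local_limit: Optional[int] = None,
                      max_iters: int = 10_000) -> Tuple[List[Partition], int]:
    """Union per-iteration partitions until an iteration produces no rows.

    global_limit: cap on total rows across all partitions; reaching it lets
                  the loop stop early.
    local_limit:  cap applied to each partition on its own; it cannot bound
                  the total, so it never triggers an early stop.
    Returns (unioned partitions, number of iterations run).
    """
    partitions: List[Partition] = []
    total = 0
    for i in range(1, max_iters + 1):
        out = step(i)
        if local_limit is not None:
            out = [p[:local_limit] for p in out]
        if global_limit is not None:
            # Keep only as many rows as the global limit still allows.
            remaining = global_limit - total
            trimmed: Iteration = []
            for p in out:
                trimmed.append(p[:remaining])
                remaining -= len(trimmed[-1])
            out = trimmed
        partitions.extend(out)
        rows = sum(len(p) for p in out)
        total += rows
        if rows == 0:
            # Natural termination: the last iteration generated no data.
            return partitions, i
        if global_limit is not None and total >= global_limit:
            # Early stop: enough rows in total, further iterations are useless.
            return partitions, i
    raise RuntimeError("recursion did not terminate within max_iters")


if __name__ == "__main__":
    step = make_step(stop_at=100)
    parts, iters = run_recursive_cte(step, global_limit=10)
    print(iters, len(parts))  # 10 iterations, 10 one-row partitions
    parts, iters = run_recursive_cte(step, local_limit=10)
    print(iters, len(parts))  # 100 iterations, 100 partitions (last one empty)
```

Under these assumptions, the global limit stops after 10 iterations. The local limit leaves every one-row partition untouched, so the loop runs all 100 iterations and returns 100 partitions: 99 with one row and the final empty one.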