Pajaraja opened a new pull request, #50546: URL: https://github.com/apache/spark/pull/50546
### What changes were proposed in this pull request?

Allow a CTE to reference the recursive CTE (rCTE) that it is defined inside of. This is done by modifying the CTESubstitution file, in two main parts:

- When traverseAndSubstituteCTE is called from resolveCTERelations while resolving a recursive CTE (in order to resolve all the CTEs it references), we remember this ancestor rCTE in case any of the child CTEs reference it. If we encounter another rCTE inside the rCTE (which is only allowed in the anchor), we define it to be the new anchor rCTE.
- Although the first part is enough to resolve these CTEs, a new problem arises when identifying whether a CTE is recursive. If CTE0 is recursive and CTE1 is a CTE inside CTE0 that references CTE0, the only way to tell that CTE0 is recursive is to look inside CTE1. For this reason we inline all non-recursive CTEs inside a recursive CTE, so that CTE0 can see its self-reference.

### Why are the changes needed?

To make queries with this kind of self-reference work. An example of such a query is:

```
WITH RECURSIVE t1 AS (
  SELECT 1 AS n
  UNION ALL
  WITH t2 AS (SELECT n + 1 FROM t1 WHERE n < 5)
  SELECT * FROM t2
)
SELECT * FROM t1;
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

With existing recursive CTE queries of this kind that did not work before.

### Was this patch authored or co-authored using generative AI tooling?

No.
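
For illustration only, the detection problem described in the second bullet under "What changes were proposed" can be sketched with a toy model in Python. This is not the CTESubstitution code. The `Cte` class and the helpers `references_directly` and `inline_nested` are hypothetical names made up for this sketch, and it assumes every nested CTE is non-recursive and may only reference siblings declared before it.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Cte:
    """Toy CTE definition (hypothetical, not a Spark class).

    refs:   relation names the query body reads directly
    nested: CTEs declared in a WITH clause inside the body, in declaration order
    """
    name: str
    refs: Set[str]
    nested: List["Cte"] = field(default_factory=list)


def references_directly(cte: Cte, target: str) -> bool:
    """Naive recursion check: look only at what the body reads directly."""
    return target in cte.refs


def inline_nested(cte: Cte) -> Cte:
    """Return a copy of `cte` with its nested CTEs inlined.

    Each reference to a nested CTE is replaced by the relations that the
    nested CTE reads. Children are visited in reverse declaration order so
    that a later sibling referencing an earlier one is expanded first.
    Assumption: nested CTEs are non-recursive.
    """
    refs = set(cte.refs)
    for child in reversed(cte.nested):
        flat = inline_nested(child)
        if flat.name in refs:
            refs.remove(flat.name)
            refs |= flat.refs
    return Cte(cte.name, refs)


# Model of the example query: the recursive branch of t1 reads t2,
# and t2 (declared inside t1) reads t1.
t2 = Cte("t2", {"t1"})
t1 = Cte("t1", {"t2"}, nested=[t2])

print(references_directly(t1, "t1"))                 # False: self-reference is hidden in t2
print(references_directly(inline_nested(t1), "t1"))  # True: visible after inlining
```

In this model the self-reference of t1 only becomes visible once t2 has been inlined, which is the motivation given above for inlining non-recursive CTEs inside a recursive one.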