vladimirg-db opened a new pull request, #49937: URL: https://github.com/apache/spark/pull/49937
### What changes were proposed in this pull request?

Fix a correctness issue with UNION/EXCEPT/INTERSECT inside a view or `EXECUTE IMMEDIATE`. In the following examples, the SQL parser treats the UNION/EXCEPT/INTERSECT keywords as aliases and drops the rest of the query:

```
spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4")
spark.sql("SELECT * FROM v1").show()
spark.sql("SELECT * FROM v1").queryExecution.analyzed

spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 EXCEPT SELECT 1 EXCEPT SELECT 2")
spark.sql("SELECT * FROM v1").show()
spark.sql("SELECT * FROM v1").queryExecution.analyzed

spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 INTERSECT SELECT 1 INTERSECT SELECT 2 INTERSECT SELECT 2")
spark.sql("SELECT * FROM v1").show()
spark.sql("SELECT * FROM v1").queryExecution.analyzed

spark.sql("DECLARE v INT")
spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v")
spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v").queryExecution.analyzed
spark.sql("SELECT v").show()
```

Regular queries (without `VIEW` or `EXECUTE IMMEDIATE`) are not affected. Apparently that's because we use `ParserInterface.parsePlan` (the `singleStatement` term in the Spark SQL grammar) for [regular queries](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/core/src/main/scala/org/apache/spark/sql/classic/SparkSession.scala#L490), and `ParserInterface.parseQuery` (the `query` term in the Spark SQL grammar) for [view bodies](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L986) and [EXECUTE IMMEDIATE with INTO](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/executeImmediate.scala#L167).
The difference is that `singleStatement` [ends in EOF](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L144). I'm not sure what the actual root cause is, because I don't know much about the SQL parser.

### Why are the changes needed?

Correctness issue fix.

### Does this PR introduce _any_ user-facing change?

Yes, queries on top of the aforementioned views will now return correct results.

### How was this patch tested?

New `view-correctness` suite.

### Was this patch authored or co-authored using generative AI tooling?

No.
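
To illustrate the `singleStatement` versus `query` difference noted under "What changes were proposed", here is a minimal toy sketch of how an entry rule that does not require EOF could accept only a prefix of the input. It is not Spark's grammar or parser: `EofSketch`, `Parsed`, `tokenize`, and the two rule functions are hypothetical names, and treating any word as an alias is a deliberate simplification. It only models the suspected mechanism, which the description above leaves unconfirmed.

```scala
// Toy model only: none of these names come from Spark's grammar or parser.
object EofSketch {
  // What a rule consumed: the literal, an optional alias, and the leftover tokens.
  final case class Parsed(value: Int, alias: Option[String], rest: List[String])

  // Naive whitespace tokenizer, sufficient for the examples below.
  def tokenize(sql: String): List[String] = sql.trim.split("\\s+").toList

  // Hypothetical `query` rule: SELECT <int> [<alias>].
  // Assumption: any following word is accepted as an alias.
  // The rule does NOT require that the input ends after the alias.
  def query(tokens: List[String]): Either[String, Parsed] = tokens match {
    case "SELECT" :: n :: tail if n.nonEmpty && n.forall(_.isDigit) =>
      tail match {
        case alias :: rest => Right(Parsed(n.toInt, Some(alias), rest))
        case Nil           => Right(Parsed(n.toInt, None, Nil))
      }
    case _ => Left(s"expected SELECT <int>, got: ${tokens.mkString(" ")}")
  }

  // Hypothetical `singleStatement` rule: the same query, followed by EOF.
  // Leftover tokens make the prefix-only interpretation invalid.
  def singleStatement(tokens: List[String]): Either[String, Parsed] =
    query(tokens).flatMap { p =>
      if (p.rest.isEmpty) Right(p)
      else Left(s"unexpected input after query: ${p.rest.mkString(" ")}")
    }
}
```

In this toy model, `EofSketch.query(EofSketch.tokenize("SELECT 1 UNION SELECT 2"))` succeeds with `UNION` recorded as the alias and `SELECT 2` left unconsumed, while `EofSketch.singleStatement` on the same input returns a `Left`. A real parser with an EOF-terminated rule would presumably reject the alias interpretation and fall back to the set-operation alternative rather than fail, but that part is an assumption and is not modelled here.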