vladimirg-db opened a new pull request, #49937:
URL: https://github.com/apache/spark/pull/49937

   ### What changes were proposed in this pull request?
   
   Fix correctness with UNION/EXCEPT/INTERSECT inside a view or `EXECUTE 
IMMEDIATE`.
   
   In the following examples the SQL Parser considers UNION/EXCEPT/INTERSECT 
keywords as aliases and drops the rest of the query:
   
   ```
   spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4")
   spark.sql("SELECT * FROM v1").show()
   spark.sql("SELECT * FROM v1").queryExecution.analyzed
   
   spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 EXCEPT SELECT 1 EXCEPT SELECT 2")
   spark.sql("SELECT * FROM v1").show()
   spark.sql("SELECT * FROM v1").queryExecution.analyzed
   
   spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 INTERSECT SELECT 1 INTERSECT SELECT 2 INTERSECT SELECT 2")
   spark.sql("SELECT * FROM v1").show()
   spark.sql("SELECT * FROM v1").queryExecution.analyzed
   
   spark.sql("DECLARE v INT")
   spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v")
   spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v").queryExecution.analyzed
   spark.sql("SELECT v").show()
   ```
   
   
![image](https://github.com/user-attachments/assets/ef726178-2375-4ebc-a7e3-88f1991d1016)
   
![image](https://github.com/user-attachments/assets/50b4b7ba-bc7d-4fc1-a921-f4cbfcab79a3)
   
![image](https://github.com/user-attachments/assets/85b65325-5dd9-4d74-b46d-8ea203ce1039)
   
![image](https://github.com/user-attachments/assets/c53c5e02-18c6-4e30-b834-94af619190c5)
   
   There's no correctness issue associated with regular queries (without the `VIEW` or `EXECUTE IMMEDIATE`). Apparently that's because we use `ParserInterface.parsePlan` (`singleStatement` term in Spark SQL grammar) for [regular queries](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/core/src/main/scala/org/apache/spark/sql/classic/SparkSession.scala#L490) and `ParserInterface.parseQuery` (`query` term in Spark SQL grammar) for [view bodies](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L986) and [EXECUTE IMMEDIATE with INTO](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/executeImmediate.scala#L167). The difference is that `singleStatement` [ends in EOF](https://github.com/apache/spark/blob/b968ce1d3ac1b72019b30bf3d4e11d9574ba1205/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L144).
   
   I'm not sure what the actual root cause is, because I don't know much about the SQL Parser.
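
To make the suspected role of EOF easier to picture, here is a toy backtracking parser. It is only a sketch and an assumption about the general mechanism: it is not Spark's grammar and does not model how ANTLR actually predicts alternatives. `ToyParser`, `Select`, `Union`, `candidates`, and `requireEof` are hypothetical names invented for this illustration. The toy grammar lets any word after the selected value act as an alias, and lists the alias alternative first.

```scala
object ToyParser {
  // Toy AST: a SELECT of one value with an optional alias, or a UNION of two queries.
  sealed trait Plan
  final case class Select(value: String, alias: Option[String]) extends Plan
  final case class Union(left: Plan, right: Plan) extends Plan

  // Returns every way to parse a prefix of `tokens`, in preference order,
  // paired with the tokens that were left unconsumed.
  private def candidates(tokens: List[String]): List[(Plan, List[String])] = {
    // Alternative 1: any word after the value is accepted as an alias
    // (mimics a non-reserved keyword such as UNION being usable as an identifier).
    val aliased: List[(Plan, List[String])] = tokens match {
      case "SELECT" :: value :: alias :: rest => List((Select(value, Some(alias)), rest))
      case _ => Nil
    }
    // Alternative 2: the word after the value is the UNION operator.
    val union: List[(Plan, List[String])] = tokens match {
      case "SELECT" :: value :: "UNION" :: rest =>
        candidates(rest).map { case (right, tail) =>
          (Union(Select(value, None), right): Plan, tail)
        }
      case _ => Nil
    }
    // Alternative 3: a bare SELECT with no alias.
    val plain: List[(Plan, List[String])] = tokens match {
      case "SELECT" :: value :: rest => List((Select(value, None), rest))
      case _ => Nil
    }
    aliased ++ union ++ plain
  }

  // With requireEof = true, a candidate counts only if it consumed every token.
  // With requireEof = false, the first candidate wins and leftover tokens are dropped.
  def parse(sql: String, requireEof: Boolean): Option[Plan] = {
    val tokens = sql.trim.split("\\s+").toList
    candidates(tokens).collectFirst {
      case (plan, remaining) if !requireEof || remaining.isEmpty => plan
    }
  }
}
```

In this toy, `ToyParser.parse("SELECT 1 UNION SELECT 2", requireEof = false)` yields a single `Select` whose alias is `UNION`, with the rest of the input silently dropped, while `requireEof = true` rejects that candidate and yields a `Union` of the two selects. Whether the real parser fails for the same reason is not established here.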
   
   ### Why are the changes needed?
   
   Correctness issue fix.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the results of queries on top of the aforementioned views and of `EXECUTE IMMEDIATE ... INTO` statements will now be correct.
   
   ### How was this patch tested?
   
   New `view-correctness` suite.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.

