Hi Hive devs,

I would like to share with you Spark's plan for its SQL parser going forward. As you may (or may not) know, Spark SQL has had two parsers so far:
- a very simple one based on Scala's parser combinators; and
- one that depends on Hive's.

The Scala parser combinator one was written quickly so that we could parse SQL queries even when the Hive dependency is off. However, it suffers from some major problems, the most important of which are (1) really bad error messages and (2) no warning when grammar rules conflict.

We really like the Hive parser. The Hive-based one calls into Hive itself and translates the generated AST into Spark's logical plans. However, because the grammar definition was not in Spark, we could not introduce new grammar or fix bugs when needed.

These two parsers have been a major source of confusion for Spark users, because depending on which mode Spark SQL is running in, you get subtle differences in grammar.

It has been our intention to replace both of them with a built-in parser. We have looked into various options, and it looks like the best one is to copy the ANTLR grammar file from Hive into Spark. Because the grammar file is tightly coupled with Hive's semantic analysis, we need to refactor some code in order to use it, so what ends up in Spark will be the .g file plus some coupled code. We already have a prototype that somewhat works. I expect we will get this done in early 2016.

We have also looked into creating an independent library for the SQL parser that both Hive and Spark would share. However, we eventually decided that this approach wouldn't make much sense, because it is a lot of work for both Hive and Spark to refactor existing code to introduce an external parser. From Hive's perspective, this does not provide any immediate benefit. From Spark's perspective, we iterate very quickly, so having to depend on an external component would also slow down our development. We also have some requirements that simply don't apply to other projects (e.g. being able to parse DataFrame expressions).

Thanks a lot for developing this parser, and we will try our best to contribute back as we fix bugs.
I will also make sure we have the proper acknowledgment when we do this.

Cheers.

- Reynold