The Babel parser is a compromise. It tolerates ambiguity but occasionally accepts strings that are invalid, and it makes the JavaCC engine work harder.
I started a Jira case and dev branch to upgrade JavaCC. I never completed it, but I recall that I solved many of the ambiguities in the grammar, and I may have been able to change the global lookahead to 1. See whether you can use that work.

Another option is to create your own parser variant for just the Pinot dialect, using Calcite's core parser and the FMPP template mechanism.

> On Oct 28, 2024, at 9:28 AM, Bolek Ziobrowski
> <boleslaw.ziobrow...@startree.ai.invalid> wrote:
>
> Hello everyone.
> My name is Bolek and I work at StarTree.
> Recently I've been trying to improve error reporting of Apache Pinot's SQL
> parser, which is a slightly customized version of Calcite's Babel parser.
> I noticed that in many cases (probably most) the error information
> (position, token and list of tokens) is wrong.
> For instance, a SQL command such as:
>
> WITH grouping AS (SELECT 1) select * from grouping;
>
> produces the following useless message:
> org.apache.pinot.sql.parsers.SqlCompilationException:
> Caught exception while parsing query: WITH grouping AS (SELECT 1) select *
> from grouping
> Caused by: org.apache.pinot.sql.parsers.parser.ParseException:
> Encountered "" at line 1, column 1. Was expecting one of: (empty)
>
> Oftentimes, the reported error position and token are 1 token earlier than
> they should be and the list of expected productions is hundreds of
> items long.
>
> The issue seems to be caused by using a global lookahead value of 2 while
> having a long list of nonReservedKeywords.
> Once I switched lookahead to 1, the JavaCC Maven plugin started to emit a
> large number of conflicts between regular productions and nonReservedKeywords.
> I managed to fix those by adding lots of LOOKAHEADs to the grammar (as can
> be seen at
> https://github.com/apache/pinot/pull/14238/files#diff-5de5043229de15ff630c4920d392a058098fa3f54793df4799734c0a4f908732
> )
> but that makes the grammar harder to keep in sync with Calcite's.
> Has anyone worked on a similar issue or could suggest a better approach?
> If the approach makes sense, would Calcite be open to a similar change to
> the grammar?
> Please let me know if there's a better place for discussing such issues.
>
> Best regards,
> Bolek Ziobrowski
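
To make the conflict between regular productions and non-reserved keywords concrete, here is a minimal toy sketch in plain Java. It is not taken from Calcite's or Pinot's grammar: the class LookaheadDemo, the two alternatives (groupingCall and identifier) and the <ID> / <NOT_LPAREN> patterns are all hypothetical, chosen only to show why a token such as GROUPING cannot be classified with one token of lookahead but can with two.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a parser choice point; NOT Calcite's actual grammar. */
final class LookaheadDemo {
    // Hypothetical alternatives, each described by the tokens it may start with.
    // "<ID>" matches any all-letter token (so it also matches non-reserved keywords);
    // "<NOT_LPAREN>" matches any token other than "(" (including end of input).
    static final Map<String, List<String>> ALTERNATIVES = new LinkedHashMap<>();
    static {
        ALTERNATIVES.put("groupingCall", List.of("GROUPING", "("));
        ALTERNATIVES.put("identifier", List.of("<ID>", "<NOT_LPAREN>"));
    }

    static boolean matches(String pattern, String token) {
        switch (pattern) {
            case "<ID>":
                return !token.isEmpty() && token.chars().allMatch(Character::isLetter);
            case "<NOT_LPAREN>":
                return !token.equals("(");
            default:
                return pattern.equalsIgnoreCase(token);
        }
    }

    /** Returns the alternatives that are still viable after inspecting k tokens. */
    static List<String> viableAlternatives(List<String> tokens, int k) {
        List<String> viable = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : ALTERNATIVES.entrySet()) {
            List<String> prefix = e.getValue();
            int n = Math.min(k, prefix.size());
            boolean ok = true;
            for (int i = 0; i < n && ok; i++) {
                // Tokens past the end of input are modelled as "<EOF>".
                String token = i < tokens.size() ? tokens.get(i) : "<EOF>";
                ok = matches(prefix.get(i), token);
            }
            if (ok) {
                viable.add(e.getKey());
            }
        }
        return viable;
    }

    public static void main(String[] args) {
        List<String> asName = List.of("grouping", "AS");
        List<String> asCall = List.of("GROUPING", "(", "x", ")");
        System.out.println("k=1 name: " + viableAlternatives(asName, 1));
        System.out.println("k=2 name: " + viableAlternatives(asName, 2));
        System.out.println("k=1 call: " + viableAlternatives(asCall, 1));
        System.out.println("k=2 call: " + viableAlternatives(asCall, 2));
    }
}
```

In this toy, more than one viable alternative at k=1 corresponds to the kind of choice conflict JavaCC warns about, and a local LOOKAHEAD(2) at just that choice point plays the role of k=2. The real grammar has far more alternatives per choice point, which is presumably why so many local lookaheads were needed.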