Thank you very much for the quick response, Julian!
The parse will be broken if we simply set LOOKAHEAD to 1. There will be a lot of lookahead warning that are newly produced or hidden before (JavaCC does not perform LA check when global LOOKAHEAD is larger than 1[1]), and many test cases will turn to fail. I could run the benchmark because the SQL generated by the benchmark is not touching the broken part of the parser. For solving the problem deeply I have done some tentative work and opened a JIRA issue[2] with a PR. And I didn't change the global LOOKAHEAD for Babel in the PR. This is because Babel turns a lot of reserved keyword to non-reserved[3], if we set the LOOKAHEAD to 1 JavaCC will generate more choice conflict than the default parser for Babel[4]. To deal with this we must add extra lookahead hints into default parser's code, and the default parser will lose performance by the hints. I would suggest to take the concept that "Babel is powerful but slower" and not optimize Babel. But if there is a way we can easily solve the conflicts without touching default parser I would be a fan of optimizing Babel too. What do you think? Best, Hongze [1] https://javacc.org/javaccgrm [2] https://issues.apache.org/jira/browse/CALCITE-2847 [3] https://github.com/apache/calcite/blob/2102f1f5442fa271c258b7754da8df07d65847ec/babel/src/main/codegen/config.fmpp#L338 [4] https://www.dropbox.com/s/wpap95dko4wnlcc/babel_warnings.log?dl=0 At 2019-02-13 04:11:33, "Julian Hyde" <[email protected]> wrote: >Does the parser produce correct results if you set LOOKAHEAD to 1? If so, we >should use that. > >Do the extension parsers (e.g. Babel) also work with that setting? > >We thought we were using as little lookahead as we could get away with, but >maybe we were wrong. > >> On Feb 12, 2019, at 8:24 AM, Hongze Zhang <[email protected]> wrote: >> >> Hi all, >> >> >> Recently I have spent some time on playing with Calcite's built-in SQL >> parsers. And now I'm interested with the reason why the global LOOKAHEAD is >> set to 2[1] by default. >> I run a comparative benchmark using the ParserBenchmark util class, and the >> output log shows visible parsing performance improvement after setting >> global LOOKAHEAD to 1. For example, the metric "ParserBenchmark.parseCached" >> reduced from 1693.821 us/op ± 1921.925 us/op[2] to 655.452 ± 181.100 >> us/op[3]. >> >> >> JavaCC always generates java methods like jj_2_**(...) for LL(k) (k > 1) >> grammar[4], I am almost sure this way is not efficient comparing with the >> generated code for LL(1) grammar. It might be great If we can somehow take >> the advantage of JavaCC's LL(1). >> And of course I could see there was some trade-off consideration about not >> using LL(1) by default. Then I have done some search on dev list and JIRA >> cases but found nothing. Does anyone hold information about that? >> >> >> Best, >> Hongze >> >> >> [1] >> https://github.com/apache/calcite/blob/883666929478aabe07ee5b9e572c43a6f1a703e2/core/pom.xml#L304 >> [2] https://www.dropbox.com/s/il6nodc44dzo0rz/bench_la2.log?dl=0 >> [3] https://www.dropbox.com/s/4rrou71siskdhhm/bench_la1.log?dl=0 >> [4] >> http://www.cs.tau.ac.il/~msagiv/courses/lab/Shai2/tools/javacc/examples/JavaGrammars/OPTIMIZING >>
