The performance improvements are impressive - basically the parser got 10x faster.
However, the parser is now more difficult to develop. Is this worth the speedup? Julian > On Feb 13, 2019, at 5:30 AM, Hongze Zhang <[email protected]> wrote: > > Thank you very much for the quick response, Julian! > > > The parse will be broken if we simply set LOOKAHEAD to 1. There will be a lot > of lookahead warning that are newly produced or hidden before (JavaCC does > not perform LA check when global LOOKAHEAD is larger than 1[1]), and many > test cases will turn to fail. I could run the benchmark because the SQL > generated by the benchmark is not touching the broken part of the parser. > For solving the problem deeply I have done some tentative work and opened a > JIRA issue[2] with a PR. > And I didn't change the global LOOKAHEAD for Babel in the PR. This is because > Babel turns a lot of reserved keyword to non-reserved[3], if we set the > LOOKAHEAD to 1 JavaCC will generate more choice conflict than the default > parser for Babel[4]. To deal with this we must add extra lookahead hints into > default parser's code, and the default parser will lose performance by the > hints. I would suggest to take the concept that "Babel is powerful but > slower" and not optimize Babel. But if there is a way we can easily solve the > conflicts without touching default parser I would be a fan of optimizing > Babel too. What do you think? > > > Best, > Hongze > > > > > [1] https://javacc.org/javaccgrm > [2] https://issues.apache.org/jira/browse/CALCITE-2847 > [3] > https://github.com/apache/calcite/blob/2102f1f5442fa271c258b7754da8df07d65847ec/babel/src/main/codegen/config.fmpp#L338 > [4] https://www.dropbox.com/s/wpap95dko4wnlcc/babel_warnings.log?dl=0 > At 2019-02-13 04:11:33, "Julian Hyde" <[email protected]> wrote: >> Does the parser produce correct results if you set LOOKAHEAD to 1? If so, we >> should use that. >> >> Do the extension parsers (e.g. Babel) also work with that setting? >> >> We thought we were using as little lookahead as we could get away with, but >> maybe we were wrong. >> >>> On Feb 12, 2019, at 8:24 AM, Hongze Zhang <[email protected]> wrote: >>> >>> Hi all, >>> >>> >>> Recently I have spent some time on playing with Calcite's built-in SQL >>> parsers. And now I'm interested with the reason why the global LOOKAHEAD is >>> set to 2[1] by default. >>> I run a comparative benchmark using the ParserBenchmark util class, and the >>> output log shows visible parsing performance improvement after setting >>> global LOOKAHEAD to 1. For example, the metric >>> "ParserBenchmark.parseCached" reduced from 1693.821 us/op ± 1921.925 >>> us/op[2] to 655.452 ± 181.100 us/op[3]. >>> >>> >>> JavaCC always generates java methods like jj_2_**(...) for LL(k) (k > 1) >>> grammar[4], I am almost sure this way is not efficient comparing with the >>> generated code for LL(1) grammar. It might be great If we can somehow take >>> the advantage of JavaCC's LL(1). >>> And of course I could see there was some trade-off consideration about not >>> using LL(1) by default. Then I have done some search on dev list and JIRA >>> cases but found nothing. Does anyone hold information about that? >>> >>> >>> Best, >>> Hongze >>> >>> >>> [1] >>> https://github.com/apache/calcite/blob/883666929478aabe07ee5b9e572c43a6f1a703e2/core/pom.xml#L304 >>> [2] https://www.dropbox.com/s/il6nodc44dzo0rz/bench_la2.log?dl=0 >>> [3] https://www.dropbox.com/s/4rrou71siskdhhm/bench_la1.log?dl=0 >>> [4] >>> http://www.cs.tau.ac.il/~msagiv/courses/lab/Shai2/tools/javacc/examples/JavaGrammars/OPTIMIZING >>>
