The performance improvements are impressive - basically the parser got 10x 
faster.

However, the parser is now more difficult to develop. Is this worth the speedup?

Julian


> On Feb 13, 2019, at 5:30 AM, Hongze Zhang <[email protected]> wrote:
> 
> Thank you very much for the quick response, Julian!
> 
> 
> The parse will be broken if we simply set LOOKAHEAD to 1. There will be a lot 
> of lookahead warning that are newly produced or hidden before (JavaCC does 
> not perform LA check when global LOOKAHEAD is larger than 1[1]), and many 
> test cases will turn to fail. I could run the benchmark because the SQL 
> generated by the benchmark is not touching the broken part of the parser. 
> For solving the problem deeply I have done some tentative work and opened a 
> JIRA issue[2] with a PR.
> And I didn't change the global LOOKAHEAD for Babel in the PR. This is because 
> Babel turns a lot of reserved keyword to non-reserved[3], if we set the 
> LOOKAHEAD to 1 JavaCC will generate more choice conflict than the default 
> parser for Babel[4]. To deal with this we must add extra lookahead hints into 
> default parser's code, and the default parser will lose performance by the 
> hints. I would suggest to take the concept that "Babel is powerful but 
> slower" and not optimize Babel. But if there is a way we can easily solve the 
> conflicts without touching default parser I would be a fan of optimizing 
> Babel too. What do you think?
> 
> 
> Best,
> Hongze
> 
> 
> 
> 
> [1] https://javacc.org/javaccgrm
> [2] https://issues.apache.org/jira/browse/CALCITE-2847
> [3] 
> https://github.com/apache/calcite/blob/2102f1f5442fa271c258b7754da8df07d65847ec/babel/src/main/codegen/config.fmpp#L338
> [4] https://www.dropbox.com/s/wpap95dko4wnlcc/babel_warnings.log?dl=0
> At 2019-02-13 04:11:33, "Julian Hyde" <[email protected]> wrote:
>> Does the parser produce correct results if you set LOOKAHEAD to 1? If so, we 
>> should use that.
>> 
>> Do the extension parsers (e.g. Babel) also work with that setting?
>> 
>> We thought we were using as little lookahead as we could get away with, but 
>> maybe we were wrong.
>> 
>>> On Feb 12, 2019, at 8:24 AM, Hongze Zhang <[email protected]> wrote:
>>> 
>>> Hi all,
>>> 
>>> 
>>> Recently I have spent some time on playing with Calcite's built-in SQL 
>>> parsers. And now I'm interested with the reason why the global LOOKAHEAD is 
>>> set to 2[1] by default.
>>> I run a comparative benchmark using the ParserBenchmark util class, and the 
>>> output log shows visible parsing performance improvement after setting 
>>> global LOOKAHEAD to 1. For example, the metric 
>>> "ParserBenchmark.parseCached" reduced from 1693.821 us/op ± 1921.925 
>>> us/op[2] to 655.452 ± 181.100 us/op[3].
>>> 
>>> 
>>> JavaCC always generates java methods like jj_2_**(...) for LL(k) (k > 1) 
>>> grammar[4], I am almost sure this way is not efficient comparing with the 
>>> generated code for LL(1) grammar. It might be great If we can somehow take 
>>> the advantage of JavaCC's LL(1). 
>>> And of course I could see there was some trade-off consideration about not 
>>> using LL(1) by default. Then I have done some search on dev list and JIRA 
>>> cases but found nothing. Does anyone hold information about that?
>>> 
>>> 
>>> Best,
>>> Hongze
>>> 
>>> 
>>> [1] 
>>> https://github.com/apache/calcite/blob/883666929478aabe07ee5b9e572c43a6f1a703e2/core/pom.xml#L304
>>> [2] https://www.dropbox.com/s/il6nodc44dzo0rz/bench_la2.log?dl=0
>>> [3] https://www.dropbox.com/s/4rrou71siskdhhm/bench_la1.log?dl=0
>>> [4] 
>>> http://www.cs.tau.ac.il/~msagiv/courses/lab/Shai2/tools/javacc/examples/JavaGrammars/OPTIMIZING
>>> 

Reply via email to