Re: Global LOOKAHEAD of the SQL parser

Hongze Zhang Wed, 13 Feb 2019 05:30:34 -0800

Thank you very much for the quick response, Julian!

The parse will be broken if we simply set LOOKAHEAD to 1. There will be a lot 
of lookahead warning that are newly produced or hidden before (JavaCC does not 
perform LA check when global LOOKAHEAD is larger than 1[1]), and many test 
cases will turn to fail. I could run the benchmark because the SQL generated by 
the benchmark is not touching the broken part of the parser. 
For solving the problem deeply I have done some tentative work and opened a 
JIRA issue[2] with a PR.
And I didn't change the global LOOKAHEAD for Babel in the PR. This is because 
Babel turns a lot of reserved keyword to non-reserved[3], if we set the 
LOOKAHEAD to 1 JavaCC will generate more choice conflict than the default 
parser for Babel[4]. To deal with this we must add extra lookahead hints into 
default parser's code, and the default parser will lose performance by the 
hints. I would suggest to take the concept that "Babel is powerful but slower" 
and not optimize Babel. But if there is a way we can easily solve the conflicts 
without touching default parser I would be a fan of optimizing Babel too. What 
do you think?

Best,
Hongze

[1] https://javacc.org/javaccgrm
[2] https://issues.apache.org/jira/browse/CALCITE-2847
[3] 
https://github.com/apache/calcite/blob/2102f1f5442fa271c258b7754da8df07d65847ec/babel/src/main/codegen/config.fmpp#L338
[4] https://www.dropbox.com/s/wpap95dko4wnlcc/babel_warnings.log?dl=0
At 2019-02-13 04:11:33, "Julian Hyde" <[email protected]> wrote:
>Does the parser produce correct results if you set LOOKAHEAD to 1? If so, we 
>should use that.
>
>Do the extension parsers (e.g. Babel) also work with that setting?
>
>We thought we were using as little lookahead as we could get away with, but 
>maybe we were wrong.
>
>> On Feb 12, 2019, at 8:24 AM, Hongze Zhang <[email protected]> wrote:
>> 
>> Hi all,
>> 
>> 
>> Recently I have spent some time on playing with Calcite's built-in SQL 
>> parsers. And now I'm interested with the reason why the global LOOKAHEAD is 
>> set to 2[1] by default.
>> I run a comparative benchmark using the ParserBenchmark util class, and the 
>> output log shows visible parsing performance improvement after setting 
>> global LOOKAHEAD to 1. For example, the metric "ParserBenchmark.parseCached" 
>> reduced from 1693.821 us/op ± 1921.925 us/op[2] to 655.452 ± 181.100 
>> us/op[3].
>> 
>> 
>> JavaCC always generates java methods like jj_2_**(...) for LL(k) (k > 1) 
>> grammar[4], I am almost sure this way is not efficient comparing with the 
>> generated code for LL(1) grammar. It might be great If we can somehow take 
>> the advantage of JavaCC's LL(1). 
>> And of course I could see there was some trade-off consideration about not 
>> using LL(1) by default. Then I have done some search on dev list and JIRA 
>> cases but found nothing. Does anyone hold information about that?
>> 
>> 
>> Best,
>> Hongze
>> 
>> 
>> [1] 
>> https://github.com/apache/calcite/blob/883666929478aabe07ee5b9e572c43a6f1a703e2/core/pom.xml#L304
>> [2] https://www.dropbox.com/s/il6nodc44dzo0rz/bench_la2.log?dl=0
>> [3] https://www.dropbox.com/s/4rrou71siskdhhm/bench_la1.log?dl=0
>> [4] 
>> http://www.cs.tau.ac.il/~msagiv/courses/lab/Shai2/tools/javacc/examples/JavaGrammars/OPTIMIZING
>>

Re: Global LOOKAHEAD of the SQL parser

Reply via email to