Re: Global LOOKAHEAD of the SQL parser

Hongze Zhang Thu, 14 Feb 2019 20:09:01 -0800

> Is there a way to have that tester check error messages if the test is 
> defined in BabelParserTest? (Perhaps you could inspect the stack and see 
> whether “BabelParserTest.test” appears in the top few entries.)


Thanks Julian! And this makes sense. I'll give it a try in the later 
modifications of the PR.

Best,
Hongze


Hongze
 
From: Julian Hyde
Date: 2019-02-15 04:32
To: dev
Subject: Re: Global LOOKAHEAD of the SQL parser
Thanks for the clarification. You have convinced me that modifying the parser 
will not be too onerous. I was pleased to see that only one change was required 
to the “server” parser, for instance.
 
I saw that you overrode the tester in BabelParserTest. I makes sense, but it 
creates an exposure: tests that are defined in BabelParserTest (as opposed to 
inherited from SqlParserTest) will never have their error message checked. Is 
there a way to have that tester check error messages if the test is defined in 
BabelParserTest? (Perhaps you could inspect the stack and see whether 
“BabelParserTest.test” appears in the top few entries.)
 
Julian
 
> On Feb 14, 2019, at 12:36 AM, Hongze Zhang <[email protected]> wrote:
> 
> I think it's not something unacceptable to require parser developers to know 
> the usage of LOOKAHEAD hints of JavaCC, for keeping the parser efficient and 
> ambiguous free. 
> Firstly, it's not really that difficult to learn. By setting the global 
> LOOKAHEAD to 1, JavaCC will perform ambiguity checking automatically during 
> the maven build lifecycle. If someone changed the parser all he needs to do 
> is to add enough LOOKAHEAD hints following the warning message, it is not 
> difficult for a developer, I think.
> And there are really not that many LOOKAHEAD hints need to be added. In 
> current version of Calcite, there are already 60 LOOKAHEAD hints in 
> Parser.jj[1], then I have added 37 new ones[2] in the PR to make things work. 
> also, Parser.jj has about 6800 lines of code now. Which means, averagely we 
> need one new LOOKAHEAD per 70 new lines, this is not much.
> Also, the smaller LOOKAHEAD is, more readable the ParseException could be. 
> You can see some "fixme"s in SqlParserTest about ParseException's error 
> message get healed by the change[3].
> 
> Best,
> Hongze
> 
> 
> [1] 
> https://github.com/apache/calcite/blob/master/core/src/main/codegen/templates/Parser.jj
> [2] 
> https://github.com/apache/calcite/blob/656608339f901fd5a6b919ee79a37ba25c16b0d2/core/src/main/codegen/templates/Parser.jj
> [3] 
> https://github.com/apache/calcite/pull/1041/files#diff-ef277e7d229cfc747255eb61ce6fd46dL1386
> 
> 
> 
> Hongze
> 
> From: Julian Hyde
> Date: 2019-02-14 04:12
> To: dev
> Subject: Re: Global LOOKAHEAD of the SQL parser
> The performance improvements are impressive - basically the parser got 10x 
> faster.
> 
> However, the parser is now more difficult to develop. Is this worth the 
> speedup?
> 
> Julian
> 
> 
>> On Feb 13, 2019, at 5:30 AM, Hongze Zhang <[email protected]> wrote:
>> 
>> Thank you very much for the quick response, Julian!
>> 
>> 
>> The parse will be broken if we simply set LOOKAHEAD to 1. There will be a 
>> lot of lookahead warning that are newly produced or hidden before (JavaCC 
>> does not perform LA check when global LOOKAHEAD is larger than 1[1]), and 
>> many test cases will turn to fail. I could run the benchmark because the SQL 
>> generated by the benchmark is not touching the broken part of the parser. 
>> For solving the problem deeply I have done some tentative work and opened a 
>> JIRA issue[2] with a PR.
>> And I didn't change the global LOOKAHEAD for Babel in the PR. This is 
>> because Babel turns a lot of reserved keyword to non-reserved[3], if we set 
>> the LOOKAHEAD to 1 JavaCC will generate more choice conflict than the 
>> default parser for Babel[4]. To deal with this we must add extra lookahead 
>> hints into default parser's code, and the default parser will lose 
>> performance by the hints. I would suggest to take the concept that "Babel is 
>> powerful but slower" and not optimize Babel. But if there is a way we can 
>> easily solve the conflicts without touching default parser I would be a fan 
>> of optimizing Babel too. What do you think?
>> 
>> 
>> Best,
>> Hongze
>> 
>> 
>> 
>> 
>> [1] https://javacc.org/javaccgrm
>> [2] https://issues.apache.org/jira/browse/CALCITE-2847
>> [3] 
>> https://github.com/apache/calcite/blob/2102f1f5442fa271c258b7754da8df07d65847ec/babel/src/main/codegen/config.fmpp#L338
>> [4] https://www.dropbox.com/s/wpap95dko4wnlcc/babel_warnings.log?dl=0
>> At 2019-02-13 04:11:33, "Julian Hyde" <[email protected]> wrote:
>>> Does the parser produce correct results if you set LOOKAHEAD to 1? If so, 
>>> we should use that.
>>> 
>>> Do the extension parsers (e.g. Babel) also work with that setting?
>>> 
>>> We thought we were using as little lookahead as we could get away with, but 
>>> maybe we were wrong.
>>> 
>>>> On Feb 12, 2019, at 8:24 AM, Hongze Zhang <[email protected]> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> 
>>>> Recently I have spent some time on playing with Calcite's built-in SQL 
>>>> parsers. And now I'm interested with the reason why the global LOOKAHEAD 
>>>> is set to 2[1] by default.
>>>> I run a comparative benchmark using the ParserBenchmark util class, and 
>>>> the output log shows visible parsing performance improvement after setting 
>>>> global LOOKAHEAD to 1. For example, the metric 
>>>> "ParserBenchmark.parseCached" reduced from 1693.821 us/op ± 1921.925 
>>>> us/op[2] to 655.452 ± 181.100 us/op[3].
>>>> 
>>>> 
>>>> JavaCC always generates java methods like jj_2_**(...) for LL(k) (k > 1) 
>>>> grammar[4], I am almost sure this way is not efficient comparing with the 
>>>> generated code for LL(1) grammar. It might be great If we can somehow take 
>>>> the advantage of JavaCC's LL(1). 
>>>> And of course I could see there was some trade-off consideration about not 
>>>> using LL(1) by default. Then I have done some search on dev list and JIRA 
>>>> cases but found nothing. Does anyone hold information about that?
>>>> 
>>>> 
>>>> Best,
>>>> Hongze
>>>> 
>>>> 
>>>> [1] 
>>>> https://github.com/apache/calcite/blob/883666929478aabe07ee5b9e572c43a6f1a703e2/core/pom.xml#L304
>>>> [2] https://www.dropbox.com/s/il6nodc44dzo0rz/bench_la2.log?dl=0
>>>> [3] https://www.dropbox.com/s/4rrou71siskdhhm/bench_la1.log?dl=0
>>>> [4] 
>>>> http://www.cs.tau.ac.il/~msagiv/courses/lab/Shai2/tools/javacc/examples/JavaGrammars/OPTIMIZING
>>>>

Re: Global LOOKAHEAD of the SQL parser

Reply via email to