[
https://issues.apache.org/jira/browse/CALCITE-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Hyde updated CALCITE-5208:
---------------------------------
Description:
We could reduce the size of generated parser by optimizing lookahead.
The main problem is identifiers and non-reserved keywords. Every point in the
grammar where the parser needs to choose between an identifier and something
else the parser needs to do lookahead, so it generates a {{switch}} expression.
But that {{switch}} contains not just the identifier tokens ({{IDENTIFIER}},
{{UNICODE_QUOTED_IDENTIFIER}} and 5 others) but about 300 non-reserved keywords.
Switches like this, which are typically 300 to 460 lines long, occur 53 times
in the current parser. That's 15,000 - 21,000 lines of code, and the entire
parser is 38,000 lines of code. And there are several parsers (core, babel,
server, test).
{noformat}
$ find . -name SqlParserImpl.java
./core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
$ wc
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
38,242 86,680 989,569
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
$ grep 'case ZONE:'
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
|wc
53 106 865
$ grep 'case [A-Z_]*:'
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
|wc
21,606 43,434 479,472
{noformat}
We can optimize by putting common patterns into shared methods. For example,
this pattern occurs several times, and could be a method {{AliasOpt()}}:
{code}
( [ <AS> ] alias = SimpleIdentifier() | { alias = null; } )
{code}
It might be possible to remove a choice. Rather than choosing whether or not to
fire a rule, we could fire the rule, call the method, and have the method
return null.
I don't know how many of the 53 switches we can remove. If we can remove just
10 that would be a significant number of lines of code saved.
was:
We could reduce the size of generated parser by optimizing lookahead.
The main problem is identifiers and non-reserved keywords. Every point in the
grammar where the parser needs to choose between an identifier and something
else the parser needs to do lookahead, so it generates a {{switch}} expression.
But that {{switch}} contains not just the identifier tokens ({{IDENTIFIER}},
{{UNICODE_QUOTED_IDENTIFIER}} and 5 others) but about 300 non-reserved keywords.
Switches like this, which are typically 300 to 460 lines long, occur 53 times
in the current parser. That's 24,000 lines of code, and the entire parser is
38,000 lines of code. And there are several parsers (core, babel, server, test).
{noformat}
$ find . -name SqlParserImpl.java
./core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
$ wc
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
38242 86680 989569
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
$ grep 'case ZONE:'
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
|wc
53 106 865
$ grep 'case [A-Z]*:'
core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
|wc
13507 27014 255827
{noformat}
We can optimize by putting common patterns into shared methods. For example,
this pattern occurs several times, and could be a method {{AliasOpt()}}:
{code}
( [ <AS> ] alias = SimpleIdentifier() | { alias = null; } )
{code}
It might be possible to remove a choice. Rather than choosing whether or not to
fire a rule, we could fire the rule, call the method, and have the method
return null.
I don't know how many of the 53 switches we can remove. If we can remove just
10 that would be a significant number of lines of code saved.
> Reduce size of generated parser by optimizing lookahead
> -------------------------------------------------------
>
> Key: CALCITE-5208
> URL: https://issues.apache.org/jira/browse/CALCITE-5208
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Priority: Major
>
> We could reduce the size of generated parser by optimizing lookahead.
> The main problem is identifiers and non-reserved keywords. Every point in the
> grammar where the parser needs to choose between an identifier and something
> else the parser needs to do lookahead, so it generates a {{switch}}
> expression. But that {{switch}} contains not just the identifier tokens
> ({{IDENTIFIER}}, {{UNICODE_QUOTED_IDENTIFIER}} and 5 others) but about 300
> non-reserved keywords.
> Switches like this, which are typically 300 to 460 lines long, occur 53 times
> in the current parser. That's 15,000 - 21,000 lines of code, and the entire
> parser is 38,000 lines of code. And there are several parsers (core, babel,
> server, test).
> {noformat}
> $ find . -name SqlParserImpl.java
> ./core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
> $ wc
> core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
> 38,242 86,680 989,569
> core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
> $ grep 'case ZONE:'
> core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
> |wc
> 53 106 865
> $ grep 'case [A-Z_]*:'
> core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl/SqlParserImpl.java
> |wc
> 21,606 43,434 479,472
> {noformat}
> We can optimize by putting common patterns into shared methods. For example,
> this pattern occurs several times, and could be a method {{AliasOpt()}}:
> {code}
> ( [ <AS> ] alias = SimpleIdentifier() | { alias = null; } )
> {code}
> It might be possible to remove a choice. Rather than choosing whether or not
> to fire a rule, we could fire the rule, call the method, and have the method
> return null.
> I don't know how many of the 53 switches we can remove. If we can remove just
> 10 that would be a significant number of lines of code saved.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)