This seems fine.  I don't think the lexer changes enough for us to have
issues with new cases.  I think adding some comments to explain why we
are doing it is enough, and perhaps a test case that can be reproduced
later for testing.

---------------------------------------------------------------------------

Tom Lane wrote:
> I was doing some profiles recently that showed that on simple statements
> (like INSERT with a few column values) the basic parser (flex/bison) was
> taking up a noticeable percentage of the total CPU time.  We already
> have done some things to speed up the lexer, like use -CF option for
> large-but-fast tables.  I read in the flex manual about how you can make
> it faster yet if you can avoid the need for backup states.  It's not
> terribly well documented, but I eventually worked out how to do it, and
> find that it seems to make the lexer about a third faster.  Basically
> what you have to do is provide alternate rules that allow the lexer to
> match *some* rule on any prefix of a complete token.  For instance the
> rule for {real} implies that given
>       123.456E+
> if the next character is not a digit, then the {real} rule fails, and
> the lexer has to backtrack to
>       123.456
> which it can recognize as a {decimal}.  You can add a rule that does
> match in this situation, throws back the two unwanted characters with
> yyless(), and returns the number as the {decimal} rule would do.
> The next call will carry on lexing the "E" as an identifier, etc.
> (Note: this is going to end up being a syntax error because the SQL
> grammar never allows a number to immediately precede an identifier;
> but that's not the lexer's concern, and so I don't think that we should
> make the extra rule throw an error immediately.)
> 
> What I'm wondering is whether this is really worth doing or not.
> There are currently just two parts of the lexer rules that are affected
> --- the {real} rule illustrated above, and the rules that allow quoted
> strings to be split across lines as the SQL spec requires.  But the
> patches are still pretty ugly, and what's really annoying is that there
> doesn't seem to be any way to get flex to complain if someone later
> makes a change that breaks the no-backup-cases property again.
> 
> A prototype patch (completely undocumented :-() is attached.  Any
> thoughts about whether it's worth pressing forward with?
> 
>                       regards, tom lane
> 
> *** src/backend/parser/scan.l.orig    Fri Mar 11 21:15:20 2005
> --- src/backend/parser/scan.l Mon May 23 01:05:54 2005
> ***************
> *** 148,163 ****
>    * validate the contents.
>    */
>   xbstart                     [bB]{quote}
> ! xbstop                      {quote}
>   xbinside            [^']*
>   xbcat                       {quote}{whitespace_with_newline}{quote}
>   
>   /* Hexadecimal number
>    */
>   xhstart                     [xX]{quote}
> ! xhstop                      {quote}
>   xhinside            [^']*
>   xhcat                       {quote}{whitespace_with_newline}{quote}
>   
>   /* National character
>    */
> --- 148,165 ----
>    * validate the contents.
>    */
>   xbstart                     [bB]{quote}
> ! xbstop                      {quote}{whitespace}*
>   xbinside            [^']*
>   xbcat                       {quote}{whitespace_with_newline}{quote}
> + xbconfused          {quote}{whitespace}*"-"
>   
>   /* Hexadecimal number
>    */
>   xhstart                     [xX]{quote}
> ! xhstop                      {quote}{whitespace}*
>   xhinside            [^']*
>   xhcat                       {quote}{whitespace_with_newline}{quote}
> + xhconfused          {quote}{whitespace}*"-"
>   
>   /* National character
>    */
> ***************
> *** 169,180 ****
>    */
>   quote                       '
>   xqstart                     {quote}
> ! xqstop                      {quote}
>   xqdouble            {quote}{quote}
>   xqinside            [^\\']+
>   xqescape            [\\][^0-7]
>   xqoctesc            [\\][0-7]{1,3}
>   xqcat                       {quote}{whitespace_with_newline}{quote}
>   
>   /* $foo$ style quotes ("dollar quoting")
>    * The quoted string starts with $foo$ where "foo" is an optional string
> --- 171,183 ----
>    */
>   quote                       '
>   xqstart                     {quote}
> ! xqstop                      {quote}{whitespace}*
>   xqdouble            {quote}{quote}
>   xqinside            [^\\']+
>   xqescape            [\\][^0-7]
>   xqoctesc            [\\][0-7]{1,3}
>   xqcat                       {quote}{whitespace_with_newline}{quote}
> + xqconfused          {quote}{whitespace}*"-"
>   
>   /* $foo$ style quotes ("dollar quoting")
>    * The quoted string starts with $foo$ where "foo" is an optional string
> ***************
> *** 185,190 ****
> --- 188,194 ----
>   dolq_start          [A-Za-z\200-\377_]
>   dolq_cont           [A-Za-z\200-\377_0-9]
>   dolqdelim           \$({dolq_start}{dolq_cont}*)?\$
> + dolqfailed          \${dolq_start}{dolq_cont}*
>   dolqinside          [^$]+
>   
>   /* Double quote
> ***************
> *** 247,253 ****
>   
>   integer                     {digit}+
>   decimal                     (({digit}*\.{digit}+)|({digit}+\.{digit}*))
> ! real                        
> ((({digit}*\.{digit}+)|({digit}+\.{digit}*)|({digit}+))([Ee][-+]?{digit}+))
>   
>   param                       \${integer}
>   
> --- 251,259 ----
>   
>   integer                     {digit}+
>   decimal                     (({digit}*\.{digit}+)|({digit}+\.{digit}*))
> ! real                        ({integer}|{decimal})[Ee][-+]?{digit}+
> ! realfail1           ({integer}|{decimal})[Ee]
> ! realfail2           ({integer}|{decimal})[Ee][-+]
>   
>   param                       \${integer}
>   
> ***************
> *** 310,315 ****
> --- 316,325 ----
>                                       /* ignore */
>                               }
>   
> + <xc>\*+                     {
> +                                     /* ignore */
> +                             }
> + 
>   <xc><<EOF>>         { yyerror("unterminated /* comment"); }
>   
>   {xbstart}           {
> ***************
> *** 329,334 ****
> --- 339,350 ----
>                                       yylval.str = litbufdup();
>                                       return BCONST;
>                               }
> + <xb>{xbconfused} {
> +                                     yyless(yyleng-1);
> +                                     BEGIN(INITIAL);
> +                                     yylval.str = litbufdup();
> +                                     return BCONST;
> +                             }
>   <xh>{xhinside}      |
>   <xb>{xbinside}      {
>                                       addlit(yytext, yyleng);
> ***************
> *** 356,361 ****
> --- 372,383 ----
>                                       yylval.str = litbufdup();
>                                       return XCONST;
>                               }
> + <xh>{xhconfused} {
> +                                     yyless(yyleng-1);
> +                                     BEGIN(INITIAL);
> +                                     yylval.str = litbufdup();
> +                                     return XCONST;
> +                             }
>   <xh><<EOF>>         { yyerror("unterminated hexadecimal string literal"); }
>   
>   {xnstart}           {
> ***************
> *** 385,390 ****
> --- 407,418 ----
>                                       yylval.str = litbufdup();
>                                       return SCONST;
>                               }
> + <xq>{xqconfused} {
> +                                     yyless(yyleng-1);
> +                                     BEGIN(INITIAL);
> +                                     yylval.str = litbufdup();
> +                                     return SCONST;
> +                             }
>   <xq>{xqdouble}  {
>                                       addlitchar('\'');
>                               }
> ***************
> *** 413,418 ****
> --- 441,447 ----
>                                       BEGIN(xdolq);
>                                       startlit();
>                               }
> + {dolqfailed}        { yyerror("invalid token"); }
>   <xdolq>{dolqdelim} {
>                                       if (strcmp(yytext, dolqstart) == 0)
>                                       {
> ***************
> *** 435,440 ****
> --- 464,472 ----
>   <xdolq>{dolqinside} {
>                                       addlit(yytext, yyleng);
>                               }
> + <xdolq>{dolqfailed} {
> +                                     addlit(yytext, yyleng);
> +                             }
>   <xdolq>.            {
>                                       /* This is only needed for $ inside the 
> quoted text */
>                                       addlitchar(yytext[0]);
> ***************
> *** 576,582 ****
>                                       yylval.str = pstrdup(yytext);
>                                       return FCONST;
>                               }
> ! 
>   
>   {identifier}        {
>                                       const ScanKeyword *keyword;
> --- 608,623 ----
>                                       yylval.str = pstrdup(yytext);
>                                       return FCONST;
>                               }
> ! {realfail1}         {
> !                                     yyless(yyleng-1);
> !                                     yylval.str = pstrdup(yytext);
> !                                     return FCONST;
> !                             }
> ! {realfail2}         {
> !                                     yyless(yyleng-2);
> !                                     yylval.str = pstrdup(yytext);
> !                                     return FCONST;
> !                             }
>   
>   {identifier}        {
>                                       const ScanKeyword *keyword;
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to