I wrote:
> So I think we are down to a choice of doing nothing for 8.4, or teaching
> the existing plpgsql lexer about standard_conforming_strings.  Assuming
> the current proposal for U& literals holds up, it should not be
> necessary for plpgsql to know about those explicitly as long as it obeys
> standard_conforming_strings, so this might not be too horrid a project.
> I'll take a look at that next.
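
To illustrate why the lexer has to know about standard_conforming_strings
at all, here is a minimal sketch, not taken from the patch, of how the end
of a plain (non-E) literal moves depending on that setting.  The function
name literal_token_length is hypothetical, and the sketch ignores literal
continuation across newlines, which the real rules handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): given a pointer to the
 * opening quote of a plain literal, return the token length in bytes
 * including both quotes, or 0 if the literal is unterminated.
 */
size_t
literal_token_length(const char *s, bool standard_conforming)
{
    const char *p;

    assert(s != NULL && s[0] == '\'');
    p = s + 1;

    while (*p != '\0')
    {
        if (p[0] == '\'' && p[1] == '\'')
            p += 2;                     /* doubled quote, stays inside */
        else if (p[0] == '\'')
            return (size_t) (p - s) + 1;    /* closing quote */
        else if (p[0] == '\\' && !standard_conforming && p[1] != '\0')
            p += 2;                     /* backslash escapes next char */
        else
            p++;                        /* ordinary character */
    }
    return 0;                           /* ran off the end */
}
```

With the input 'a\'b' the token ends after the backslash when the setting
is on, and at the final quote when it is off, so a lexer that ignores the
setting finds the wrong token boundary in one of the two cases.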

The attached proposed patch rips out plpgsql's handling of comments and
string literals, and puts in scanner rules that are extracted from the
core lexer (but simplified in a few places where we don't need all the
complexity).  The net user-visible effects should be:

* Both regular and E'' literals should now be parsed exactly the same
way the core lexer parses them.

* Nested slash-star comments are now handled properly.

* Warnings and errors associated with string parsing should now match
the core, which means they might vary a bit from previous plpgsql
behavior.
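
The nested-comment handling in the second item can be sketched in plain C
as a depth counter, which is roughly what the xcdepth variable in the
patch does across the <xc> rules.  This is only an illustration under
that assumption; skip_nested_comment is a hypothetical name and does not
appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): given a pointer to the
 * opening slash-star of a comment, return a pointer just past the
 * matching star-slash, or NULL if the comment is unterminated.
 */
const char *
skip_nested_comment(const char *p)
{
    int depth = 0;      /* nesting level below the outermost comment */

    assert(p != NULL && p[0] == '/' && p[1] == '*');
    p += 2;

    while (*p != '\0')
    {
        if (p[0] == '/' && p[1] == '*')
        {
            depth++;            /* inner comment opens */
            p += 2;
        }
        else if (p[0] == '*' && p[1] == '/')
        {
            p += 2;
            if (depth == 0)
                return p;       /* outermost comment closed */
            depth--;            /* inner comment closed */
        }
        else
            p++;                /* ordinary comment text */
    }
    return NULL;                /* hit end of input inside the comment */
}
```

A scanner without the counter stops at the first star-slash, so the tail
of an outer comment would be lexed as code.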

I need to test this a bit more, and it could probably do with adding
a few regression test cases, but I think it's code-complete.

Comments?

                        regards, tom lane

PS: in passing I got rid of the scanner_functype/scanner_typereported
kluge, which might once have had some purpose but now just clutters
both the scanner and the grammar.  That change is a leftover from my
failed attempt at removing the scanner altogether; since it simplifies
the code, I thought I'd keep it.

Index: src/pl/plpgsql/src/gram.y
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/gram.y,v
retrieving revision 1.121
diff -c -r1.121 gram.y
*** src/pl/plpgsql/src/gram.y   18 Feb 2009 11:33:04 -0000      1.121
--- src/pl/plpgsql/src/gram.y   19 Apr 2009 16:28:04 -0000
***************
*** 62,67 ****
--- 62,69 ----
                                                                                   int lineno);
  static        void                     check_sql_expr(const char *stmt);
  static        void                     plpgsql_sql_error_callback(void *arg);
+ static        char                    *parse_string_token(const char *token);
+ static        void                     plpgsql_string_error_callback(void *arg);
  static        char                    *check_label(const char *yytxt);
  static        void                     check_labels(const char *start_label,
                                                                          const char *end_label);
***************
*** 228,235 ****
                /*
                 * Other tokens
                 */
- %token        T_FUNCTION
- %token        T_TRIGGER
  %token        T_STRING
  %token        T_NUMBER
  %token        T_SCALAR                                /* a VAR, RECFIELD, or TRIGARG */
--- 230,235 ----
***************
*** 244,256 ****
  
  %%
  
! pl_function           : T_FUNCTION comp_optsect pl_block opt_semi
                                        {
!                                               yylval.program = (PLpgSQL_stmt_block *)$3;
!                                       }
!                               | T_TRIGGER comp_optsect pl_block opt_semi
!                                       {
!                                               yylval.program = (PLpgSQL_stmt_block *)$3;
                                        }
                                ;
  
--- 244,252 ----
  
  %%
  
! pl_function           : comp_optsect pl_block opt_semi
                                        {
!                                               yylval.program = (PLpgSQL_stmt_block *) $2;
                                        }
                                ;
  
***************
*** 1403,1409 ****
                                                        if (tok == T_STRING)
                                                        {
                                                                /* old style message and parameters */
!                                                               new->message = plpgsql_get_string_value();
                                                                /*
                                                                 * We expect either a semi-colon, which
                                                                 * indicates no parameters, or a comma that
--- 1399,1405 ----
                                                        if (tok == T_STRING)
                                                        {
                                                                /* old style message and parameters */
!                                                               new->message = parse_string_token(yytext);
                                                                /*
                                                                 * We expect either a semi-colon, which
                                                                 * indicates no parameters, or a comma that
***************
*** 1435,1441 ****
  
                                                                        if (yylex() != T_STRING)
                                                                                yyerror("syntax error");
!                                                                       sqlstatestr = plpgsql_get_string_value();
  
                                                                        if (strlen(sqlstatestr) != 5)
                                                                                yyerror("invalid SQLSTATE code");
--- 1431,1437 ----
  
                                                                        if (yylex() != T_STRING)
                                                                                yyerror("syntax error");
!                                                                       sqlstatestr = parse_string_token(yytext);
  
                                                                        if (strlen(sqlstatestr) != 5)
                                                                                yyerror("invalid SQLSTATE code");
***************
*** 1778,1784 ****
                                                        /* next token should be a string literal */
                                                        if (yylex() != T_STRING)
                                                                yyerror("syntax error");
!                                                       sqlstatestr = plpgsql_get_string_value();
  
                                                        if (strlen(sqlstatestr) != 5)
                                                                yyerror("invalid SQLSTATE code");
--- 1774,1780 ----
                                                        /* next token should be a string literal */
                                                        if (yylex() != T_STRING)
                                                                yyerror("syntax error");
!                                                       sqlstatestr = parse_string_token(yytext);
  
                                                        if (strlen(sqlstatestr) != 5)
                                                                yyerror("invalid SQLSTATE code");
***************
*** 2738,2743 ****
--- 2734,2782 ----
        errposition(0);
  }
  
+ /*
+  * Convert a string-literal token to the represented string value.
+  *
+  * To do this, we need to invoke the core lexer.  To avoid confusion between
+  * the core bison/flex definitions and our own, the actual invocation is in
+  * pl_funcs.c.  Here we are only concerned with setting up the right errcontext
+  * state, which is handled the same as in check_sql_expr().
+  */
+ static char *
+ parse_string_token(const char *token)
+ {
+       char       *result;
+       ErrorContextCallback  syntax_errcontext;
+       ErrorContextCallback *previous_errcontext;
+ 
+       /* See comments in check_sql_expr() */
+       Assert(error_context_stack->callback == plpgsql_compile_error_callback);
+ 
+       previous_errcontext = error_context_stack;
+       syntax_errcontext.callback = plpgsql_string_error_callback;
+       syntax_errcontext.arg = (char *) token;
+       syntax_errcontext.previous = error_context_stack->previous;
+       error_context_stack = &syntax_errcontext;
+ 
+       result = plpgsql_parse_string_token(token);
+ 
+       /* Restore former ereport callback */
+       error_context_stack = previous_errcontext;
+ 
+       return result;
+ }
+ 
+ static void
+ plpgsql_string_error_callback(void *arg)
+ {
+       Assert(plpgsql_error_funcname);
+ 
+       errcontext("String literal in PL/PgSQL function \"%s\" near line %d",
+                          plpgsql_error_funcname, plpgsql_error_lineno);
+       /* representing the string literal as internalquery seems overkill */
+       errposition(0);
+ }
+ 
  static char *
  check_label(const char *yytxt)
  {
Index: src/pl/plpgsql/src/pl_comp.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_comp.c,v
retrieving revision 1.134
diff -c -r1.134 pl_comp.c
*** src/pl/plpgsql/src/pl_comp.c        18 Feb 2009 11:33:04 -0000      1.134
--- src/pl/plpgsql/src/pl_comp.c        19 Apr 2009 16:28:04 -0000
***************
*** 261,267 ****
                   bool forValidator)
  {
        Form_pg_proc procStruct = (Form_pg_proc) GETSTRUCT(procTup);
!       int                     functype = CALLED_AS_TRIGGER(fcinfo) ? T_TRIGGER : T_FUNCTION;
        Datum           prosrcdatum;
        bool            isnull;
        char       *proc_source;
--- 261,267 ----
                   bool forValidator)
  {
        Form_pg_proc procStruct = (Form_pg_proc) GETSTRUCT(procTup);
!       bool            is_trigger = CALLED_AS_TRIGGER(fcinfo);
        Datum           prosrcdatum;
        bool            isnull;
        char       *proc_source;
***************
*** 293,299 ****
        if (isnull)
                elog(ERROR, "null prosrc");
        proc_source = TextDatumGetCString(prosrcdatum);
!       plpgsql_scanner_init(proc_source, functype);
  
        plpgsql_error_funcname = pstrdup(NameStr(procStruct->proname));
        plpgsql_error_lineno = 0;
--- 293,299 ----
        if (isnull)
                elog(ERROR, "null prosrc");
        proc_source = TextDatumGetCString(prosrcdatum);
!       plpgsql_scanner_init(proc_source);
  
        plpgsql_error_funcname = pstrdup(NameStr(procStruct->proname));
        plpgsql_error_lineno = 0;
***************
*** 359,371 ****
        function->fn_oid = fcinfo->flinfo->fn_oid;
        function->fn_xmin = HeapTupleHeaderGetXmin(procTup->t_data);
        function->fn_tid = procTup->t_self;
!       function->fn_functype = functype;
        function->fn_cxt = func_cxt;
        function->out_param_varno = -1;         /* set up for no OUT param */
  
!       switch (functype)
        {
!               case T_FUNCTION:
  
                        /*
                         * Fetch info about the procedure's parameters. Allocations aren't
--- 359,371 ----
        function->fn_oid = fcinfo->flinfo->fn_oid;
        function->fn_xmin = HeapTupleHeaderGetXmin(procTup->t_data);
        function->fn_tid = procTup->t_self;
!       function->fn_is_trigger = is_trigger;
        function->fn_cxt = func_cxt;
        function->out_param_varno = -1;         /* set up for no OUT param */
  
!       switch (is_trigger)
        {
!               case false:
  
                        /*
                         * Fetch info about the procedure's parameters. Allocations aren't
***************
*** 564,570 ****
                        ReleaseSysCache(typeTup);
                        break;
  
!               case T_TRIGGER:
                        /* Trigger procedure's return type is unknown yet */
                        function->fn_rettype = InvalidOid;
                        function->fn_retbyval = false;
--- 564,570 ----
                        ReleaseSysCache(typeTup);
                        break;
  
!               case true:
                        /* Trigger procedure's return type is unknown yet */
                        function->fn_rettype = InvalidOid;
                        function->fn_retbyval = false;
***************
*** 645,651 ****
                        break;
  
                default:
!                       elog(ERROR, "unrecognized function typecode: %u", functype);
                        break;
        }
  
--- 645,651 ----
                        break;
  
                default:
!                       elog(ERROR, "unrecognized function typecode: %d", (int) is_trigger);
                        break;
        }
  
***************
*** 790,796 ****
         * Recognize tg_argv when compiling triggers
         * (XXX this sucks, it should be a regular variable in the namestack)
         */
!       if (plpgsql_curr_compile->fn_functype == T_TRIGGER)
        {
                if (strcmp(cp[0], "tg_argv") == 0)
                {
--- 790,796 ----
         * Recognize tg_argv when compiling triggers
         * (XXX this sucks, it should be a regular variable in the namestack)
         */
!       if (plpgsql_curr_compile->fn_is_trigger)
        {
                if (strcmp(cp[0], "tg_argv") == 0)
                {
Index: src/pl/plpgsql/src/pl_funcs.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_funcs.c,v
retrieving revision 1.76
diff -c -r1.76 pl_funcs.c
*** src/pl/plpgsql/src/pl_funcs.c       18 Feb 2009 11:33:04 -0000      1.76
--- src/pl/plpgsql/src/pl_funcs.c       19 Apr 2009 16:28:04 -0000
***************
*** 17,22 ****
--- 17,24 ----
  
  #include <ctype.h>
  
+ #include "parser/gramparse.h"
+ #include "parser/gram.h"
  #include "parser/scansup.h"
  
  
***************
*** 460,465 ****
--- 462,502 ----
  
  
  /*
+  * plpgsql_parse_string_token - get the value represented by a string literal
+  *
+  * We do not make plpgsql's lexer produce the represented value, because
+  * in many cases we don't need it.  Instead this function is invoked when
+  * we do need it.  The input is the T_STRING token as identified by the lexer.
+  *
+  * The result is a palloc'd string.
+  *
+  * Note: this is called only from plpgsql's gram.y, but we can't just put it
+  * there because including parser/gram.h there would cause confusion.
+  */
+ char *
+ plpgsql_parse_string_token(const char *token)
+ {
+       int             ctoken;
+ 
+       /*
+        * We use the core lexer to do the dirty work.  Aside from getting the
+        * right results for escape sequences and so on, this helps us produce
+        * appropriate warnings for escape_string_warning etc.
+        */
+       scanner_init(token);
+ 
+       ctoken = base_yylex();
+ 
+       if (ctoken != SCONST)
+               elog(ERROR, "unexpected result from base lexer: %d", ctoken);
+ 
+       scanner_finish();
+ 
+       return base_yylval.str;
+ }
+ 
+ 
+ /*
   * Statement type as a string, for use in error messages etc.
   */
  const char *
Index: src/pl/plpgsql/src/plpgsql.h
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/plpgsql.h,v
retrieving revision 1.110
diff -c -r1.110 plpgsql.h
*** src/pl/plpgsql/src/plpgsql.h        9 Apr 2009 02:57:53 -0000       1.110
--- src/pl/plpgsql/src/plpgsql.h        19 Apr 2009 16:28:04 -0000
***************
*** 650,656 ****
        Oid                     fn_oid;
        TransactionId fn_xmin;
        ItemPointerData fn_tid;
!       int                     fn_functype;
        PLpgSQL_func_hashkey *fn_hashkey;       /* back-link to hashtable key */
        MemoryContext fn_cxt;
  
--- 650,656 ----
        Oid                     fn_oid;
        TransactionId fn_xmin;
        ItemPointerData fn_tid;
!       bool            fn_is_trigger;
        PLpgSQL_func_hashkey *fn_hashkey;       /* back-link to hashtable key */
        MemoryContext fn_cxt;
  
***************
*** 880,885 ****
--- 880,886 ----
   * ----------
   */
  extern void plpgsql_convert_ident(const char *s, char **output, int numidents);
+ extern char *plpgsql_parse_string_token(const char *token);
  extern const char *plpgsql_stmt_typename(PLpgSQL_stmt *stmt);
  extern void plpgsql_dumptree(PLpgSQL_function *func);
  
***************
*** 894,901 ****
  extern void plpgsql_push_back_token(int token);
  extern void plpgsql_yyerror(const char *message);
  extern int    plpgsql_scanner_lineno(void);
! extern void plpgsql_scanner_init(const char *str, int functype);
  extern void plpgsql_scanner_finish(void);
- extern char *plpgsql_get_string_value(void);
  
  #endif   /* PLPGSQL_H */
--- 895,901 ----
  extern void plpgsql_push_back_token(int token);
  extern void plpgsql_yyerror(const char *message);
  extern int    plpgsql_scanner_lineno(void);
! extern void plpgsql_scanner_init(const char *str);
  extern void plpgsql_scanner_finish(void);
  
  #endif   /* PLPGSQL_H */
Index: src/pl/plpgsql/src/scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/scan.l,v
retrieving revision 1.67
diff -c -r1.67 scan.l
*** src/pl/plpgsql/src/scan.l   18 Feb 2009 11:33:04 -0000      1.67
--- src/pl/plpgsql/src/scan.l   19 Apr 2009 16:28:04 -0000
***************
*** 19,45 ****
  #include "mb/pg_wchar.h"
  
  
- /* No reason to constrain amount of data slurped */
- #define YY_READ_BUF_SIZE 16777216
- 
  /* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
  #undef fprintf
  #define fprintf(file, fmt, msg)  ereport(ERROR, (errmsg_internal("%s", msg)))
  
  /* Handles to the buffer that the lexer uses internally */
  static YY_BUFFER_STATE scanbufhandle;
  static char *scanbuf;
  
  static const char *scanstr;           /* original input string */
  
- static int    scanner_functype;
- static bool   scanner_typereported;
  static int    pushback_token;
  static bool have_pushback_token;
  static const char *cur_line_start;
  static int    cur_line_num;
  static char    *dolqstart;      /* current $foo$ quote start string */
! static int    dolqlen;                        /* signal to plpgsql_get_string_value */
  
  bool plpgsql_SpaceScanned = false;
  %}
--- 19,49 ----
  #include "mb/pg_wchar.h"
  
  
  /* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
  #undef fprintf
  #define fprintf(file, fmt, msg)  ereport(ERROR, (errmsg_internal("%s", msg)))
  
+ /*
+  * When we parse a token that requires multiple lexer rules to process,
+  * remember the token's starting position this way.
+  */
+ #define SAVE_TOKEN_START()  \
+       ( start_lineno = plpgsql_scanner_lineno(), start_charpos = yytext )
+ 
  /* Handles to the buffer that the lexer uses internally */
  static YY_BUFFER_STATE scanbufhandle;
  static char *scanbuf;
  
  static const char *scanstr;           /* original input string */
  
  static int    pushback_token;
  static bool have_pushback_token;
  static const char *cur_line_start;
  static int    cur_line_num;
+ static int            xcdepth = 0;    /* depth of nesting in slash-star comments */
  static char    *dolqstart;      /* current $foo$ quote start string */
! 
! extern bool           standard_conforming_strings;
  
  bool plpgsql_SpaceScanned = false;
  %}
***************
*** 54,84 ****
  
  %option case-insensitive
  
  
! %x    IN_STRING
! %x    IN_COMMENT
! %x    IN_DOLLARQUOTE
  
  digit                 [0-9]
  ident_start           [A-Za-z\200-\377_]
  ident_cont            [A-Za-z\200-\377_0-9\$]
  
  quoted_ident  (\"[^\"]*\")+
  
  identifier            ({ident_start}{ident_cont}*|{quoted_ident})
  
  param                 \${digit}+
  
- space                 [ \t\n\r\f]
- 
- /* $foo$ style quotes ("dollar quoting")
-  * copied straight from the backend SQL parser
-  */
- dolq_start            [A-Za-z\200-\377_]
- dolq_cont             [A-Za-z\200-\377_0-9]
- dolqdelim             \$({dolq_start}{dolq_cont}*)?\$
- dolqinside            [^$]+
- 
  %%
      /* ----------
       * Local variables in scanner to remember where
--- 58,130 ----
  
  %option case-insensitive
  
+ /*
+  * Exclusive states are a subset of the core lexer's:
+  *  <xc> extended C-style comments
+  *  <xq> standard quoted strings
+  *  <xe> extended quoted strings (support backslash escape sequences)
+  *  <xdolq> $foo$ quoted strings
+  */
+ 
+ %x xc
+ %x xe
+ %x xq
+ %x xdolq
+ 
+ /*
+  * Definitions --- these generally must match the core lexer, but in some
+  * cases we can simplify, since we only care about identifying the token
+  * boundaries and not about deriving the represented value.  Also, we
+  * aren't trying to lex multicharacter operators so their interactions
+  * with comments go away.
+  */
+ 
+ space                 [ \t\n\r\f]
+ horiz_space           [ \t\f]
+ newline                       [\n\r]
+ non_newline           [^\n\r]
+ 
+ comment                       ("--"{non_newline}*)
+ 
+ whitespace            ({space}+|{comment})
+ special_whitespace            ({space}+|{comment}{newline})
+ horiz_whitespace              ({horiz_space}|{comment})
+ whitespace_with_newline       ({horiz_whitespace}*{newline}{special_whitespace}*)
+ 
+ quote                 '
+ quotestop             {quote}{whitespace}*
+ quotecontinue {quote}{whitespace_with_newline}{quote}
+ quotefail             {quote}{whitespace}*"-"
+ 
+ xestart                       [eE]{quote}
+ xeinside              [^\\']+
+ xeescape              [\\].
+ 
+ xqstart                       {quote}
+ xqdouble              {quote}{quote}
+ xqinside              [^']+
+ 
+ dolq_start            [A-Za-z\200-\377_]
+ dolq_cont             [A-Za-z\200-\377_0-9]
+ dolqdelim             \$({dolq_start}{dolq_cont}*)?\$
+ dolqfailed            \${dolq_start}{dolq_cont}*
+ dolqinside            [^$]+
  
! xcstart                       \/\*
! xcstop                        \*+\/
! xcinside              [^*/]+
  
  digit                 [0-9]
  ident_start           [A-Za-z\200-\377_]
  ident_cont            [A-Za-z\200-\377_0-9\$]
  
+ /* This is a simpler treatment of quoted identifiers than the core uses */
  quoted_ident  (\"[^\"]*\")+
  
  identifier            ({ident_start}{ident_cont}*|{quoted_ident})
  
  param                 \${digit}+
  
  %%
      /* ----------
       * Local variables in scanner to remember where
***************
*** 96,112 ****
      plpgsql_SpaceScanned = false;
  
      /* ----------
-      * On the first call to a new source report the
-      * function's type (T_FUNCTION or T_TRIGGER)
-      * ----------
-      */
-       if (!scanner_typereported)
-       {
-               scanner_typereported = true;
-               return scanner_functype;
-       }
- 
-     /* ----------
       * The keyword rules
       * ----------
       */
--- 142,147 ----
***************
*** 225,343 ****
  
  {digit}+              { return T_NUMBER;                      }
  
! \".                           {
!                               plpgsql_error_lineno = plpgsql_scanner_lineno();
!                               ereport(ERROR,
!                                               (errcode(ERRCODE_DATATYPE_MISMATCH),
!                                                errmsg("unterminated quoted identifier")));
!                       }
! 
!     /* ----------
!      * Ignore whitespaces but remember this happened
!      * ----------
!      */
! {space}+              { plpgsql_SpaceScanned = true;          }
  
      /* ----------
!      * Eat up comments
       * ----------
       */
! --[^\r\n]*            ;
! 
! \/\*                  { start_lineno = plpgsql_scanner_lineno();
!                         BEGIN(IN_COMMENT);
!                       }
! <IN_COMMENT>\*\/      { BEGIN(INITIAL); plpgsql_SpaceScanned = true; }
! <IN_COMMENT>\n                ;
! <IN_COMMENT>.         ;
! <IN_COMMENT><<EOF>>   {
!                               plpgsql_error_lineno = start_lineno;
!                               ereport(ERROR,
!                                               (errcode(ERRCODE_DATATYPE_MISMATCH),
!                                                errmsg("unterminated /* comment")));
!                       }
  
      /* ----------
!      * Collect anything inside of ''s and return one STRING token
!        *
!        * Hacking yytext/yyleng here lets us avoid using yymore(), which is
!        * a win for performance.  It's safe because we know the underlying
!        * input buffer is not changing.
       * ----------
       */
! '                     {
!                         start_lineno = plpgsql_scanner_lineno();
!                         start_charpos = yytext;
!                         BEGIN(IN_STRING);
!                       }
! [eE]'         {
!                         /* for now, treat the same as a regular literal */
!                         start_lineno = plpgsql_scanner_lineno();
!                         start_charpos = yytext;
!                         BEGIN(IN_STRING);
!                       }
! <IN_STRING>\\.                { }
! <IN_STRING>\\         { /* can only happen with \ at EOF */ }
! <IN_STRING>''         { }
! <IN_STRING>'          {
!                         /* tell plpgsql_get_string_value it's not a dollar quote */
!                         dolqlen = 0;
!                         /* adjust yytext/yyleng to describe whole string token */
!                         yyleng += (yytext - start_charpos);
!                         yytext = start_charpos;
!                         BEGIN(INITIAL);
!                         return T_STRING;
!                       }
! <IN_STRING>[^'\\]+    { }
! <IN_STRING><<EOF>>    {
!                               plpgsql_error_lineno = start_lineno;
!                               ereport(ERROR,
!                                               (errcode(ERRCODE_DATATYPE_MISMATCH),
!                                                errmsg("unterminated quoted string")));
!                       }
! 
! {dolqdelim}           {
!                         start_lineno = plpgsql_scanner_lineno();
!                         start_charpos = yytext;
!                         dolqstart = pstrdup(yytext);
!                         BEGIN(IN_DOLLARQUOTE);
!                       }
! <IN_DOLLARQUOTE>{dolqdelim} {
!                         if (strcmp(yytext, dolqstart) == 0)
!                         {
!                                       pfree(dolqstart);
!                                       /* tell plpgsql_get_string_value it is a dollar quote */
!                                       dolqlen = yyleng;
                                        /* adjust yytext/yyleng to describe whole string token */
                                        yyleng += (yytext - start_charpos);
                                        yytext = start_charpos;
-                                       BEGIN(INITIAL);
                                        return T_STRING;
!                         }
!                         else
!                         {
!                                       /*
!                                        * When we fail to match $...$ to dolqstart, transfer
!                                        * the $... part to the output, but put back the final
!                                        * $ for rescanning.  Consider $delim$...$junk$delim$
!                                        */
!                                       yyless(yyleng-1);
!                         }
!                       }
! <IN_DOLLARQUOTE>{dolqinside} { }
! <IN_DOLLARQUOTE>.     { /* needed for $ inside the quoted text */ }
! <IN_DOLLARQUOTE><<EOF>>       {
!                               plpgsql_error_lineno = start_lineno;
!                               ereport(ERROR,
!                                               (errcode(ERRCODE_DATATYPE_MISMATCH),
!                                                errmsg("unterminated dollar-quoted string")));
!                       }
  
      /* ----------
       * Any unmatched character is returned as is
       * ----------
       */
! .                     { return yytext[0];                     }
  
  %%
  
--- 260,393 ----
  
  {digit}+              { return T_NUMBER;                      }
  
! \".                           { yyerror("unterminated quoted identifier"); }
  
      /* ----------
!      * Ignore whitespace (including comments) but remember this happened
       * ----------
       */
! {whitespace}  { plpgsql_SpaceScanned = true; }
  
      /* ----------
!      * Comment and literal handling is mostly copied from the core lexer
       * ----------
       */
! {xcstart}             {
!                                       /* Set location in case of syntax error in comment */
!                                       SAVE_TOKEN_START();
!                                       xcdepth = 0;
!                                       BEGIN(xc);
!                                       plpgsql_SpaceScanned = true;
!                               }
! 
! <xc>{xcstart} {
!                                       xcdepth++;
!                               }
! 
! <xc>{xcstop}  {
!                                       if (xcdepth <= 0)
!                                               BEGIN(INITIAL);
!                                       else
!                                               xcdepth--;
!                               }
! 
! <xc>{xcinside}        {
!                                       /* ignore */
!                               }
! 
! <xc>\/+                       {
!                                       /* ignore */
!                               }
! 
! <xc>\*+                       {
!                                       /* ignore */
!                               }
! 
! <xc><<EOF>>           { yyerror("unterminated /* comment"); }
! 
! {xqstart}             {
!                                       SAVE_TOKEN_START();
!                                       if (standard_conforming_strings)
!                                               BEGIN(xq);
!                                       else
!                                               BEGIN(xe);
!                               }
! {xestart}             {
!                                       SAVE_TOKEN_START();
!                                       BEGIN(xe);
!                               }
! <xq,xe>{quotestop}    |
! <xq,xe>{quotefail} {
!                                       yyless(1);
!                                       BEGIN(INITIAL);
                                        /* adjust yytext/yyleng to describe whole string token */
                                        yyleng += (yytext - start_charpos);
                                        yytext = start_charpos;
                                        return T_STRING;
!                               }
! <xq,xe>{xqdouble} {
!                               }
! <xq>{xqinside}  {
!                               }
! <xe>{xeinside}  {
!                               }
! <xe>{xeescape}  {
!                               }
! <xq,xe>{quotecontinue} {
!                                       /* ignore */
!                               }
! <xe>.                 {
!                                       /* This is only needed for \ just before EOF */
!                               }
! <xq,xe><<EOF>>                { yyerror("unterminated quoted string"); }
! 
! {dolqdelim}           {
!                                       SAVE_TOKEN_START();
!                                       dolqstart = pstrdup(yytext);
!                                       BEGIN(xdolq);
!                               }
! {dolqfailed}  {
!                                       /* throw back all but the initial "$" */
!                                       yyless(1);
!                                       /* and treat it as {other} */
!                                       return yytext[0];
!                               }
! <xdolq>{dolqdelim} {
!                                       if (strcmp(yytext, dolqstart) == 0)
!                                       {
!                                               pfree(dolqstart);
!                                               BEGIN(INITIAL);
!                                               /* adjust yytext/yyleng to describe whole string */
!                                               yyleng += (yytext - start_charpos);
!                                               yytext = start_charpos;
!                                               return T_STRING;
!                                       }
!                                       else
!                                       {
!                                               /*
!                                                * When we fail to match $...$ to dolqstart, transfer
!                                                * the $... part to the output, but put back the final
!                                                * $ for rescanning.  Consider $delim$...$junk$delim$
!                                                */
!                                               yyless(yyleng-1);
!                                       }
!                               }
! <xdolq>{dolqinside} {
!                               }
! <xdolq>{dolqfailed} {
!                               }
! <xdolq>.              {
!                                       /* This is only needed for $ inside the quoted text */
!                               }
! <xdolq><<EOF>>        { yyerror("unterminated dollar-quoted string"); }
  
      /* ----------
       * Any unmatched character is returned as is
       * ----------
       */
! .                             {
!                                       return yytext[0];
!                               }
  
  %%
  
***************
*** 437,443 ****
   * to cite in error messages.
   */
  void
! plpgsql_scanner_init(const char *str, int functype)
  {
        Size    slen;
  
--- 487,493 ----
   * to cite in error messages.
   */
  void
! plpgsql_scanner_init(const char *str)
  {
        Size    slen;
  
***************
*** 460,468 ****
        /* Other setup */
        scanstr = str;
  
-     scanner_functype = functype;
-     scanner_typereported = false;
- 
        have_pushback_token = false;
  
        cur_line_start = scanbuf;
--- 510,515 ----
***************
*** 493,569 ****
        yy_delete_buffer(scanbufhandle);
        pfree(scanbuf);
  }
- 
- /*
-  * Called after a T_STRING token is read to get the string literal's value
-  * as a palloc'd string.  (We make this a separate call because in many
-  * scenarios there's no need to get the decoded value.)
-  *
-  * Note: we expect the literal to be the most recently lexed token.  This
-  * would not work well if we supported multiple-token pushback or if
-  * plpgsql_yylex() wanted to read ahead beyond a T_STRING token.
-  */
- char *
- plpgsql_get_string_value(void)
- {
-       char       *result;
-       const char *cp;
-       int                     len;
- 
-       if (dolqlen > 0)
-       {
-               /* Token is a $foo$...$foo$ string */
-               len = yyleng - 2 * dolqlen;
-               Assert(len >= 0);
-               result = (char *) palloc(len + 1);
-               memcpy(result, yytext + dolqlen, len);
-               result[len] = '\0';
-       }
-       else if (*yytext == 'E' || *yytext == 'e')
-       {
-               /* Token is an E'...' string */
-               result = (char *) palloc(yyleng + 1);   /* more than enough room */
-               len = 0;
-               for (cp = yytext + 2; *cp; cp++)
-               {
-                       if (*cp == '\'')
-                       {
-                               if (cp[1] == '\'')
-                                       result[len++] = *cp++;
-                               /* else it must be string end quote */
-                       }
-                       else if (*cp == '\\')
-                       {
-                               if (cp[1] != '\0')      /* just a paranoid check */
-                                       result[len++] = *(++cp);
-                       }
-                       else
-                               result[len++] = *cp;
-               }
-               result[len] = '\0';
-       }
-       else
-       {
-               /* Token is a '...' string */
-               result = (char *) palloc(yyleng + 1);   /* more than enough room */
-               len = 0;
-               for (cp = yytext + 1; *cp; cp++)
-               {
-                       if (*cp == '\'')
-                       {
-                               if (cp[1] == '\'')
-                                       result[len++] = *cp++;
-                               /* else it must be string end quote */
-                       }
-                       else if (*cp == '\\')
-                       {
-                               if (cp[1] != '\0')      /* just a paranoid check */
-                                       result[len++] = *(++cp);
-                       }
-                       else
-                               result[len++] = *cp;
-               }
-               result[len] = '\0';
-       }
-       return result;
- }
--- 540,542 ----
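
For anyone who wants to poke at the <xdolq> mismatch rule without a flex
build, here is a rough standalone sketch in plain C.  It is not part of the
patch: dolq_delim_len() and dolq_literal_len() are hypothetical helper names,
and the tag syntax is simplified to ASCII letters, digits and underscore.
The point it illustrates is the "put back the final $" step: on a
non-matching $junk$ we consume only "$junk" and rescan from the last "$",
so that $delim$...$junk$delim$ still terminates at the right place.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of a $tag$ delimiter starting at s, or 0 if none starts here. */
static size_t
dolq_delim_len(const char *s)
{
    size_t      i = 1;

    if (s[0] != '$')
        return 0;
    if (s[i] == '$')
        return 2;               /* empty tag: "$$" */
    if (!(isalpha((unsigned char) s[i]) || s[i] == '_'))
        return 0;               /* a tag may not start with a digit */
    while (isalnum((unsigned char) s[i]) || s[i] == '_')
        i++;
    return (s[i] == '$') ? i + 1 : 0;
}

/*
 * If str begins with a dollar-quoted literal, return its total length
 * including both delimiters.  Return 0 if str has no opening delimiter
 * or the literal is unterminated.
 */
static size_t
dolq_literal_len(const char *str)
{
    size_t      open_len = dolq_delim_len(str);
    const char *p;

    if (open_len == 0)
        return 0;
    p = str + open_len;
    while (*p != '\0')
    {
        size_t      n = dolq_delim_len(p);

        if (n == 0)
            p++;                /* ordinary character, or a lone "$" */
        else if (n == open_len && strncmp(p, str, n) == 0)
            return (size_t) (p - str) + n;  /* matching close delimiter */
        else
            p += n - 1;         /* keep "$junk", rescan from its final "$" */
    }
    return 0;                   /* unterminated dollar-quoted string */
}
```

If the else branch skipped the whole of "$junk$" instead, the scan would
resume at "delim$", never see the closing delimiter, and report the string
as unterminated.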
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
