On 7/23/24 4:18 PM, Marek Polacek wrote:
On Tue, Jul 23, 2024 at 12:53:07AM -0400, Jason Merrill wrote:
On 7/20/24 2:31 PM, Marek Polacek wrote:
[ Entering the contest to fix the oldest PR in this cycle. ]

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This 18-year-old PR reports that we parse certain comma expressions
as a declaration rather than statement when the statement begins with
a functional-style cast expression.  Consider

    int(x), 0;

which does not declare x--it only casts x to int--, whereas

    int(x), (y);

declares x and y.  We need some kind of look-ahead to decide how we
should disambiguate the construct, because cp_parser_init_declarator
commits eagerly once it has seen "int(x)", and then it's too late to
recover.

This patch makes us try to parse the code as a sequence of declarators;
if that fails, we are likely looking at a statement.  That's a simple
idea, but it's complicated by code like

    void (*p)(void *)(fun);

which initializes a pointer-to-function, or

    int(x), (x) + 1;

which is an expression statement, but the second (x) is parsed as
a valid declarator, only the + after reveals that the whole thing
is an expression.  You can have things like

    int(**p)

which by itself doesn't tell you much.  You can have

    int(*q)(void*)

which looks like it starts with a functional-style cast, but it is not
a cast.  The simple

    int(x) = 42;

has an initializer so it declares x; it is not an assignment.  But then,

      int(d) __attribute__(());

does not have an initializer, but the attribute makes it a declaration.

        PR c++/29834
        PR c++/54905

gcc/cp/ChangeLog:

        * parser.cc (cp_parser_lambda_introducer): Use
        cp_parser_next_token_starts_initializer_p.
        (cp_parser_simple_declaration): Add look-ahead to decide if we're
        looking at a declaration or statement.
        (cp_parser_next_token_starts_initializer_p): New.

gcc/testsuite/ChangeLog:

        * g++.dg/parse/ambig15.C: New test.
        * g++.dg/parse/ambig16.C: New test.
---
   gcc/cp/parser.cc                     | 73 ++++++++++++++++++++++--
   gcc/testsuite/g++.dg/parse/ambig15.C | 83 ++++++++++++++++++++++++++++
   gcc/testsuite/g++.dg/parse/ambig16.C | 18 ++++++
   3 files changed, 168 insertions(+), 6 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/parse/ambig15.C
   create mode 100644 gcc/testsuite/g++.dg/parse/ambig16.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 1fa0780944b..797cfc3204e 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2947,6 +2947,8 @@ static bool cp_parser_next_token_ends_template_argument_p
     (cp_parser *);
   static bool cp_parser_nth_token_starts_template_argument_list_p
     (cp_parser *, size_t);
+static bool cp_parser_next_token_starts_initializer_p
+  (cp_parser *);
   static enum tag_types cp_parser_token_is_class_key
     (cp_token *);
   static enum tag_types cp_parser_token_is_type_parameter_key
@@ -11663,9 +11665,7 @@ cp_parser_lambda_introducer (cp_parser* parser, tree 
lambda_expr)
        }
         /* Find the initializer for this capture.  */
-      if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)
-         || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
-         || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+      if (cp_parser_next_token_starts_initializer_p (parser))
        {
          /* An explicit initializer exists.  */
          if (cxx_dialect < cxx14)
@@ -11747,9 +11747,7 @@ cp_parser_lambda_introducer (cp_parser* parser, tree 
lambda_expr)
                  /* If what follows is an initializer, the second '...' is
                     invalid.  But for cases like [...xs...], the first one
                     is invalid.  */
-                 if (cp_lexer_next_token_is (parser->lexer, CPP_EQ)
-                     || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
-                     || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+                 if (cp_parser_next_token_starts_initializer_p (parser))
                    ellipsis_loc = loc;
                  error_at (ellipsis_loc, "too many %<...%> in lambda capture");
                  continue;
@@ -16047,6 +16045,58 @@ cp_parser_simple_declaration (cp_parser* parser,
       else
         break;
+  /* If we are still uncommitted, we're probably looking at something like
+     T(x), which can be a declaration but does not have to be, depending
+     on what comes after.  Consider
+       int(x), 0;
+     which is _not_ a declaration of x, it's a functional cast, and
+       int(x), (y);
+     which declares x and y.  We need some kind of look-ahead to decide,
+     cp_parser_init_declarator below will commit eagerly once it has seen
+     "int(x)".  So we try to parse this as a sequence of declarators; if
+     that fails, we are likely looking at a statement.  (We could avoid
+     all of this if there is no non-nested comma.)  */

Unfortunately, this seems to break

int main()
{
   int(x), y[sizeof(x)];
}

Oy.  That's a problem, thanks for catching that.  So:

   void f(int v, int *w)
   {
     int(x), y[sizeof(x)]; // decl
     int(v), w[sizeof(v)], true; // expr
   }

The problem is that the cp_parser_declarator call I added was meant to
be purely syntactic checking -- does it _look_ like it could be a decl? --
but here it emits an error.

Is it worth experimenting with a new CP_PARSER_SYNTAX_ONLY flag to
prevent emitting any errors?

I've wished for a while that we would store diagnostics during tentative parsing and then emit them all once we commit to the tentative parse.

With that functionality, we could parse the decls normally (but tentatively) and then go back and remove them from the current scope if they turn out not to have actually been declarations?

Extremely vexing.

Jason

Reply via email to