Re: [PATCH] c++, v2: Implement C++23 P2071R2 - Named universal character escapes [PR106648]

Jakub Jelinek via Gcc-patches Thu, 01 Sep 2022 04:15:12 -0700

On Wed, Aug 31, 2022 at 12:14:22PM -0400, Jason Merrill wrote:
> On 8/31/22 11:07, Jakub Jelinek wrote:
> > On Wed, Aug 31, 2022 at 10:52:49AM -0400, Jason Merrill wrote:
> > > It could be more explicit, but I think we can assume that from the 
> > > existing
> > > wording; it says it designates the named character.  If there is no such
> > > character, that cannot be satisfied, so it must be ill-formed.
> > 
> > Ok.
> > 
> > > > So, we could reject the int h case above and accept silently the others?
> > > 
> > > Why not warn on the others?
> > 
> > We were always silent for the cases like \u123X or \U12345X.
> > Do you think we should emit some warnings (but never pedwarns/errors in that
> > case) that it is universal character name like but not completely?
> 
> I think that would be helpful, at least for \u{ and \N{.


Ok.

> > Given what you said above, I think that is what we want for the last 2
> > for C++23, the question is if it is ok also for C++20/C17 etc. and whether
> > it should depend on -pedantic or -pedantic-errors or GNU vs. ISO mode
> > or not in that case.  We could handle those 2 also differently, just
> > warn instead of error for the \N{ABC} case if not in C++23 mode when
> > identifier_pos.
> 
> That sounds right.
> 
> > Here is an incremental version of the patch which will make valid
> > \u{123} and \N{LATIN SMALL LETTER A WITH ACUTE} an extension in GNU
> > modes before C++23 and split it as separate tokens in ISO modes.
> 
> Looks good.

Here is a patch which implements that.
I just wonder if we shouldn't have some warning option that would cover
these warnings, currently one needs to use -w to disable those warnings.

Apparently clang uses -Wunicode option to cover these, but unfortunately
they don't bother to document it (nor almost any other warning option),
so it is unclear what else exactly it covers.  Plus a question is how
we should document that option for GCC...

2022-09-01  Jakub Jelinek  <ja...@redhat.com>

        * charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't
        handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}.
        In possible identifier contexts, don't emit an error and punt
        if \N isn't followed by {, or if \N{} surrounds some lower case
        letters or _.  In possible identifier contexts when not C++23, don't
        emit an error but warning about unknown character names and treat as
        separate tokens.  When treating as separate tokens \u{ or \N{, emit
        warnings.

        * c-c++-common/cpp/delimited-escape-seq-4.c: New test.
        * c-c++-common/cpp/delimited-escape-seq-5.c: New test.
        * c-c++-common/cpp/named-universal-char-escape-5.c: New test.
        * c-c++-common/cpp/named-universal-char-escape-6.c: New test.
        * g++.dg/cpp23/named-universal-char-escape1.C: New test.
        * g++.dg/cpp23/named-universal-char-escape2.C: New test.

--- libcpp/charset.cc.jj        2022-09-01 09:47:24.146886929 +0200
+++ libcpp/charset.cc   2022-09-01 12:52:28.424034208 +0200
@@ -1448,7 +1448,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
   if (str[-1] == 'u')
     {
       length = 4;
-      if (str < limit && *str == '{')
+      if (str < limit
+         && *str == '{'
+         && (!identifier_pos
+             || CPP_OPTION (pfile, delimited_escape_seqs)
+             || !CPP_OPTION (pfile, std)))
        {
          str++;
          /* Magic value to indicate no digits seen.  */
@@ -1462,8 +1466,22 @@ _cpp_valid_ucn (cpp_reader *pfile, const
   else if (str[-1] == 'N')
     {
       length = 4;
+      if (identifier_pos
+         && !CPP_OPTION (pfile, delimited_escape_seqs)
+         && CPP_OPTION (pfile, std))
+       {
+         *cp = 0;
+         return false;
+       }
       if (str == limit || *str != '{')
-       cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+       {
+         if (identifier_pos)
+           {
+             *cp = 0;
+             return false;
+           }
+         cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+       }
       else
        {
          str++;
@@ -1489,8 +1507,16 @@ _cpp_valid_ucn (cpp_reader *pfile, const
 
          if (str < limit && *str == '}')
            {
-             if (name == str && identifier_pos)
+             if (identifier_pos && (name == str || !strict))
                {
+                 if (name == str)
+                   cpp_warning (pfile, CPP_W_NONE,
+                                "empty named universal character escape "
+                                "sequence; treating it as separate tokens");
+                 else
+                   cpp_warning (pfile, CPP_W_NONE,
+                                "incomplete named universal character escape "
+                                "sequence; treating it as separate tokens");
                  *cp = 0;
                  return false;
                }
@@ -1515,27 +1541,48 @@ _cpp_valid_ucn (cpp_reader *pfile, const
                                           uname2c_tree, NULL);
                  if (result == (cppchar_t) -1)
                    {
-                     cpp_error (pfile, CPP_DL_ERROR,
-                                "\\N{%.*s} is not a valid universal "
-                                "character", (int) (str - name), name);
+                     bool ret = true;
+                     if (identifier_pos
+                         && !CPP_OPTION (pfile, delimited_escape_seqs))
+                       ret = cpp_warning (pfile, CPP_W_NONE,
+                                          "\\N{%.*s} is not a valid "
+                                          "universal character; treating it "
+                                          "as separate tokens",
+                                          (int) (str - name), name);
+                     else
+                       cpp_error (pfile, CPP_DL_ERROR,
+                                  "\\N{%.*s} is not a valid universal "
+                                  "character", (int) (str - name), name);
 
                      /* Try to do a loose name lookup according to
                         Unicode loose matching rule UAX44-LM2.  */
                      char canon_name[uname2c_max_name_len + 1];
                      result = _cpp_uname2c_uax44_lm2 ((const char *) name,
                                                       str - name, canon_name);
-                     if (result != (cppchar_t) -1)
+                     if (result != (cppchar_t) -1 && ret)
                        cpp_error (pfile, CPP_DL_NOTE,
                                   "did you mean \\N{%s}?", canon_name);
                      else
-                       result = 0x40;
+                       result = 0xC0;
+                     if (identifier_pos
+                         && !CPP_OPTION (pfile, delimited_escape_seqs))
+                       {
+                         *cp = 0;
+                         return false;
+                       }
                    }
                }
              str++;
              extend_char_range (char_range, loc_reader);
            }
          else if (identifier_pos)
-           length = 1;
+           {
+             cpp_warning (pfile, CPP_W_NONE,
+                          "incomplete named universal character escape "
+                          "sequence; treating it as separate tokens");
+             *cp = 0;
+             return false;
+           }
          else
            {
              cpp_error (pfile, CPP_DL_ERROR,
@@ -1584,12 +1631,17 @@ _cpp_valid_ucn (cpp_reader *pfile, const
       }
     while (--length);
 
-  if (delimited
-      && str < limit
-      && *str == '}'
-      && (length != 32 || !identifier_pos))
+  if (delimited && str < limit && *str == '}')
     {
-      if (length == 32)
+      if (length == 32 && identifier_pos)
+       {
+         cpp_warning (pfile, CPP_W_NONE,
+                      "empty delimited escape sequence; "
+                      "treating it as separate tokens");
+         *cp = 0;
+         return false;
+       }
+      else if (length == 32)
        cpp_error (pfile, CPP_DL_ERROR,
                   "empty delimited escape sequence");
       else if (!CPP_OPTION (pfile, delimited_escape_seqs)
@@ -1607,6 +1659,10 @@ _cpp_valid_ucn (cpp_reader *pfile, const
      error message in that case.  */
   if (length && identifier_pos)
     {
+      if (delimited)
+       cpp_warning (pfile, CPP_W_NONE,
+                    "incomplete delimited escape sequence; "
+                    "treating it as separate tokens");
       *cp = 0;
       return false;
     }
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c.jj  2022-09-01 
11:46:26.951102634 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c     2022-09-01 
12:50:17.351810090 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=gnu++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});                /* { dg-warning "empty delimited escape 
sequence; treating it as separate tokens" } */
+int c = a\u{);         /* { dg-warning "incomplete delimited escape sequence; 
treating it as separate tokens" } */
+int d = a\u{12XYZ});   /* { dg-warning "incomplete delimited escape sequence; 
treating it as separate tokens" } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c.jj  2022-09-01 
11:46:26.951102634 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c     2022-09-01 
12:57:50.830665924 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=c++23" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});                /* { dg-warning "empty delimited escape 
sequence; treating it as separate tokens" "" { target c++23 } } */
+int c = a\u{);         /* { dg-warning "incomplete delimited escape sequence; 
treating it as separate tokens" "" { target c++23 } } */
+int d = a\u{12XYZ});   /* { dg-warning "incomplete delimited escape sequence; 
treating it as separate tokens" "" { target c++23 } } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c.jj   
2022-09-01 11:46:26.951102634 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c      
2022-09-01 12:48:01.753647302 +0200
@@ -0,0 +1,15 @@
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=gnu++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});                                /* { dg-warning "empty named 
universal character escape sequence; treating it as separate tokens" } */
+int c = a\N{);                         /* { dg-warning "incomplete named 
universal character escape sequence; treating it as separate tokens" } */
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});                     /* { dg-warning "incomplete named 
universal character escape sequence; treating it as separate tokens" } */
+int g = a\N{ABC.123});                 /* { dg-warning "incomplete named 
universal character escape sequence; treating it as separate tokens" } */
+int h = a\N{NON-EXISTENT CHAR});       /* { dg-warning "is not a valid 
universal character" } */
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c.jj   
2022-09-01 11:46:26.951102634 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c      
2022-09-01 11:46:26.951102634 +0200
@@ -0,0 +1,16 @@
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=c++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});
+int c = a\N{);
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});
+int g = a\N{ABC.123});
+int h = a\N{NON-EXISTENT CHAR});       /* { dg-bogus "is not a valid universal 
character" } */
+int i = a\N{LATIN SMALL CHARACTER A WITH ACUTE});
--- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C.jj        
2022-09-01 11:46:26.951102634 +0200
+++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C   2022-09-01 
12:56:48.031516792 +0200
@@ -0,0 +1,14 @@
+// P2071R2 - Named universal character escapes
+// { dg-do compile }
+// { dg-require-effective-target wchar }
+
+#define z(x) 0
+#define a z(
+int b = a\N{});                                // { dg-warning "empty named 
universal character escape sequence; treating it as separate tokens" "" { 
target c++23 } }
+int c = a\N{);                         // { dg-warning "incomplete named 
universal character escape sequence; treating it as separate tokens" "" { 
target c++23 } }
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});                     // { dg-warning "incomplete named 
universal character escape sequence; treating it as separate tokens" "" { 
target c++23 } }
+int g = a\N{ABC.123});                 // { dg-warning "incomplete named 
universal character escape sequence; treating it as separate tokens" "" { 
target c++23 } }
+int h = a\N{NON-EXISTENT CHAR});       // { dg-error "is not a valid universal 
character" "" { target c++23 } }
+                                       // { dg-error "was not declared in this 
scope" "" { target c++23 } .-1 }
--- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C.jj        
2022-09-01 12:54:47.436150733 +0200
+++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C   2022-09-01 
12:59:22.512428644 +0200
@@ -0,0 +1,16 @@
+// P2071R2 - Named universal character escapes
+// { dg-do compile }
+// { dg-require-effective-target wchar }
+// { dg-options "" }
+
+#define z(x) 0
+#define a z(
+int b = a\N{});                                // { dg-warning "empty named 
universal character escape sequence; treating it as separate tokens" }
+int c = a\N{);                         // { dg-warning "incomplete named 
universal character escape sequence; treating it as separate tokens" }
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});                     // { dg-warning "incomplete named 
universal character escape sequence; treating it as separate tokens" }
+int g = a\N{ABC.123});                 // { dg-warning "incomplete named 
universal character escape sequence; treating it as separate tokens" }
+int h = a\N{NON-EXISTENT CHAR});       // { dg-error "is not a valid universal 
character" "" { target c++23 } }
+                                       // { dg-error "was not declared in this 
scope" "" { target c++23 } .-1 }
+                                       // { dg-warning "is not a valid 
universal character; treating it as separate tokens" "" { target c++20_down } 
.-2 }


        Jakub

Re: [PATCH] c++, v2: Implement C++23 P2071R2 - Named universal character escapes [PR106648]

Reply via email to