Re: [PATCH] libcpp: For C++23 treat UCNs and UTF-8 chars not valid in identifiers as separate tokens

Joseph Myers Fri, 06 Aug 2021 13:08:47 -0700

On Fri, 6 Aug 2021, Jakub Jelinek via Gcc-patches wrote:

> On Fri, Aug 06, 2021 at 11:53:56AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > Actually, there is another change in P1949R7 that I haven't touched
> > in the patch and not sure what the implications are.
> > 
> > To the preprocessing-token non-terminal it adds
> >     each universal-character-name that cannot be one of the above
> > and changes the following paragraph:
> >  ...
> >  preprocessing operators and punctuators, and single
> > +universal-character-names and
> >  non-whitespace characters that do not lexically match the other
> >  preprocessing token categories.
> > +If a single universal-character-name does not match any of the other
> > +preprocessing token categories, the program is ill-formed.
> >  If a ' or a " character matches the last category, the behavior
> >  is undefined.
> >  ...
> 
> If the above (and identifier-start and identifier-continue non-terminals
> only mentioning XID_Start+0x5F and XID_Continue UCNs) means that we should
> indeed put each such UTF-8 char or UCN into a separate CPP_OTHER token
> for C++23, then we need something like this incremental patch.
> The drawback is worse diagnostics though, so maybe it would be useful if
> the cpp_error that ... is not valid in an identifier or is not
> valid at the start of an identifier would be emitted as a warning (and not
> warn when skipping)?


It's not clear to me that this change to the standard actually requires 
any change in how GCC behaves.  A UCN (or character considered to be 
converted to a UCN) that's not valid in identifiers is still invalid in a 
context where an identifier preprocessing token could occur (including in 
#if 0), whether it's interpreted as a "single UCN" preprocessing token 
(stated to be ill-formed) or (part of) an invalid identifier preprocessing 
token.
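
As a rough illustration of that point, here is a toy sketch (not libcpp
code).  Every name in it is hypothetical, and toy_valid_in_identifier
is a crude stand-in for the real XID_Start/XID_Continue tables.  It
lexes the same input under both readings and shows that each one ends
up with something to diagnose:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the XID_Start / XID_Continue tables: ASCII letters,
// digits and '_' plus the Latin-1 letters.  NOT the real Unicode data,
// and it does not distinguish start from continue characters.
bool toy_valid_in_identifier(char32_t c) {
    if (c == U'_') return true;
    if (c >= U'0' && c <= U'9') return true;
    if ((c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z')) return true;
    return c >= 0xC0 && c <= 0xFF && c != 0xD7 && c != 0xF7;
}

enum class Kind { Identifier, Other };

struct Token {
    Kind kind;
    std::u32string text;
    bool ill_formed;  // true if this token must be diagnosed
};

// split_invalid == false: a non-ASCII character that is not valid in an
//   identifier is kept as part of an (invalid) identifier token.
// split_invalid == true: each such character becomes its own Other
//   token, which the new wording declares ill-formed.
// ASCII characters outside identifiers stand in for ordinary punctuators.
std::vector<Token> lex(const std::u32string& in, bool split_invalid) {
    std::vector<Token> out;
    bool in_ident = false;
    for (char32_t c : in) {
        assert(c <= 0x10FFFF);  // Unicode code points only
        if (c == U' ') { in_ident = false; continue; }
        bool ok = toy_valid_in_identifier(c);
        if (!ok && c < 0x80) {  // plain punctuator, nothing to diagnose
            out.push_back({Kind::Other, std::u32string(1, c), false});
            in_ident = false;
            continue;
        }
        if (!ok && split_invalid) {  // "single UCN" token reading
            out.push_back({Kind::Other, std::u32string(1, c), true});
            in_ident = false;
            continue;
        }
        if (!in_ident) {
            out.push_back({Kind::Identifier, U"", false});
            in_ident = true;
        }
        out.back().text += c;
        if (!ok) out.back().ill_formed = true;  // invalid identifier reading
    }
    return out;
}

bool any_ill_formed(const std::vector<Token>& toks) {
    for (const Token& t : toks)
        if (t.ill_formed) return true;
    return false;
}
```

For an input such as 'a', U+2192, 'b', lex(in, false) yields one invalid
identifier token and lex(in, true) yields three tokens with the middle
one ill-formed.  The token boundaries differ, but any_ill_formed is true
in both cases.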

-- 
Joseph S. Myers
jos...@codesourcery.com
