On Fri, 6 Aug 2021, Jakub Jelinek via Gcc-patches wrote:

> On Fri, Aug 06, 2021 at 11:53:56AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > Actually, there is another change in P1949R7 that I haven't touched
> > in the patch and not sure what the implications are.
> >
> > To the preprocessing-token non-terminal it adds
> > each universal-character-name that cannot be one of the above
> > and changes the following paragraph:
> > ...
> > preprocessing operators and punctuators, and single
> > +universal-character-names and
> > non-whitespace characters that do not lexically match the other
> > preprocessing token categories.
> > +If a single universal-character-name does not match any of the other
> > +preprocessing token categories, the program is ill-formed.
> > If a ' or a " character matches the last category, the behavior
> > is undefined.
> > ...
>
> If the above (and identifier-start and identifier-continue non-terminals
> only mentioning XID_Start+0x5F and XID_Continue UCNs) means that we should
> indeed put each such UTF-8 char or UCN into a separate CPP_OTHER token
> for C++23, then we need something like this incremental patch.
> The drawback is worse diagnostics though, so maybe it would be useful if
> the cpp_error that ... is not valid in an identifier or is not
> valid at the start of an identifier would be emitted as a warning (and not
> warn when skipping)?

It's not clear to me that this change to the standard actually requires
any change in how GCC behaves.  A UCN (or character considered to be
converted to a UCN) that's not valid in identifiers is still invalid in a
context where an identifier preprocessing token could occur (including in
#if 0), whether it's interpreted as a "single UCN" preprocessing token
(stated to be ill-formed) or (part of) an invalid identifier
preprocessing token.

-- 
Joseph S. Myers
jos...@codesourcery.com
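
As a rough illustration of the point above, here is a toy C++ sketch of the two readings. It is not cpplib code: the function names are hypothetical, and the is_id_start / is_id_continue predicates are tiny stand-ins for the real XID_Start+0x5F and XID_Continue tables (they know only ASCII and U+00E9). It assumes the input is a run of identifier-like characters and merely shows that a character such as U+2200, which is not valid in identifiers, is rejected under either reading.

```cpp
#include <cstddef>
#include <string>

// Toy stand-ins for XID_Start (plus '_') and XID_Continue.
// Assumption: only ASCII letters/digits, '_' and U+00E9 are covered;
// a real lexer consults the full Unicode tables.
static bool is_id_start(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
           c == U'_' || c == 0x00E9;
}

static bool is_id_continue(char32_t c) {
    return is_id_start(c) || (c >= U'0' && c <= U'9');
}

// Extended character: anything outside ASCII, as if spelled as a UCN.
static bool is_extended(char32_t c) { return c >= 0x80; }

// Reading A: extended characters are absorbed into one identifier
// token, and any that are not valid at their position are diagnosed.
static bool ill_formed_as_identifier(const std::u32string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        bool ok = (i == 0) ? is_id_start(s[i]) : is_id_continue(s[i]);
        if (is_extended(s[i]) && !ok)
            return true;  // invalid character inside the identifier
    }
    return false;
}

// Reading B: identifiers hold only valid characters; any other
// extended character forms its own "single UCN" token, which the
// quoted wording declares ill-formed.
static bool ill_formed_as_single_ucn(const std::u32string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        if (is_id_start(s[i])) {
            ++i;
            while (i < s.size() && is_id_continue(s[i]))
                ++i;  // consume the rest of a valid identifier
        } else if (is_extended(s[i])) {
            return true;  // lone UCN token
        } else {
            ++i;  // other ASCII characters are not modelled here
        }
    }
    return false;
}
```

Under these assumptions, both functions return true for U"a\u2200b" and false for U"caf\u00E9". The two readings differ in how the characters are split into tokens, not in whether the input is accepted.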