On Mon, Aug 16, 2021 at 04:21:00PM -0400, Jason Merrill wrote:
> > I see for the UTF-8 chars we have:
> >        switch (ucn_valid_in_identifier (pfile, *cp, nst))
> >          {
> >          case 0:
> >            /* In C++, this is an error for invalid character in an identifier
> >               because logically, the UTF-8 was converted to a UCN during
> >               translation phase 1 (even though we don't physically do it that
> >               way).  In C, this byte rather becomes grammatically a separate
> >               token.  */
> >            if (CPP_OPTION (pfile, cplusplus))
> >              cpp_error (pfile, CPP_DL_ERROR,
> >                         "extended character %.*s is not valid in an identifier",
> >                         (int) (*pstr - base), base);
> >            else
> >              {
> >                *pstr = base;
> >                return false;
> >              }
> > So, shall we behave the same as C for cxx23_identifiers here?  And shall we
> > do something similar for the UCNs in \uxxxx and \Uxxxxxxxx forms?
> > Confused...
> 
> I tend to agree with Joseph's comment on your followup patch about this
> issue; do you?

It isn't clear to me whether it is OK for this to be an error even with
just -E, i.e. whether
"If a single universal-character-name does not match any of the other
preprocessing token categories, the program is ill-formed."
applies already in translation phase 4, which is what -E emits (or is it some
other phase?), or only in phase 7, when preprocessing tokens are converted to
tokens.

But sure, if you agree with Joseph that the follow-up isn't needed, the
diagnostic is much better that way and I'd certainly prefer just this
patch without the follow-up.

If not -E, I guess the standard is clear that it is invalid and how exactly
we diagnose it is QoI.
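
To make the two behaviours in the quoted snippet concrete, here is a toy
model of that branch.  It is only a sketch and not libcpp code: every toy_*
name is hypothetical, the validity predicate is a stand-in that accepts a
single character, and I'm assuming the C++ path keeps scanning after it has
reported the error (the quoted fragment doesn't show what follows).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ucn_valid_in_identifier.  Toy rule: only
   U+00E9 (UTF-8 bytes C3 A9) is accepted; the real function consults
   per-standard Unicode tables.  */
static bool
toy_valid_in_identifier (const unsigned char *s, size_t len)
{
  return len == 2 && s[0] == 0xC3 && s[1] == 0xA9;
}

/* Length of a UTF-8 sequence given its lead byte.  Assumes well-formed
   input; no validation is attempted in this sketch.  */
static size_t
toy_utf8_len (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if ((lead & 0xE0) == 0xC0)
    return 2;
  if ((lead & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Model of the quoted branch.  Returns true if the character at *PSTR is
   consumed as part of the identifier, false if the identifier ends before
   it.  *ERRORS counts diagnostics that would have been issued.  */
static bool
toy_scan_extended_char (const unsigned char **pstr, bool cplusplus,
                        int *errors)
{
  assert (pstr && *pstr && errors);

  const unsigned char *base = *pstr;
  size_t len = toy_utf8_len (*base);
  *pstr = base + len;

  if (toy_valid_in_identifier (base, len))
    return true;

  if (cplusplus)
    {
      /* C++: diagnose, but (assumption) keep the character in the
         identifier so that scanning continues past it.  */
      ++*errors;
      return true;
    }

  /* C: rewind so the bytes are lexed as a separate token.  */
  *pstr = base;
  return false;
}
```

With the bytes C2 A9 (U+00A9, rejected by the toy predicate), the C++ mode
call returns true, advances the pointer by two and bumps the error count,
while the C mode call returns false and leaves the pointer at the start.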

        Jakub
