On 8/16/21 4:51 PM, Jakub Jelinek wrote:
On Mon, Aug 16, 2021 at 04:21:00PM -0400, Jason Merrill wrote:
I see for the UTF-8 chars we have:
  switch (ucn_valid_in_identifier (pfile, *cp, nst))
    {
    case 0:
      /* In C++, this is an error for invalid character in an identifier
         because logically, the UTF-8 was converted to a UCN during
         translation phase 1 (even though we don't physically do it that
         way).  In C, this byte rather becomes grammatically a separate
         token.  */
      if (CPP_OPTION (pfile, cplusplus))
        cpp_error (pfile, CPP_DL_ERROR,
                   "extended character %.*s is not valid in an identifier",
                   (int) (*pstr - base), base);
      else
        {
          *pstr = base;
          return false;
        }
So, shall we behave the same as C for cxx23_identifiers here? And shall we
do something similar for the UCNs in \uxxxx and \Uxxxxxxxx forms?
Confused...
I tend to agree with Joseph's comment on your followup patch about this
issue; do you?
It isn't clear to me if it is ok that it is an error even with just -E,
i.e. whether
"If a single universal-character-name does not match any of the other
preprocessing token categories, the program is ill-formed."
applies already in translation phase 4 which is what -E emits (or some other
one?), or only in phase 7 when converting preprocessing tokens to tokens.
I read it as applying in phase 3.
But sure, if you agree with Joseph that the followup isn't needed, the
diagnostic is much better that way and I'd certainly prefer just this
patch and not the follow-up.
If not -E, I guess the standard is clear that it is invalid and how exactly
we diagnose it is QoI.
Jakub
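
For readers following the thread, the C versus C++ split described in the
quoted comment can be sketched as a toy lexer loop. This is only an
illustration, not libcpp code: toy_valid_in_identifier, toy_lex_identifier
and struct lex_result are hypothetical names, the validity table is made up
(ASCII plus U+00E9), and the choice to keep scanning after the C++ error is
an assumption, since the quoted fragment does not show what follows
cpp_error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result of scanning one identifier (hypothetical type).  */
struct lex_result
{
  size_t consumed;  /* Code points taken into the identifier.  */
  bool error;       /* True if a diagnostic would have been issued.  */
};

/* Toy stand-in for ucn_valid_in_identifier: accepts ASCII letters,
   digits, '_' and, as an arbitrary example, U+00E9.  The real tables
   are far larger.  */
static bool
toy_valid_in_identifier (uint32_t c)
{
  return ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9') || c == '_' || c == 0x00E9);
}

/* Scan an identifier from CP[0..N).  In C++ mode an invalid extended
   character is flagged as an error and (by assumption) kept in the
   identifier; in C mode the scan stops before it, so the character
   becomes a separate token.  */
struct lex_result
toy_lex_identifier (const uint32_t *cp, size_t n, bool cplusplus)
{
  struct lex_result r = { 0, false };
  assert (cp != NULL);

  while (r.consumed < n)
    {
      uint32_t c = cp[r.consumed];

      if (toy_valid_in_identifier (c))
        {
          r.consumed++;
          continue;
        }
      if (c >= 0x80 && cplusplus)
        {
          /* C++: diagnose, then carry on with the identifier.  */
          r.error = true;
          r.consumed++;
          continue;
        }
      /* C, or an ordinary ASCII delimiter: end the identifier here.  */
      break;
    }
  return r;
}
```

With the input { 'a', 0x00D7, 'b' }, the C mode call stops after 'a' without
an error, while the C++ mode call consumes all three code points and sets
the error flag.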