On 8/31/22 11:07, Jakub Jelinek wrote:
On Wed, Aug 31, 2022 at 10:52:49AM -0400, Jason Merrill wrote:
It could be more explicit, but I think we can assume that from the existing
wording; it says it designates the named character. If there is no such
character, that cannot be satisfied, so it must be ill-formed.
Ok.
So, we could reject the int h case above and silently accept the others?
Why not warn on the others?
We were always silent for the cases like \u123X or \U12345X.
Do you think we should emit some warnings (but never pedwarns/errors in that
case) saying that the sequence looks like a universal character name but is
not a complete one?
I think that would be helpful, at least for \u{ and \N{.
The following patch lets us silently accept:
#define z(x) 0
#define a z(
int b = a\u{});
int c = a\u{);
int d = a\N{});
int e = a\N{);
int f = a\u123);
int g = a\U1234567);
int h = a\N);
int i = a\NARG);
int j = a\N{abc});
int k = a\N{ABC.123});
The following two will still be rejected with errors:
int l = a\N{ABC});
int m = a\N{LATIN SMALL LETTER A WITH ACUTE});
the first one because ABC is not a valid Unicode name and the latter because
it will become int m = aá); and will trigger other errors later.
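
To illustrate the split between the silently accepted cases and the two
rejected ones, here is a rough sketch in C. It assumes the deciding factor is
whether the text inside \N{...} is non-empty and uses only characters that can
appear in a Unicode name (uppercase letters, digits, space, hyphen). That rule
is inferred from the examples above, and looks_like_named_ucn is a
hypothetical helper, not a function from the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not GCC code.  Returns true if S starts with
   \N{...} whose contents are non-empty and consist only of A-Z, 0-9,
   space and hyphen, i.e. the sequence plausibly spells a Unicode name.
   The character set is an assumption inferred from the test cases.  */
bool looks_like_named_ucn(const char *s)
{
    if (strncmp(s, "\\N{", 3) != 0)
        return false;               /* covers \N and \NARG */

    const char *p = s + 3;
    if (*p == '}')
        return false;               /* \N{} has an empty name */

    for (; *p != '\0' && *p != '}'; p++) {
        bool ok = (*p >= 'A' && *p <= 'Z')
                  || (*p >= '0' && *p <= '9')
                  || *p == ' ' || *p == '-';
        if (!ok)
            return false;           /* e.g. \N{abc} or \N{ABC.123} */
    }
    return *p == '}';               /* unterminated \N{ is not a match */
}
```

Under that assumption, only \N{ABC} and \N{LATIN SMALL LETTER A WITH ACUTE}
from the list above match, so only those two go on to the name lookup, where
ABC then fails.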
Given what you said above, I think that is what we want for the last two
for C++23. The question is whether it is also OK for C++20/C17 etc., and
whether it should depend on -pedantic or -pedantic-errors or GNU vs. ISO mode
in that case. We could also handle those two differently, and just
warn instead of error for the \N{ABC} case if not in C++23 mode when
identifier_pos.
That sounds right.
Here is an incremental version of the patch which makes valid
\u{123} and \N{LATIN SMALL LETTER A WITH ACUTE} an extension in GNU
modes before C++23 and splits them into separate tokens in ISO modes.
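
The mode-dependent behaviour for a valid delimited or named escape in an
identifier could be summarised roughly as follows. This is only a sketch of
the decision described here; the enum, the function name and the two boolean
parameters are made up for illustration and do not correspond to anything in
the patch.

```c
#include <stdbool.h>

/* Hypothetical outcome names, for illustration only.  */
enum ucn_action {
    UCN_ACCEPT,               /* part of the language: C++23 and later */
    UCN_ACCEPT_AS_EXTENSION,  /* GNU modes before C++23 */
    UCN_SPLIT_TOKENS          /* ISO modes before C++23 */
};

/* Decide how a valid \u{123} or \N{NAME} in an identifier is treated.
   Both parameters are assumptions standing in for the real mode checks.  */
enum ucn_action classify_valid_delimited_ucn(bool cxx23_or_later, bool gnu_mode)
{
    if (cxx23_or_later)
        return UCN_ACCEPT;
    if (gnu_mode)
        return UCN_ACCEPT_AS_EXTENSION;
    return UCN_SPLIT_TOKENS;
}
```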
Looks good.
Jason