On Tue, 24 Sep 2019, Eric Botcazou wrote:

> Hi,
>
> the Universal Character Names accepted by the C family of compilers are
> mapped to those of ISO/IEC 10646, which defines the Universal Character Set
> codespace as the range 0-0x10FFFF inclusive.  The upper bound is already
> enforced for identifiers but not for literals, so the following code is
> accepted in C99:
>
> #include <stddef.h>
>
> wchar_t a = L'\U00110000';
>
> whereas it is rejected with an error by other compilers (Clang, MSVC).
>
> I'm not sure whether the compiler is really required to issue a diagnostic in
> this case.  Moreover a few tests in the testsuite manipulate UCNs outside the
> UCS codespace.  That's why I suggest issuing a pedantic warning.

For C, I think such UCNs violate the Semantics but not the Constraints on
UCNs, so no diagnostic is actually required in C, although it is permitted
as a pedwarn / error.

However, while C++ doesn't have that Semantics / Constraints division, it's
also the case that before C++2a, C++ only has a dated normative reference to
ISO/IEC 10646-1:1993 (C++2a adds an undated reference and says the dated one
is only for deprecated features, as well as explicitly making such UCNs
outside the ISO 10646 code point range ill-formed).  So I think that for
C++, this is only correct as an error / pedwarn in the C++2a case.

-- 
Joseph S. Myers
jos...@codesourcery.com
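
For illustration only, the range condition under discussion (a UCN value must
lie in 0-0x10FFFF inclusive) can be sketched in C as below.  The names
UCS_LIMIT and ucn_in_ucs_codespace are hypothetical and are not taken from the
patch or from the compiler sources; the sketch covers only the upper-bound
test, not how or at what severity a diagnostic would be issued.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound of the UCS codespace as stated in the quoted message
   (0-0x10FFFF inclusive).  Hypothetical macro name.  */
#define UCS_LIMIT 0x10FFFFu

/* Hypothetical helper: return true if VALUE, the number spelled by a
   \u or \U escape, lies inside the UCS codespace.  */
bool
ucn_in_ucs_codespace (uint32_t value)
{
  return value <= UCS_LIMIT;
}
```

With this check, the value 0x110000 from the quoted example is out of range,
while 0x10FFFF itself is still accepted.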