On 11/1/07, Joseph S. Myers <[EMAIL PROTECTED]> wrote:
> I haven't followed any developments relating to TR19769 in WG14
> after its publication in detail; has WG14 yet given an answer
> on what should be done with u'C' where C represents a single
> character that requires a surrogate pair to represent in UTF-16
> (to name one noted place where the TR underspecifies things)?

Pending such an answer, I think gcc should make such characters
ill-formed. The text in the C TR is "The corresponding character
constant is denoted by u'c-char-sequence' and has the type char16_t."
Given that surrogate pairs are unrepresentable in that type, I
conclude that the intent was to make character literals requiring
surrogates ill-formed. The C++ standard also makes such characters
ill-formed. Furthermore, making them ill-formed will be upward
compatible should the C committee choose some other interpretation.

> A TR is not a standard, so for C this must be disabled in all strict
> conformance modes (note that it affects the rules for lexing and so
> changes the semantics of conforming programs); likewise for C++98.
> The C++0x draft includes the notation from TR19769, so the feature
> should be enabled by default in C++0x (and so far as the C TR is
> compatible with C++0x, both should be followed in both C and C++
> when the feature is enabled).

Note that char16_t and char32_t are typedefs in C but primitive types
in C++, just like wchar_t.

--
Lawrence Crowl
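
To make the two points above concrete, here is a minimal C++11 sketch.
It shows why a character above U+FFFF cannot fit in a single char16_t,
and that char16_t is a distinct type in C++ rather than a typedef. The
names utf16_units, Utf16Pair and to_surrogates are hypothetical helpers
invented for this illustration; they are not part of TR 19769, the C++
draft, or gcc.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: number of UTF-16 code units needed for a code point.
// Anything above U+FFFF needs a surrogate pair, i.e. two char16_t values.
constexpr std::size_t utf16_units(char32_t cp) {
    return cp > 0xFFFF ? 2 : 1;
}

// Hypothetical aggregate holding the two halves of a surrogate pair.
struct Utf16Pair {
    char16_t high;
    char16_t low;
};

// Hypothetical helper: split a code point in [0x10000, 0x10FFFF] into its
// high and low surrogates. The caller is assumed to have checked the range.
constexpr Utf16Pair to_surrogates(char32_t cp) {
    return Utf16Pair{
        static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10)),
        static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF))};
}

// In C++ a u'...' literal has type char16_t, which is a distinct built-in
// type, not an alias for uint_least16_t as the C TR's typedef would be.
static_assert(std::is_same<decltype(u'a'), char16_t>::value,
              "u'a' has type char16_t");
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "char16_t is not a typedef in C++");

// U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so it needs two
// code units. A literal such as u'\U0001D11E' is the case under discussion
// and is deliberately left out of the compiled code.
static_assert(utf16_units(U'\U0001D11E') == 2, "needs a surrogate pair");
static_assert(utf16_units(U'A') == 1, "fits in one code unit");
```

For U+1D11E, to_surrogates yields the pair 0xD834, 0xDD1E. Neither half
alone denotes the character, which is the reason a single char16_t
character constant cannot hold it.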