On 11/1/07, Joseph S. Myers <[EMAIL PROTECTED]> wrote:
> I haven't followed any developments relating to TR19769 in WG14
> after its publication in detail; has WG14 yet given an answer
> on what should be done with u'C' where C represents a single
> character that requires a surrogate pair to represent in UTF-16
> (to name one noted place where the TR underspecifies things)?

Pending such an answer, I think gcc should treat such character
literals as ill-formed.  The text in the C TR is "The corresponding
character constant is denoted by u'c-char-sequence' and has the type
char16_t."  Given that a surrogate pair is not representable in a
single char16_t, I conclude that the intent was to make character
literals requiring surrogates ill-formed.  The C++0x draft also makes
such literals ill-formed.  Furthermore, rejecting them now is upward
compatible should the C committee choose some other interpretation:
no program that gcc accepts today would change meaning.
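
To make "requires a surrogate pair" concrete, here is a minimal
sketch of the UTF-16 arithmetic involved.  The helper names are
hypothetical (they are not taken from gcc or from the TR), and the
code assumes a compiler with C++11 char16_t and constexpr support.
It only illustrates which code points cannot fit in one 16-bit code
unit; it is not how any compiler necessarily implements the check.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of gcc or TR 19769.

// True if the code point lies above the Basic Multilingual Plane and
// therefore needs two UTF-16 code units (a surrogate pair).
constexpr bool needs_surrogate_pair(std::uint32_t cp) {
    return cp >= 0x10000 && cp <= 0x10FFFF;
}

// High (leading) surrogate: 0xD800 plus the top 10 bits of (cp - 0x10000).
constexpr char16_t high_surrogate(std::uint32_t cp) {
    return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}

// Low (trailing) surrogate: 0xDC00 plus the bottom 10 bits of (cp - 0x10000).
constexpr char16_t low_surrogate(std::uint32_t cp) {
    return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

// U+20AC (EURO SIGN) fits in a single char16_t.
static_assert(!needs_surrogate_pair(0x20AC), "BMP code point");

// U+1D11E (MUSICAL SYMBOL G CLEF) needs the pair D834 DD1E, so a
// single char16_t cannot hold it.
static_assert(needs_surrogate_pair(0x1D11E), "supplementary code point");
static_assert(high_surrogate(0x1D11E) == 0xD834, "leading surrogate");
static_assert(low_surrogate(0x1D11E) == 0xDD1E, "trailing surrogate");
```

Under the interpretation argued for above, a u'...' literal whose
single character satisfies needs_surrogate_pair would be rejected,
while the same character remains usable in a u"..." string literal,
where two code units are available.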

> A TR is not a standard, so for C this must be disabled in all strict
> conformance modes (note that it affects the rules for lexing and so
> changes the semantics of conforming programs); likewise for C++98.
> The C++0x draft includes the notation from TR19769, so the feature
> should be enabled by default in C++0x (and so far as the C TR is
> compatible with C++0x, both should be followed in both C and C++
> when the feature is enabled).

Note that char16_t and char32_t are typedefs in C but primitive types
in C++, just like wchar_t.
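
The practical consequence of that difference can be sketched as
follows.  This assumes a C++11 (or later) compiler; the overloaded
function name is hypothetical and exists only to show that the types
are distinct.  In C, where char16_t is a typedef for uint_least16_t,
no such distinction exists.

```cpp
#include <cstdint>
#include <type_traits>

// In C++, char16_t and char32_t are distinct types, even though they
// share size and signedness with uint_least16_t and uint_least32_t.
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "char16_t is its own type in C++");
static_assert(!std::is_same<char32_t, std::uint_least32_t>::value,
              "char32_t is its own type in C++");
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t),
              "same size as the underlying type");

// Hypothetical overload set: this is only legal because the two
// parameter types differ.  With a typedef (as in C's <uchar.h>) the
// second declaration would redefine the first.
constexpr int which(char16_t) { return 16; }
constexpr int which(std::uint_least16_t) { return 0; }

// A u'...' literal has type char16_t, so it selects the first overload.
static_assert(which(u'a') == 16, "u'a' has type char16_t");
```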

-- 
Lawrence Crowl
