On Wed, Dec 25, 2013 at 11:01:13PM +0000, Thorsten Glaser wrote:
> Silvan Jegen dixit:
> 
> >Wouldn't a 16-bit wchar_t be non-standard-conform when using a UTF-8
> >locale?
> 
> Nope. UTF-8 is just an encoding for Unicode, and as long as I take
> care to #define __STDC_ISO_10646__ 200009L (and no later date) this
> is perfectly permissible.
> 
> (And please do not language-lawyer me, I [...] and since I can prove
> that 100% POSIX compliance is probably illegal in my country, I
> don't [...]

This is only a possibility for implementations which only support the
BMP (Basic Multilingual Plane, aka plane 0, of Unicode, covering
Unicode Scalar Values in the range 0 to 65535). It's fundamentally
impossible in the C language to support UTF-8 with the full Unicode
range as a locale's multibyte encoding when wchar_t is 16-bit; this is
not only a formal requirement (although it is one; C states explicitly
that there are no "multi-wchar_t characters") but also a fundamental
limitation of the mbrtowc and related interfaces, which cannot support
UTF-16.
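
To make the size problem concrete, here is a minimal sketch of the
arithmetic involved. It does not call mbrtowc at all; the helper names
utf16_units_needed and utf16_surrogates are hypothetical and exist only
for this illustration. Any scalar value above U+FFFF needs two 16-bit
units in UTF-16, whereas mbrtowc is specified to store a single wchar_t
for each complete character it converts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit units UTF-16 needs for one
 * Unicode scalar value. */
static size_t utf16_units_needed(uint32_t cp)
{
    assert(cp <= 0x10FFFF);              /* must be inside the Unicode range */
    assert(cp < 0xD800 || cp > 0xDFFF);  /* surrogates are not scalar values */
    return cp < 0x10000 ? 1 : 2;
}

/* Hypothetical helper: split a non-BMP scalar value into its UTF-16
 * surrogate pair (high unit first). */
static void utf16_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                            /* 20 significant bits remain */
    *hi = (uint16_t)(0xD800 | (cp >> 10));    /* top 10 bits */
    *lo = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* bottom 10 bits */
}
```

For U+20AC (in the BMP) utf16_units_needed gives 1; for U+1F600 it
gives 2, and utf16_surrogates splits it into 0xD83D and 0xDE00. A
16-bit wchar_t has no room to hand back the second case as one value.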

It would be possible for an implementation with 16-bit wchar_t to
"support" the full Unicode range using UTF-16 for wchar_t and CESU-8
for the multibyte encoding, but this would be even worse than not
supporting non-BMP characters from an interoperability standpoint.
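
As a rough illustration of the interoperability problem, the sketch
below encodes the same non-BMP character both ways. The function names
utf8_encode, cesu8_encode and put3 are made up for this example and are
not part of any libc. CESU-8 encodes each UTF-16 surrogate as its own
3-byte sequence, so the result is 6 bytes that a conforming UTF-8
decoder has to reject.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit value in the 0x0800..0xFFFF range as a 3-byte sequence. */
static void put3(uint32_t v, unsigned char *out)
{
    out[0] = (unsigned char)(0xE0 | (v >> 12));
    out[1] = (unsigned char)(0x80 | ((v >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (v & 0x3F));
}

/* Encode one Unicode scalar value as UTF-8; returns the byte count (1..4). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);
    assert(cp < 0xD800 || cp > 0xDFFF);  /* surrogates are invalid in UTF-8 */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        put3(cp, out);
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode one scalar value as CESU-8; returns the byte count (1..3 or 6).
 * BMP values are encoded exactly as in UTF-8; non-BMP values become two
 * 3-byte sequences, one per UTF-16 surrogate. */
static size_t cesu8_encode(uint32_t cp, unsigned char out[6])
{
    if (cp < 0x10000)
        return utf8_encode(cp, out);
    assert(cp <= 0x10FFFF);
    cp -= 0x10000;
    put3(0xD800 | (cp >> 10), out);        /* high surrogate */
    put3(0xDC00 | (cp & 0x3FF), out + 3);  /* low surrogate */
    return 6;
}
```

For U+1F600, utf8_encode produces F0 9F 98 80 while cesu8_encode
produces ED A0 BD ED B8 80, so the two byte streams only agree for BMP
characters.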

Basically, making wchar_t 16-bit is just a broken implementation
choice.

> This just means that your C locale cannot be strictly UTF-8. All
> others can, but the C locale is precisely for this. This is because
> the C locale is special like that.

It's not special like that in any current or past issue of the
standard, but the proposal here is to change it so it is special like
that. I object to this change.

Rich
