On Wed, Dec 25, 2013 at 11:01:13PM +0000, Thorsten Glaser wrote:
> Silvan Jegen dixit:
>
> >Wouldn't a 16-bit wchar_t be non-standard-conform when using a UTF-8
> >locale?
>
> Nope. UTF-8 is just an encoding for Unicode, and as long as I take
> care to #define __STDC_ISO_10646__ 200009L (and no later date) this
> is perfectly permissible.
>
> (And please do not language-lawyer me, I [...]
> and since I can prove that 100%
> POSIX compliance is probably illegal
> in my country, I don't [...]

This is only a possibility for implementations which only support the
BMP (Basic Multilingual Plane, aka plane 0, of Unicode, covering
Unicode Scalar Values in the range 0 to 65535). It's fundamentally
impossible in the C language to support UTF-8 with the full Unicode
range as a locale's multibyte encoding when wchar_t is 16-bit; this is
not only a formal requirement (although it is one; C states explicitly
that there are no "multi-wchar_t characters") but also a fundamental
limitation of the mbrtowc and related interfaces, which cannot support
UTF-16.

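To make the mbrtowc limitation concrete, here is a minimal sketch. It does not use the locale machinery at all; utf8_decode and utf16_units are hypothetical helpers written for this illustration, not libc functions. The point is only that a 4-byte UTF-8 sequence is one multibyte character, that mbrtowc stores a single wchar_t for each character it converts, and that UTF-16 would need two 16-bit units for the same character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libc function): decode one UTF-8 sequence
 * from s, looking at no more than n bytes, and store the code point in
 * *cp. Returns the number of bytes consumed, or 0 if the input is
 * malformed or truncated. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point allowed for each sequence length; anything
     * below it is an overlong encoding. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t c;
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        c = s[0] & 0x1F; len = 2;
    } else if ((s[0] & 0xF0) == 0xE0) {
        c = s[0] & 0x0F; len = 3;
    } else if ((s[0] & 0xF8) == 0xF0) {
        c = s[0] & 0x07; len = 4;
    } else {
        return 0;               /* stray continuation or invalid lead byte */
    }
    if (n < len)
        return 0;               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/* Hypothetical helper: number of 16-bit code units UTF-16 needs for cp. */
int utf16_units(uint32_t cp)
{
    return cp < 0x10000 ? 1 : 2;
}
```

Feeding it the bytes F0 9F 98 80 yields the single code point U+1F600, for which utf16_units reports 2. A conforming mbrtowc has only one wchar_t to store that character in, so a 16-bit wchar_t cannot hold it.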
It would be possible for an implementation with 16-bit wchar_t to
"support" the full Unicode range using UTF-16 for wchar_t and CESU-8
for the multibyte encoding, but this would be even worse than not
supporting non-BMP characters from an interoperability standpoint.
Basically, making wchar_t 16-bit is just a broken implementation
choice.

> This just means that your C locale cannot be strictly UTF-8. All
> others can, but the C locale is precisely for this. This is because
> the C locale is special like that.

It's not special like that in any current or past issue of the
standard, but the proposal here is to change it so it is special like
that. I object to this change.

Rich