[dev] Re: 8-bit transparency in the C locale vs. UTF-8 support

Thorsten Glaser Wed, 25 Dec 2013 18:41:44 -0800

Rich Felker dixit:

>> >Wouldn't a 16-bit wchar_t be non-standard-conform when using a UTF-8
>> >locale?
>>
>> Nope. UTF-8 is just an encoding for Unicode, and as long as I take
>> care to #define __STDC_ISO_10646__ 200009L (and no later date) this
>> is perfectly permissible.
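
For illustration, a minimal sketch of how a program might inspect what
an implementation promises here. Where defined, __STDC_ISO_10646__
expands to a date of the form yyyymmL. The two function names are
hypothetical and not taken from any libc.

```c
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: report the ISO 10646 revision date the
 * implementation advertises, or 0 if the macro is not defined. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}

/* Hypothetical helper: nonzero if a single wchar_t can hold every
 * code point up to U+10FFFF; zero for a 16-bit wchar_t. */
int wchar_holds_all_of_unicode(void)
{
    return WCHAR_MAX >= 0x10FFFFL;
}
```

With a 16-bit wchar_t, the second function returns 0, which is the case
under discussion.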


>This is only a possibility for implementations which only support the
>BMP (Basic Multilingual Plane, aka plane 0, of Unicode, covering
>Unicode Scalar Values in the range 0 to 65535).

Yes, exactly. That was my goal when choosing this.
(But, as I said, my suggestion regarding the handling of the encoding
is not limited to 16 bits; it’s perfectly possible to handle
full 21-bit Unicode/ISO 10646 with it. My code was simply
written to support only the 16-bit BMP.)
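
A rough sketch of what a BMP-only UTF-8 decoder can look like. This is
not the MirBSD code; bmp_decode is a hypothetical name, and rejecting
4-byte sequences outright is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (n bytes
 * available) into a 16-bit BMP value. Returns the number of bytes
 * consumed, or 0 if the input is invalid, truncated, or encodes a
 * code point outside the BMP. */
size_t bmp_decode(const unsigned char *s, size_t n, uint16_t *out)
{
    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                      /* 1 byte: U+0000..U+007F */
        *out = s[0];
        return 1;
    }
    if (s[0] >= 0xC2 && s[0] <= 0xDF) {     /* 2 bytes: U+0080..U+07FF */
        if (n < 2 || (s[1] & 0xC0) != 0x80)
            return 0;
        *out = (uint16_t)(((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {            /* 3 bytes: U+0800..U+FFFF */
        uint32_t cp;

        if (n < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        cp = ((uint32_t)(s[0] & 0x0F) << 12) |
             ((uint32_t)(s[1] & 0x3F) << 6) |
             (uint32_t)(s[2] & 0x3F);
        if (cp < 0x800)                     /* overlong encoding */
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogates are not characters */
            return 0;
        *out = (uint16_t)cp;
        return 3;
    }
    /* 4-byte lead bytes (beyond the BMP), stray continuation bytes,
     * and the invalid lead bytes 0xC0/0xC1 all end up here. */
    return 0;
}
```

Extending this to the full range would mean widening the output type
and adding the 4-byte case; the decoding scheme itself stays the same.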

>It's fundamentally impossible in the C language to support UTF-8 with
>the full Unicode range as a locale's multibyte encoding when wchar_t
>is 16-bit

ACK. No complaints there. This is outside of the scope of MirBSD.
I will not use UTF-16 there, either.
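
To make the quoted point concrete: a code point above U+FFFF needs two
16-bit units in UTF-16, so it cannot be returned as a single 16-bit
wchar_t. A small sketch (utf16_units is a hypothetical name):

```c
#include <stdint.h>

/* Hypothetical helper: split a Unicode scalar value into UTF-16 code
 * units. Returns the number of units written (1 or 2), or 0 if cp is
 * a surrogate or lies above U+10FFFF. */
int utf16_units(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {             /* BMP: fits in one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For U+1F600 this yields the pair D83D DE00, i.e. two units for what the
multibyte-to-wide conversion would have to deliver as one wchar_t.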

>"support" the full Unicode range using UTF-16 for wchar_t and CESU-8
>for the multibyte encoding

Yikes, no!
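
For readers unfamiliar with CESU-8: it encodes each UTF-16 surrogate as
its own 3-byte sequence, so a supplementary character takes 6 bytes
that a strict UTF-8 decoder rejects. A sketch comparing the two
encodings, with hypothetical function names:

```c
#include <stddef.h>
#include <stdint.h>

/* Write one 16-bit unit as a 3-byte UTF-8-style sequence. */
static void put3(uint16_t u, unsigned char *out)
{
    out[0] = (unsigned char)(0xE0 | (u >> 12));
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (u & 0x3F));
}

/* Hypothetical helper: CESU-8 form of a supplementary code point
 * (U+10000..U+10FFFF). Returns 6, or 0 if cp is out of that range. */
size_t cesu8_supplementary(uint32_t cp, unsigned char out[6])
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;
    put3((uint16_t)(0xD800 | (cp >> 10)), out);       /* high surrogate */
    put3((uint16_t)(0xDC00 | (cp & 0x3FF)), out + 3); /* low surrogate */
    return 6;
}

/* Hypothetical helper: regular UTF-8 form of the same code point.
 * Returns 4, or 0 if cp is out of range. */
size_t utf8_supplementary(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For U+1F600, the UTF-8 form is F0 9F 98 80, while the CESU-8 form is
ED A0 BD ED B8 80. The two byte streams are not interchangeable.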

>> This just means that your C locale cannot be strictly UTF-8. All
>> others can, but the C locale is precisely for this. This is because
>> the C locale is special like that.
>
>It's not special like that in any current or past issue of the
>standard, but the proposal here is to change it so it is special like
>that. I object to this change.

It’s not explicit yet, but ⓐ implied already (otherwise they would
not announce it like that, and IMHO it’s a non-change) and ⓑ common
current and expected behaviour, and, as such, sensible to require.

Plus, I showed you how it can be done.

bye,
//mirabilos
-- 
13:37⎜«Natureshadow» Deep inside, I hate mirabilos. I mean, he's a good
guy. But he's always right! In every fsckin' situation, he's right. Even
with his deeply perverted taste in software and borked ambition towards
broken OSes - in the end, he's damn right about it :(! […] works in mksh
