On Thu, Jun 01, 2017 at 11:05:01AM -0700, Konrad Schroder wrote:
> Is there any particular reason not to implement the requirements for
> __STDC_ISO_10646__, that is, to use Unicode UCS for wchar_t? Right now we
> use a locale-dependent encoding (and we are not alone in this).

Soda-san is against it :)

I more or less agreed with the idea of locale-specific encodings 10 years
ago, but I've come to the conclusion that the gains don't justify the
price:

(1) The primary reference for data exchange is Unicode. Legacy character
sets still exist and are deployed, but they are exactly that: legacy, kept
for compatibility with other (older) systems.

(2) The vast majority of existing character sets can be easily converted
to and from Unicode.

(3) If individual input characters can't be faithfully round-tripped to
Unicode and back, we can just as well assign them private-use code points.
Transliteration is likely needed in this case anyway for purposes like
iconv.

(4) Giving up locale-dependent wchar_t would significantly simplify the
code by allowing a full layer of abstraction to be removed, along with the
associated redundant implementations.

(5) It is nearly free for western character sets and decent in terms of
code complexity for Shift-JIS, ISO 2022 and EUC. Big5 is a mess, but
primarily because it needs a large translation table.

I still believe the advantages outweigh the price by a lot.

Joerg
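
To make points (3) and (5) concrete, here is a minimal sketch in C. It is
an illustration only, not code from any libc: the function names, the use
of uint32_t for code points, and the choice of U+E000 as the base of the
private-use mapping are all assumptions. It shows why ISO 8859-1 is
"nearly free" (its byte values are numerically identical to U+0000..U+00FF)
and how a byte with no Unicode equivalent could be parked in the Private
Use Area so that it still round-trips.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed base for parking unmappable bytes; U+E000..U+F8FF is the
 * Unicode Private Use Area in the Basic Multilingual Plane. */
#define PUA_BASE 0xE000u

/*
 * Hypothetical helper: widen ISO 8859-1 bytes to Unicode code points.
 * The numbering is identical, so no translation table is needed.
 * Returns the number of code points written.
 */
size_t
latin1_to_ucs(uint32_t *dst, const unsigned char *src, size_t n)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	for (i = 0; i < n; i++)
		dst[i] = src[i];
	return n;
}

/*
 * Hypothetical helper: give a byte that has no Unicode equivalent in
 * some legacy charset a private-use code point.
 */
uint32_t
unmapped_to_pua(unsigned char b)
{
	return PUA_BASE + b;
}

/*
 * Inverse of unmapped_to_pua().  Returns 1 and stores the original byte
 * if cp lies in the assumed private-use range, 0 otherwise.
 */
int
pua_to_unmapped(uint32_t cp, unsigned char *b)
{
	assert(b != NULL);
	if (cp >= PUA_BASE && cp <= PUA_BASE + 0xFFu) {
		*b = (unsigned char)(cp - PUA_BASE);
		return 1;
	}
	return 0;
}
```

A caller would use latin1_to_ucs() for the common case and fall back to
unmapped_to_pua() only for input that has no assigned code point. A
table-driven charset such as Big5 would replace the identity loop with a
lookup, which is where the extra cost mentioned in (5) comes from.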