On 23/07/17 13:39, Ralph Corderoy wrote:
>> What's the rationale for choosing UTF-16 in the first place?
>
> History.  Microsoft plumped for UCS-2, both UCS-2BE and UCS-2LE
> I think.
That's not as I recall it: UCS-2, yes, but always UCS-2LE (their focus
was on Intel x86, which has a little-endian memory organization, so
only little-endian UCS-2 would be useful as a representation of
wchar_t).

> That's a fixed width; two bytes per rune.  When that became
> insufficient, UTF-16 was a backwards-compatible upgrade AIUI.

Yep.  Yet another of Bill's "no one will ever need more than 640kB of
memory" moments, IIRC: "16 bits should be sufficient to represent any
character which anyone will ever want to display".  Of course, he was
wrong on both counts, and when they realized that 16 bits weren't going
to be enough, they redefined "Unicode"[*] to mean UTF-16LE, and added
support for surrogate pairs to the APIs.

[*]: When Microsoft documentation refers to "Unicode", it invariably
means UTF-16LE; they seem reluctant to so much as acknowledge that any
other variant exists (there are a few rare, hard-to-find instances
where UTF-7 or UTF-8 are mentioned ... and then, usually to caution
against using them).

--
Regards,
Keith.