On 23/07/17 13:39, Ralph Corderoy wrote:
>> What's the rationale for choosing UTF-16 in the first place?
> 
> History.  Microsoft plumped for UCS-2, both UCS-2BE and UCS-2LE
> I think.

That's not as I recall it: UCS-2, yes, but always UCS-2LE; since their 
focus was on Intel x86, a little-endian architecture, only 
little-endian UCS-2 would have been useful for representing wchar_t.

> That's a fixed width;  two bytes per rune.  When that became
> insufficient, UTF-16 was a backwards-compatible upgrade AIUI.

Yep.  Yet another of Bill's "no one will ever need more than 640kB of 
memory" moments, IIRC: "16 bits should be sufficient to represent any 
character anyone will ever want to display".  Of course, he was wrong 
on both counts, and when they realized that 16 bits wasn't going to be 
enough, they changed their definition of Unicode[*] to mean UTF-16LE 
and added support for surrogate pairs to the APIs.

[*]: When Microsoft documentation refers to "Unicode", they invariably 
mean UTF-16LE; they seem reluctant to so much as acknowledge that any 
other variant exists (there are a few rare, hard-to-find instances 
where UTF-7 or UTF-8 is mentioned ... and then, usually to caution 
against using them).
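
Just to illustrate the mechanics: a surrogate pair splits a code point 
above U+FFFF into two 16-bit units, one from the D800-DBFF range and 
one from DC00-DFFF.  A minimal sketch in C (purely illustrative, not 
any particular Windows API):

    #include <stdint.h>
    #include <stdio.h>

    /* Encode a code point beyond the BMP (U+10000 .. U+10FFFF) as a
     * UTF-16 surrogate pair.  Illustrative only -- real code should
     * validate the input range first. */
    static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
    {
        cp -= 0x10000;                 /* 20 bits remain */
        *hi = 0xD800 | (cp >> 10);     /* high (lead) surrogate  */
        *lo = 0xDC00 | (cp & 0x3FF);   /* low (trail) surrogate  */
    }

    int main(void)
    {
        uint16_t hi, lo;
        to_surrogates(0x1F600, &hi, &lo);  /* U+1F600 is outside the BMP */
        printf("U+1F600 -> 0x%04X 0x%04X\n",
               (unsigned)hi, (unsigned)lo); /* prints 0xD83D 0xDE00 */
        return 0;
    }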

-- 
Regards,
Keith.
