Glenn Linderman wrote:
> On approximately 1/31/2006 12:06 PM, came the following characters from
> the keyboard of Linda W:
>> The NT-based registry uses 16-bit binary "blobs" (wchar_t) that
>> are not, *strictly*, interpretable as UTF-16, UCS2 or any standard
>> character set.
>
> Could you elucidate this "*strictly*" comment, or provide a doc ref that
> explains it further?
---
Sorry to beat a dead registry-interface, but I received an
email from someone within MS that points to documents where this is more
accurately described:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_8zzn.asp
"Whenever a function has a length parameter for a character string, the length
should be documented as a count of TCHAR values in the string. This refers to
bytes for Windows code page ("ANSI") versions of the function or 16-bit words
for Unicode versions."
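A rough Python sketch of what that length convention means in practice. This is only an illustration, not anything from the MSDN page: the helper name tchar_count and the sample string are made up, and cp1252 is assumed as the "ANSI" code page.

```python
# Storage unit sizes as described in the quote: bytes for the "ANSI"
# versions of a function, 16-bit words for the Unicode versions.
ANSI_UNIT = 1   # one byte per unit
WIDE_UNIT = 2   # one 16-bit word per unit


def tchar_count(raw: bytes, unit_size: int) -> int:
    """Hypothetical helper: length of a raw buffer in storage units."""
    if len(raw) % unit_size:
        raise ValueError("buffer is not a whole number of units")
    return len(raw) // unit_size


text = "HKEY"
ansi_raw = text.encode("cp1252")      # assumed ANSI code page
wide_raw = text.encode("utf-16-le")   # what a Unicode version would see

ansi_len = tchar_count(ansi_raw, ANSI_UNIT)   # 4 units, 4 bytes
wide_len = tchar_count(wide_raw, WIDE_UNIT)   # 4 units, 8 bytes
```

The unit count is the same for both, but the buffer size in bytes is not, which is why a "length" has to say which unit it is counting.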
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_90hf.asp
"The value returned by the lstrlen function is always based on normal character
width: 8 bits for Windows code pages, 16 bits for Unicode. This is often
referred to as a "count of characters", but that is not strictly correct,
because Windows code pages that use double-byte character sets have some "full
width" characters that are actually represented by two consecutive bytes; a
similar situation arises for surrogates in Unicode."
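To see the "not strictly correct" part, here is a small Python sketch that mimics a unit-based count (it does not call lstrlen itself). The helper name unit_count is made up, and cp932 is just one example of a double-byte code page.

```python
def unit_count(text: str, encoding: str, unit_bytes: int) -> int:
    """Hypothetical stand-in for a unit-based length: count storage units."""
    return len(text.encode(encoding)) // unit_bytes


# Two full-width characters in a double-byte code page (Japanese, cp932).
dbcs_text = "\u65e5\u672c"
dbcs_units = unit_count(dbcs_text, "cp932", 1)     # counted in bytes

# One supplementary character (U+1D11E), stored as a surrogate pair.
clef = "\U0001D11E"
clef_units = unit_count(clef, "utf-16-le", 2)      # counted in 16-bit words

print(len(dbcs_text), dbcs_units)   # 2 characters, 4 units
print(len(clef), clef_units)        # 1 character, 2 units
```

In both cases the unit count is larger than the number of characters, which is the discrepancy the quote is describing.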
and
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_192r.asp
"Windows applications normally use Unicode UTF-16 to represent character data.
16 bits would allow direct representation of 65,536 unique characters, but this
Basic Multilingual Plane (BMP) is not nearly enough to cover all of the symbols
used in human languages: Unicode version 4.1 includes over 97,000 characters,
with over 70,000 characters for Chinese alone."
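For anyone wondering how a character outside the BMP fits into 16-bit units, this is the standard UTF-16 surrogate arithmetic, sketched in Python. The function name to_surrogates is mine, and U+20000 is just a convenient example from the supplementary CJK range.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000           # 20 bits remain
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits
    return high, low


high, low = to_surrogates(0x20000)

# Cross-check against Python's own UTF-16 encoder.
encoded = "\U00020000".encode("utf-16-be")
codec_pair = (int.from_bytes(encoded[:2], "big"),
              int.from_bytes(encoded[2:], "big"))
```

Each half on its own (0xD800 to 0xDFFF) is not a character, so a 16-bit blob that contains an unpaired half is not valid UTF-16.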
"Windows 2000 introduced support for basic input, output, and simple sorting of
supplementary characters. However, not all system components are compatible with
supplementary characters. Also, supplementary characters are not supported in
Windows 95/98/Me."
"... note that Windows versions prior to Windows XP disable supplementary
character support by default. (Windows XP and later systems enable supplementary
characters by default.)"