On Tue, 28 Aug 2012 10:39:03 -0500, Kirk Wolf wrote:
>
>UTF-16 is used in Java (and other languages) as the internal representation
>of characters and strings (each character represented by two bytes).
>
No. Not according to:
http://en.wikipedia.org/wiki/UTF-16
UTF-16 (16-bit Unicode Transformation Format) is a character encoding for
Unicode capable of encoding 1,112,064 numbers (called code points) in the
Unicode code space from 0 to 0x10FFFF. It produces a variable-length result
of either one or two 16-bit code units per code point.
And:
http://www.ietf.org/rfc/rfc2781.txt
The rules for how characters are encoded in UTF-16 are:
- Characters with values less than 0x10000 are represented as a
  single 16-bit integer with a value equal to that of the character
  number.
- Characters with values between 0x10000 and 0x10FFFF are
  represented by a 16-bit integer with a value between 0xD800 and
  0xDBFF (within the so-called high-half zone or high surrogate
  area) followed by a 16-bit integer with a value between 0xDC00 and
  0xDFFF (within the so-called low-half zone or low surrogate area).
- Characters with values greater than 0x10FFFF cannot be encoded in
  UTF-16.
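
To make the rules above concrete, here is a minimal Python sketch of them. The function name utf16_encode is my own invention for illustration, not anything from the RFC or from Java. The rejection of 0xD800 through 0xDFFF is also my addition: it is not in the three rules quoted above, but those values are reserved for surrogates and do not stand for characters on their own.

```python
from typing import List


def utf16_encode(code_point: int) -> List[int]:
    """Return the 16-bit code units for one code point (illustrative only)."""
    if code_point < 0 or code_point > 0x10FFFF:
        # Rule 3: values above 0x10FFFF cannot be encoded in UTF-16.
        raise ValueError("code point out of range for UTF-16")
    if 0xD800 <= code_point <= 0xDFFF:
        # Assumption beyond the quoted rules: surrogate values are reserved.
        raise ValueError("surrogate values are not characters")
    if code_point < 0x10000:
        # Rule 1: a single 16-bit integer equal to the character number.
        return [code_point]
    # Rule 2: split the 20-bit offset into a high and a low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)    # top 10 bits, range 0xD800..0xDBFF
    low = 0xDC00 | (offset & 0x3FF)   # bottom 10 bits, range 0xDC00..0xDFFF
    return [high, low]


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x10000, 0x1F600, 0x10FFFF):
        units = utf16_encode(cp)
        print(f"U+{cp:04X} -> {' '.join(f'{u:04X}' for u in units)}")
```

A character such as U+1F600 comes out as two 16-bit code units (four bytes), so "each character represented by two bytes" holds only for values below 0x10000.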
-- gil