Re: The IBM zEnterprise EC12 announcement

Paul Gilmartin Tue, 28 Aug 2012 09:18:32 -0700

On Tue, 28 Aug 2012 10:39:03 -0500, Kirk Wolf wrote:
>
>UTF-16 is used in Java (and other languages) as the internal representation
>of characters and strings (each character represented by two bytes).
>
No.  Not according to:


    http://en.wikipedia.org/wiki/UTF-16

    UTF-16 (16-bit Unicode Transformation Format) is a character encoding for
    Unicode capable of encoding 1,112,064[1] numbers (called code points) in the
    Unicode code space from 0 to 0x10FFFF. It produces a variable-length result
    of either one or two 16-bit code units per code point.
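
For illustration only, a minimal Java sketch of what "one or two 16-bit
code units per code point" means in practice. The class name CodeUnitCount
and the method unitsFor are hypothetical; the calls it makes
(Character.charCount, Character.toChars, String.codePointCount) are
standard java.lang methods.

```java
// Sketch: String.length() counts UTF-16 code units, not code points.
class CodeUnitCount {
    // Number of 16-bit code units needed for one code point (1 or 2).
    static int unitsFor(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        String bmp = "A";  // U+0041, below 0x10000
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies above 0xFFFF.
        String supplementary = new String(Character.toChars(0x1D11E));

        System.out.println(bmp.length());            // prints 1
        System.out.println(supplementary.length());  // prints 2
        System.out.println(
            supplementary.codePointCount(0, supplementary.length()));  // prints 1
    }
}
```

So a single character above U+FFFF occupies two Java char values, i.e.
four bytes rather than two.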

And:

    http://www.ietf.org/rfc/rfc2781.txt

   The rules for how characters are encoded in UTF-16 are:

   -  Characters with values less than 0x10000 are represented as a
      single 16-bit integer with a value equal to that of the character
      number.

   -  Characters with values between 0x10000 and 0x10FFFF are
      represented by a 16-bit integer with a value between 0xD800 and
      0xDBFF (within the so-called high-half zone or high surrogate
      area) followed by a 16-bit integer with a value between 0xDC00 and
      0xDFFF (within the so-called low-half zone or low surrogate area).

   -  Characters with values greater than 0x10FFFF cannot be encoded in
      UTF-16.
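
The three rules above can be sketched in Java roughly as follows. This is
an illustration under stated assumptions, not code taken from the RFC: the
class name Utf16Encode and the method encode are hypothetical, and the
sketch applies the rules literally without any special handling of values
in the surrogate range itself.

```java
// Sketch of the RFC 2781 encoding rules quoted above.
class Utf16Encode {
    // Returns the UTF-16 code units (one or two) for a character value.
    static char[] encode(int u) {
        // Rule 3: values greater than 0x10FFFF cannot be encoded.
        if (u < 0 || u > 0x10FFFF) {
            throw new IllegalArgumentException(
                "cannot be encoded in UTF-16: 0x" + Integer.toHexString(u));
        }
        // Rule 1: values below 0x10000 are a single 16-bit integer.
        if (u < 0x10000) {
            return new char[] { (char) u };
        }
        // Rule 2: subtract 0x10000 to get a 20-bit value, then split it
        // into a high surrogate (0xD800..0xDBFF) carrying the top 10 bits
        // and a low surrogate (0xDC00..0xDFFF) carrying the bottom 10 bits.
        int v = u - 0x10000;
        char high = (char) (0xD800 | (v >> 10));
        char low = (char) (0xDC00 | (v & 0x3FF));
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] units = encode(0x1D11E);
        // prints d834 dd1e
        System.out.println(Integer.toHexString(units[0]) + " "
            + Integer.toHexString(units[1]));
    }
}
```

For valid supplementary code points the result should match what
Character.toChars returns.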

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
