Re: UTF-8 woes on z/OS, a solution - comments invited

Paul Gilmartin Tue, 05 Sep 2017 06:54:30 -0700

On 2017-09-05, at 06:36, Pew, Curtis G wrote:
> 
> Unicode was originally supposed to be a fixed-width, 16-bit encoding. 
> Fixed-width was actually a design criterion for the original developers. It 
> was only after it became clear that there was no possible way to fit all the 
> needed characters into 16 bits that the “astral planes”[1] were (reluctantly) 
> added to Unicode and the various UTF encodings defined. In this light, UTF-16 
> is the closest thing to the original version of Unicode. Also, if your text 
> includes few or no Latin characters, UTF-16 may be just as compact as, or 
> even more compact than, UTF-8, and can probably be processed more easily.
>  
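
The size claim is easy to check. A minimal sketch in Python, where the
helper name encoded_sizes and the sample strings are illustrative
assumptions, not anything from the thread; "utf-16-be" is used so that no
byte-order mark is counted:

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    # "utf-16-be" fixes the byte order, so no 2-byte BOM is prepended.
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


# Illustrative samples only: Latin, Greek, and Japanese text.
for label, sample in [("Latin", "Hello, world"),
                      ("Greek", "αβγ"),
                      ("Japanese", "こんにちは世界")]:
    utf8_bytes, utf16_bytes = encoded_sizes(sample)
    print(f"{label}: UTF-8 {utf8_bytes} bytes, UTF-16 {utf16_bytes} bytes")
```

For these samples the Latin text doubles in size under UTF-16, the Greek
text is the same size in both, and the Japanese text is smaller in UTF-16
(14 bytes against 21).
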
Are you confusing UTF-16 and UCS-2?
    https://en.wikipedia.org/wiki/UTF-16


    UTF-16 (16-bit Unicode Transformation Format) is a character encoding
    capable of encoding all 1,112,064 valid code points of Unicode. The
    encoding is variable-length, as code points are encoded with one or two
    16-bit code units. (also see Comparison of Unicode encodings for a
    comparison of UTF-8, -16 & -32)

    UTF-16 developed from an earlier fixed-width 16-bit encoding known as UCS-2
    (for 2-byte Universal Character Set) once it became clear that 16 bits were
    not sufficient for Unicode's user community.
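
To make the "one or two 16-bit code units" point concrete, here is a
minimal sketch of the UTF-16 encoding step in Python. The function name
utf16_code_units is a hypothetical helper for illustration; the arithmetic
follows the standard surrogate-pair scheme:

```python
def utf16_code_units(cp):
    """Return the list of 16-bit UTF-16 code units for one code point."""
    # Surrogates (U+D800..U+DFFF) are reserved and are not encodable
    # as characters in their own right.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        # BMP code point: one code unit, identical to UCS-2.
        return [cp]
    # Supplementary code point: split the 20-bit offset into two halves.
    offset = cp - 0x10000
    high = 0xD800 | (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 | (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


print([hex(u) for u in utf16_code_units(0x41)])     # ['0x41']
print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']

# 0x110000 code points minus 2,048 surrogates gives the 1,112,064 figure.
print(0x110000 - 0x800)                             # 1112064
```

UCS-2 stops at the first branch: anything at or above U+10000 simply has no
representation, which is the difference the question above turns on.
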

> Since Java was developed when Unicode was still supposed to be a 16-bit 
> encoding the early versions at least used what we would now call UTF-16. As I 
> recall, there was a significant period of time after Unicode abandoned a 
> fixed-width 16-bit representation before Java implementations really 
> supported characters from the “astral planes”.
> 
> 
> [1] Unicode is still organized into 64K ranges called “planes”. The original 
> 0x0000–0xFFFF range is called the “Basic Multilingual Plane” (BMP) and 
> “astral planes” is a convenient nickname for the other ranges.
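
The plane arithmetic and its effect on string lengths can be sketched as
follows. The helper names plane and utf16_length are hypothetical; the
sketch assumes only Python's built-in codecs. Counting 16-bit code units is
what Java's String.length() reports, which is why a character outside the
BMP counts as two there:

```python
def plane(cp):
    """Return the plane number (0..16) of a code point."""
    # Each plane spans 0x10000 (64K) code points, so the plane is
    # whatever remains after dropping the low 16 bits.
    return cp >> 16


def utf16_length(text):
    """Return the number of 16-bit code units text needs in UTF-16."""
    return len(text.encode("utf-16-be")) // 2


sample = "A\U0001F600"            # one BMP character plus one from plane 1
print(plane(ord("A")))            # 0  (the BMP)
print(plane(0x1F600))             # 1
print(len(sample))                # 2  code points
print(utf16_length(sample))       # 3  UTF-16 code units
```
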

-- gil
