Re: Subject Unicode

Charles Mills Fri, 10 Jan 2014 09:24:40 -0800

Gil is 100% correct. 

And the assertion that the battle is over and UTF-8 has won is not my 
"opinion." I don't have a dog in this fight. The world can go to 5-bit Baudot 
for all I care. It's simply a fact: 
http://w3techs.com/technologies/overview/character_encoding/all .


Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of Paul Gilmartin
Sent: Friday, January 10, 2014 8:32 AM
To: [email protected]
Subject: Re: Subject Unicode

On Fri, 10 Jan 2014 11:02:57 -0500, John Gilmore wrote:

>Charles
>
>I do not think you read my post at all carefully.
>
>I made it clear that for specific language pairs UTF-8 is adequate if 
>often clumsy.
>
>For multiple-language environments it is equally clear that it is inadequate.
>
>It is of course true that any grapheme, even say some company's logo or 
>an astrological house, can be represented in UTF-8.  The problem is not 
>one of representability but of subset choice.  The decision to include 
>one may preclude the inclusion of another.  Some subsets of at most 256 
>characters are adequate to some particular tasks and others are 
>adequate to other particular tasks.  None is adequate to all such 
>tasks.
>
Do you accept that:

o UTF-8 is a variable length encoding scheme?

o UTF-8 has representations for all the million-plus Unicode code points?

o The UTF-8 representation of any character is invariant with respect
  to any choice of "specific language [pairs]"?

Given these premises (which I accept), it does not follow that "[t]he decision to 
include one [grapheme] may preclude the inclusion of another."  There is no 
"problem [...] of subset choice."
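
The three premises above can be checked with a short sketch. This is only a
minimal illustration in Python 3, using the built-in str.encode; the sample
characters and the names samples, utf8_hex, lengths, mixed, and top are
assumptions chosen for the example and appear nowhere in the thread.

```python
# Sample characters from unrelated scripts (illustrative choice only).
samples = {
    "Latin capital A": "\u0041",
    "Greek small alpha": "\u03b1",
    "CJK ideograph": "\u4e2d",
    "Emoji (outside the BMP)": "\U0001F600",
}

def utf8_hex(ch):
    """Return the UTF-8 bytes of ch as a hex string."""
    return ch.encode("utf-8").hex()

# Premise 1: variable length. Each sample needs a different number of bytes.
lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# Premise 3: invariance. Encoding a mixed-script string yields exactly the
# per-character encodings concatenated; no language or subset is selected.
mixed = "".join(samples.values())
piecewise = b"".join(ch.encode("utf-8") for ch in samples.values())
assert mixed.encode("utf-8") == piecewise

# Premise 2: coverage. Even the highest code point, U+10FFFF, has an encoding.
top = chr(0x10FFFF).encode("utf-8")

for name, ch in samples.items():
    print(f"{name}: U+{ord(ch):04X} -> {utf8_hex(ch)} ({lengths[name]} byte(s))")
print(f"U+10FFFF -> {top.hex()}")
```

Nothing in the sketch chooses a 256-character subset: all four scripts sit in
one string, and each character keeps the same bytes it has on its own.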

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
