Gil is 100% correct. And the assertion that the battle is over and UTF-8 has won is not my "opinion." I don't have a dog in this fight. The world can go to 5-bit Baudot for all I care. It's simply a fact: http://w3techs.com/technologies/overview/character_encoding/all .
Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of Paul Gilmartin
Sent: Friday, January 10, 2014 8:32 AM
To: [email protected]
Subject: Re: Subject Unicode

On Fri, 10 Jan 2014 11:02:57 -0500, John Gilmore wrote:

>Charles
>
>I do not think you read my post at all carefully.
>
>I made it clear that for specific language pairs UTF-8 is adequate if
>often clumsy.
>
>For multiple-language environments it is equally clear that it is inadequate.
>
>It is of course true that any grapheme, even say some company's logo or
>an astrological house, can be represented in UTF-8. The problem is not
>one of representability but of subset choice. The decision to include
>one may preclude the inclusion of another. Some subsets of at most 256
>characters are adequate to some particular tasks and others are
>adequate to other particular tasks. None is adequate to all such
>tasks.
>
Do you accept that:

o UTF-8 is a variable length encoding scheme?

o UTF-8 has representations for all the million plus Unicode characters?

o The UTF-8 representation of any character is invariant with respect
  to any choice of "specific language [pairs]"?

Given these premises (which I accept) it does not follow that "[t]he
decision to include one [grapheme] may preclude the inclusion of
another." There is no "problem [...] of subset choice."

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
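
For anyone who wants to check the three premises above rather than take them on faith, here is a minimal Python sketch. The sample characters and the helper names (utf8_bytes, byte_lengths, encodes_independently) are my own assumptions for illustration and come from nobody in this thread. It uses only the built-in str.encode and bytes.decode.

```python
def utf8_bytes(text: str) -> bytes:
    """Return the UTF-8 encoding of a string (one or more characters)."""
    return text.encode("utf-8")


# Hypothetical sample: one character each from the 1-, 2-, 3- and 4-byte ranges.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji

# Hypothetical mixed-language text: Latin, Greek, Japanese and Hebrew together.
MIXED = "caf\u00e9 \u0395\u03bb\u03bb \u65e5\u672c\u8a9e \u05e2\u05d1"


def byte_lengths(chars):
    """Premise 1: the encoded length varies from character to character."""
    return [len(utf8_bytes(c)) for c in chars]


def encodes_independently(left: str, right: str) -> bool:
    """Premise 3: a character's bytes do not depend on its neighbours.

    Encoding the concatenation gives the same bytes as concatenating
    the two encodings, whatever languages the two pieces come from.
    """
    return utf8_bytes(left + right) == utf8_bytes(left) + utf8_bytes(right)


if __name__ == "__main__":
    for c in SAMPLES:
        encoded = utf8_bytes(c)
        print(f"U+{ord(c):04X} -> {encoded.hex()} ({len(encoded)} bytes)")

    # Premise 2 (spot check only): the highest code point, U+10FFFF, encodes too.
    print(len(utf8_bytes(chr(0x10FFFF))), "bytes for U+10FFFF")

    # Mixed-language text survives a round trip with no subset chosen anywhere.
    print(utf8_bytes(MIXED).decode("utf-8") == MIXED)
    print(encodes_independently("caf\u00e9", "\u65e5\u672c\u8a9e"))
```

Nothing in the sketch selects a code page or a 256-character subset. That selection step belongs to single-byte encodings, not to UTF-8.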
