You are mistaken. The rules for encoding a longer UTF-8 character are well-defined. http://en.wikipedia.org/wiki/UTF-8#Description
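To make the point concrete, here is a minimal sketch of those rules in Python, written by hand from the bit layout described on that page. The function name `utf8_encode` is my own, chosen for illustration, and the check against Python's built-in encoder is only a sanity test, not a proof.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point by hand, following the UTF-8 bit layout."""
    # Surrogates and values above U+10FFFF are not encodable scalar values.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Compare the hand-rolled encoder with Python's own, and show byte counts
# for the same character in UTF-8 and UTF-16.
for ch in "Aü한😀":
    hand = utf8_encode(ord(ch))
    assert hand == ch.encode("utf-8")
    utf16_len = len(ch.encode("utf-16-be"))
    print(f"U+{ord(ch):04X}: UTF-8 {hand.hex()} ({len(hand)} bytes), "
          f"UTF-16 {utf16_len} bytes")
```

Every code point from U+0000 to U+10FFFF (surrogates excluded) falls into exactly one of the four branches, so there is no character for which a rule is missing. Note that Hangul, kana and most CJK ideographs land in the three-byte branch, while UTF-16 stores them in two bytes.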
Yes, it is a fact that for files with mostly Asian and similar characters UTF-8 is longer than UTF-16.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of John Gilmore
Sent: Friday, January 10, 2014 10:28 AM
To: [email protected]
Subject: Re: Subject Unicode

Paul,

No, I do not accept the premises you set out. I will try, when I have more time, to make clear why with examples.

Briefly, effective rules for encoding any 'character' recognized as a Unicode one as a 'longer' UTF-8 one do not in general exist. Moreover, even when they are available, my experience with them has been bad.

In dealing recently with a document containing mixed English, German, Korean and Japanese text I found that the UTF-8 version was 23% longer than the UTF-16 version.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
