Raul Miller <[EMAIL PROTECTED]> wrote:

> On Wed, Jun 06, 2001 at 08:42:28PM +0900, Junichi Uekawa wrote:
> > UCS4 is not a satisfactory encoding for our needs, unfortunately.
> > JIS is not complete either, but UCS4 is less so.
>
> Could you provide some examples of characters encoded in JIS but not
> in UCS4?  [a url would be fine, if it's hard to represent this in email.]

The CJK (China-Japan-Korea) Unified Ideographs are what is causing the
most pain. Search Google for "unified ideographs CJK" and you will find
plenty of pages on the subject.

The main problem is that the character code alone is not enough to
identify which character is meant. In practical terms, the font has to
be changed depending on the currently chosen locale. That is, you need
"current language" information to decipher UCS4, like the "lang" tag in
XML.

Microsoft (one of the main culprits in the CJK unification process, I
have heard) seems to have noticed the problem:

  http://support.microsoft.com/support/kb/articles/Q170/5/59.ASP

describes it and lists a number of unified ideographs that cannot be
handled correctly. I am not sure whether this has been resolved.

regards,
	junichi

-- 
[EMAIL PROTECTED]  http://www.netfort.gr.jp/~dancer
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GE d+ s:- a-- C+ UL++++ P- L+++ E W++ N o-- K- w++
O- M- V-- PS+ PE-- Y+ PGP+ t-- 5 X-- R* tv- b+ DI- D++
G e h* r% !y+
------END GEEK CODE BLOCK------
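
To illustrate the unification problem described in the message, here is
a minimal Python sketch. It assumes the ideograph U+76F4 as the example
(a character commonly cited as being drawn differently in Japanese and
Chinese typography) and uses Python's standard "euc_jp" and "gb2312"
codecs to stand in for the national encodings. The names round_trip and
TaggedText are hypothetical, introduced only for this illustration. The
point is that both legacy encodings decode to one and the same code
point, so the language has to be carried separately, much like a "lang"
tag.

```python
from dataclasses import dataclass

# Assumption: U+76F4 is used as a representative unified ideograph.
SAMPLE = "\u76f4"


def round_trip(ch, legacy_codecs=("euc_jp", "gb2312")):
    """Encode ch in each legacy codec and decode it back.

    Returns {codec: (raw_bytes, decoded_text)}. The decoded text is the
    same for every codec, because the national character sets all map
    this ideograph to a single unified code point.
    """
    results = {}
    for codec in legacy_codecs:
        raw = ch.encode(codec)
        results[codec] = (raw, raw.decode(codec))
    return results


@dataclass(frozen=True)
class TaggedText:
    """Text plus the language needed to choose a suitable font."""
    lang: str  # e.g. "ja" or "zh"
    text: str


if __name__ == "__main__":
    for codec, (raw, text) in round_trip(SAMPLE).items():
        print(codec, raw.hex(), "U+%04X" % ord(text))

    ja = TaggedText("ja", SAMPLE)
    zh = TaggedText("zh", SAMPLE)
    # Identical code points; only the tag tells the renderer which
    # language's glyph conventions to use.
    print(ja.text == zh.text, ja == zh)
```

Once the text has been decoded, nothing in the string itself records
which encoding it came from, so a renderer that receives only the code
point has to guess the font from the locale.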