Bug#99324: Default charset should be UTF-8

Junichi Uekawa Tue, 12 Jun 2001 21:05:42 -0500

Raul Miller <[EMAIL PROTECTED]> wrote:

> On Wed, Jun 06, 2001 at 08:42:28PM +0900, Junichi Uekawa wrote:
> > UCS4 is not a satisfactory encoding for our needs, unfortunately.
> > JIS is not complete either, but UCS4 is less so.
> 
> Could you provide some examples of characters encoded in JIS but not
> in UCS4?  [a url would be fine, if it's hard to represent this in email.]


The CJK (China-Japan-Korea) Unified Ideographs
are what is causing the most pain.

Searching Google for the keywords "unified ideographs CJK"
turns up plenty of pages.

The main problem is that the character code alone is not enough to
determine which glyph it represents; in practical terms, the font has to
be changed depending on the currently chosen locale.


I.e. you need the "current language" information to decipher UCS4,
like the "lang" attribute in XML.
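
To make this concrete, here is a rough sketch in Python, not a real
implementation. U+76F4 is used only as a commonly cited example of a
unified ideograph that is drawn differently in Japanese and Chinese
typography. The FONT_FOR_LANG table and pick_font() are hypothetical
names made up for this illustration, and the font names are just
examples of locale-specific fonts.

```python
import unicodedata

# Hypothetical language-to-font table; the font names are only examples.
FONT_FOR_LANG = {
    "ja": "Noto Sans CJK JP",
    "zh-CN": "Noto Sans CJK SC",
    "zh-TW": "Noto Sans CJK TC",
    "ko": "Noto Sans CJK KR",
}


def pick_font(lang, default="Noto Sans CJK JP"):
    """Return the font to use for text tagged with `lang` (hypothetical helper)."""
    return FONT_FOR_LANG.get(lang, default)


ch = "\u76f4"  # a single unified ideograph
name = unicodedata.name(ch)

# The code point is identical for every language; only the out-of-band
# language tag tells the renderer which glyph style to choose.
fonts = {lang: pick_font(lang) for lang in FONT_FOR_LANG}
```

The character data carries no language at all here; everything that
differs between the entries of `fonts` comes from the tag supplied
alongside the text.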


Microsoft (one of the main culprits in the CJK unification process, I have
heard) seems to have noticed the problem, and
http://support.microsoft.com/support/kb/articles/Q170/5/59.ASP
describes it and lists a number of "unified ideographs" which
cannot be handled correctly.




I am not sure if this has been resolved.

regards,
        junichi

-- 
[EMAIL PROTECTED]  http://www.netfort.gr.jp/~dancer
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GE d+ s:- a-- C+ UL++++ P- L+++ E W++ N o-- K- w++ 
O- M- V-- PS+ PE-- Y+ PGP+ t-- 5 X-- R* tv- b+ DI- D++ 
G e h* r% !y+ 
------END GEEK CODE BLOCK------
