Bernd Eckenfels writes:
> Afaik UTF8 is not able to encode 32bit unicode? I thought this is because
> the "living" languages are all restricted to 16bit? Hmm... i might be wrong.
> Does that mean Java does not support asian languages with its 16bit Unicode?

UTF-8 can be used to encode UCS-4.

> As I understand it, all living languages are contained in the "not-extended"
> 16bit set. No?

Not at all. Ideographic Extension Block B (which will be part of the upcoming Unicode 3.1/ISO 10646-2 release) contains Han characters that are used in Hong Kong, Taiwan, and other locales.

For example, the Hong Kong Supplementary Character Set (HKSCS) adds several thousand characters to Big Five and Unicode. HKSCS defines mappings to the Big Five EUDC (end-user-defined character) area and to the Private Use Area (PUA) of Unicode. Ideographic Extension Block A (added in Unicode 3.0) includes some of the HKSCS code points, but not all. So you end up with separate mapping tables for Unicode 2.x and 3.0, because they contain different PUA mappings.

Once IEB-B is released, all of HKSCS can be encoded in Unicode/ISO 10646 without resorting to the PUA.

    -tree

--
Tom Emerson
Basis Technology Corp.
Zenkaku Language Hacker
http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"
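
To illustrate the point that UTF-8 is not limited to 16-bit values, here is a minimal Python sketch of the UTF-8 bit layout. The function name `utf8_encode` is made up for this example, and the sketch only covers the 1- to 4-byte forms (up to U+10FFFF), not the longer 5- and 6-byte forms from the original UTF-8 definition. The sample code point U+20000 is the first character of Ideographic Extension Block B, so it lies outside the 16-bit range.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single code point with the UTF-8 bit layout (1 to 4 bytes).

    Hypothetical helper for illustration; real code should use str.encode.
    """
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:
        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# U+20000 needs more than 16 bits, yet encodes cleanly as four bytes.
encoded = utf8_encode(0x20000)
print(encoded.hex(" "))  # f0 a0 80 80
```

The result matches what Python's built-in codec produces for the same character, i.e. `chr(0x20000).encode("utf-8")`.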