Hi,

At Mon, 20 Nov 2000 01:11:02 -0700,
Anthony Fok <[EMAIL PROTECTED]> wrote:

> To add to that list, China has the new GB18030-2000 standard
> (locale zh_CN.GB18030) which also contains many characters beyond Unicode.

Interesting.  I will have to mention it in my "Introduction to I18N"
document in the Debian Documentation Project (now undergoing a major
rewrite).  Please check http://www.debian.org/doc/manuals/intro-i18n/

BTW, I think GB18030 would be a _character set_, not an _encoding_.
If so, we won't have a zh_CN.GB18030 locale.

Examples (Japanese):
  JIS X 0201, JIS X 0208, JIS X 0212, and JIS X 0213 are _character sets_.
  EUC-JP, Shift-JIS, and ISO-2022-JP are _encodings_.

For simplified Chinese:
  GB 2312, GB 7589, GB 7590, GB 8565, GB 12052, and GBK are _character sets_.
  CN-GB (aka EUC-CN), GBK, and ISO-2022-CN are _encodings_.

For traditional Chinese:
  BIG5 and CNS 11643 are _character sets_.
  ISO-2022-CN, ISO-2022-CN-EXT, EUC-TW, and BIG5 are _encodings_.

Codes which are not ISO 2022-compliant (such as GBK and BIG5, which
appear in both lists above) tend not to separate _character set_ and
_encoding_.

> Very much so in Chinese.  In fact, the Chinese government has gone as far as
> to ban the sale of any Chinese software that only supports Unicode starting
> in 2001.  All new Chinese software must support the GB18030-2000 character
> set.  And yes, Microsoft will have to comply too; their current Unicode-only
> solution won't work.  (Ho ho ho!)  Apparently, the Chinese government is
> somewhat displeased to have the Chinese language controlled and *limited*
> by an International Consortium like Unicode.  There are *so* many Chinese
> characters that aren't in the 16-bit Unicode that it would create lots of
> trouble if Unicode were to become the de-facto standard in China.
> GB18030-2000 is compatible with ISO-10646 AFAIK.

How severe!  Does a government have such a right?

However, this also sounds good for Japanese people.  Software on POSIX
systems will use locales and wide characters instead of Unicode and UTF-8,
since this is the easiest way to support both GB18030 and UTF-8.  And UNIX
vendors will work hard to support locale mechanisms.
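
To make the _character set_ / _encoding_ distinction above a bit more
concrete, here is a minimal sketch in Python.  The language, the sample text,
and the variable names are my own assumptions for illustration; only Python's
standard codec names are relied on.  Note also that Python strings are
Unicode-based internally, so this only approximates the locale and
wide-character model of a POSIX C program.

```python
# Sketch: the same characters (all present in JIS X 0208) serialized
# by three different encodings.  Sample text is an arbitrary assumption.
text = "日本語"

# Encode the same characters with three Japanese encodings.
encoded = {name: text.encode(name)
           for name in ("euc_jp", "shift_jis", "iso2022_jp")}

# The byte sequences differ from encoding to encoding...
for name, raw in encoded.items():
    print(name, raw.hex())

# ...but decoding each with its own codec gives back the same characters.
decoded = {name: raw.decode(name) for name, raw in encoded.items()}
all_same = all(s == text for s in decoded.values())
print("same characters after decoding:", all_same)
```

A program that decodes its input once and then works only on the decoded
characters does not need to care which of these encodings was used on disk.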
Then, the use of locales and wide characters leads to support for
encodings such as EUC-JP, ISO-2022-JP, Shift-JIS, and so on.

I will be right _if GB18030 is not included in Unicode_.  However, I think
GB18030 will be included in Unicode in the future, if GB18030 is a
character set and not an encoding.

> Similar concerns are in Taiwan, and indeed many characters are only in
> CNS11643 (and ISO-10646) but not in Unicode.
>
> Of course, these are mostly hearsay.  I don't know the details, as I was
> originally from Hong Kong, and I have been living in Canada for over 10
> years.  But speaking of Hong Kong, there are quite a few Chinese characters
> added by the HKSAR government that won't be in Unicode either.  So yeah,
> though I am bemused, I am kind of glad that the Chinese government take such
> a strong stance to force software support the new GB18030-2000 standard,
> which, like ISO-10646, has space for millions of characters.  :-)

ISO-10646 and Unicode share exactly the same character set and will
continue to do so in the future, though the width of the code space
differs (ISO-10646: 31 bits; Unicode: 0x000000 - 0x10ffff, a bit more
than 20 bits).  I suppose you are under the impression that Unicode is
16-bit.  It is true that Unicode (1.0) _was_ 16-bit, but that is no
longer the case.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
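
As a small illustration of the code-space point above, the following Python
sketch checks how many bits the Unicode range 0x000000 - 0x10ffff needs and
looks at one character outside the 16-bit range.  The choice of Python and of
U+20000 as the sample character are my own assumptions; the codec names are
Python's standard ones.

```python
# Sketch: Unicode's code space is wider than 16 bits.
UNICODE_MAX = 0x10FFFF
bits_needed = UNICODE_MAX.bit_length()
print("bits needed for U+10FFFF:", bits_needed)

# Sample character (an assumption for illustration): U+20000,
# a CJK ideograph that lies outside the 16-bit range.
ch = "\U00020000"
code_point = ord(ch)
fits_in_16_bits = code_point <= 0xFFFF
print(hex(code_point), "fits in 16 bits:", fits_in_16_bits)

# In UTF-16 the character takes two 16-bit units (a surrogate pair).
utf16 = ch.encode("utf-16-be")
print("UTF-16 bytes:", len(utf16))

# Python's gb18030 codec can also encode it, and it round-trips.
gb = ch.encode("gb18030")
round_trip = gb.decode("gb18030")
print("GB18030 bytes:", len(gb), "round trip ok:", round_trip == ch)
```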