Hi,

At Wed, 11 Sep 2002 02:58:59 -0400,
Glenn Maynard wrote:

> http://www.debian.or.jp/~kubota/unicode-symbols-unihan.html:

I am the author of the above document. I don't think this problem can be solved at all. My makeshift solution is to use the Japanese glyph set, because Chinese and Korean people seem to be more tolerant of glyph differences, while Japanese people tend to be particular about them. In any case, this problem concerns the *display system*, not the way text is stored in files or memory.

There are, however, more complex and more important problems.

http://www.debian.or.jp/~kubota/unicode-symbols-map2.html

This is the round-trip conversion problem. It EXISTS, but it is not simple to determine which part is the BUG. I sent mail to the Unicode Consortium asking them to solve this problem, but I think they don't have enough political power to solve it.... Yes, huge POLITICAL POWER is needed to solve this problem, and of course I don't have it.

If we never think about systems other than Debian or Linux, we can ignore the problem I described in the document, because it comes from incompatible mapping tables between vendors. However, if we think about a conversion like:

  UTF-8 ---(mapping using Windows)--> EUC-JP ---(mapping using Linux)--> UTF-8

we will run into the problem.

ja.po files are usually written in EUC-JP, simply because EUC-JP is the most popular encoding in Japanese Linux environments, including the environments of ja.po writers. I think ja.po in UTF-8 is completely OK, but the writer should be careful not to use characters which cannot be mapped to EUC-JP, because most users use EUC-JP. (Note that which characters are unmappable depends on the mapping table, so it can be affected by the mapping table problem above.)

http://www.debian.or.jp/~kubota/unicode-symbols-width2.html

Another problem is character width. As you know, most CJK characters are doublewidth, which means that one character occupies two columns on the console.
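
To make "doublewidth" concrete, here is a minimal Python sketch, not taken from any terminal emulator. It reads the East Asian Width class that Unicode assigns to each character (Unicode Standard Annex #11) through the standard unicodedata module. The function name column_width and the flag ambiguous_wide are my own hypothetical names, and the sketch ignores combining and control characters.

```python
import unicodedata

def column_width(ch, ambiguous_wide=False):
    """Guess how many console columns one character occupies.

    Only the East Asian Width class is consulted; combining and
    control characters are not handled in this sketch.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):        # Wide or Fullwidth: two columns
        return 2
    if eaw == "A":               # Ambiguous: the caller has to decide
        return 2 if ambiguous_wide else 1
    return 1                     # Na, H, N: one column

# Print the class and both possible widths for a few sample characters:
# Latin a, hiragana a, a kanji, Cyrillic De, and a box-drawing line.
for ch in ["a", "\u3042", "\u6f22", "\u0414", "\u2500"]:
    print(f"U+{ord(ch):04X}",
          unicodedata.east_asian_width(ch),
          column_width(ch),
          column_width(ch, ambiguous_wide=True))
```

The flag exists because the class alone does not settle the width of every character; the caller has to supply that choice from somewhere, for example from the locale.
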
The rule is very simple -- characters from ASCII and JIS X 0201 are singlewidth, and characters from JIS X 0208 and JIS X 0212 are doublewidth. (EUC-JP encoding uses these coded character sets.) However, this simple rule is valid only in EUC-JP.

How should UTF-8-based terminals behave? There are many characters (mainly symbols, plus Cyrillic and Greek letters) which are classified as EastAsianAmbiguous in Unicode Standard Annex #11. The most problematic characters are the ruler elements (box-drawing characters), I think. However, ruler elements are not often used in Debconf.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
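
P.S. On the point about keeping ja.po in UTF-8 but avoiding characters that cannot be mapped to EUC-JP: a rough way to check a string is sketched below in Python. The function names unmappable_chars and round_trips are hypothetical. The check uses only the mapping table built into Python's euc_jp codec, so it says nothing about what another vendor's table would do with the same text.

```python
def unmappable_chars(text, encoding="euc_jp"):
    """Return the characters in text that the given codec cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad

def round_trips(text, encoding="euc_jp"):
    """True if text survives encode + decode with one and the same codec.

    This cannot detect the cross-vendor problem, where the encoding
    and decoding sides use different mapping tables.
    """
    return text.encode(encoding).decode(encoding) == text

japanese = "\u65e5\u672c\u8a9e"      # three kanji from JIS X 0208
mixed = japanese + "\ud55c"          # plus one Hangul syllable

print(unmappable_chars(japanese))    # nothing to report for plain Japanese
print(unmappable_chars(mixed))       # reports the Hangul syllable
print(round_trips(japanese))
```

A translator could run such a check over the msgstr strings before committing, but a clean result is only as trustworthy as the table behind the codec.
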