Dongsheng Song <dongsheng.s...@gmail.com> writes:

> 2009/2/18 Vern Sun <s5u...@gmail.com>:
>> On Wed, 2009-02-18 at 02:43 +0800, Anthony Wong wrote:
>>> I suggest 1. to convert all existing Chinese WML files for the Debian
>>> website from Big5 to UTF-8
>>>
>>> Any comments?
>>>
>> Converting everything to UTF-8 may cause a problem. Suppose two users
>> (one writing Simplified Chinese, one writing Traditional Chinese) each
>> contribute a translation:
>>
>> % cat foo.tc
>> 中國
>>
>> % cat foo.sc
>> 中国
>>
>> % enca foo.sc foo.tc
>> foo.sc: Universal transformation format 8 bits; UTF-8
>> foo.tc: Universal transformation format 8 bits; UTF-8
>>
>> Converting the Simplified contribution from UTF-8 to GB2312 works:
>> ~% iconv -f utf8 -t gb2312 foo.sc > foo.sc.gb
>>
>> But converting the Traditional contribution from UTF-8 to GB2312 fails:
>> ~% iconv -f utf8 -t gb2312 foo.tc > foo.tc.gb
>> iconv: illegal input sequence at position 3
>>
>> Likewise, converting the Simplified contribution from UTF-8 to BIG5
>> also fails:
>> ~% iconv -f utf8 -t big5 foo.sc > foo.sc.big
>> iconv: illegal input sequence at position 3
>>
>> ~% iconv -f utf8 -t big5 foo.tc > foo.tc.big
>>
>
> I don't understand why we are still clinging to GB2312/Big5. Wouldn't
> it be better to use UTF-8 directly?
> sc <=> tc should convert only the content, without the needless extra
> step of converting to an obsolete encoding.
>
> ---
> Dongsheng Song
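
The iconv failures quoted above can be anticipated before any conversion is attempted. The following is only a rough sketch, not something used on the Debian website: it relies on Python's built-in `gb2312` and `big5` codecs, and the helper names `encodable_in` and `legacy_targets` are made up for illustration. It tests whether a string can be represented in each legacy encoding at all. It is not a real Simplified/Traditional classifier, since text made only of characters shared by both scripts is encodable in both.

```python
def encodable_in(text, encoding):
    """Return True if every character of text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # Same situation as iconv's "illegal input sequence".
        return False


def legacy_targets(text):
    """List which legacy Chinese encodings can represent text without loss."""
    return [enc for enc in ("gb2312", "big5") if encodable_in(text, enc)]


if __name__ == "__main__":
    # The two sample strings from foo.sc and foo.tc in the quoted message.
    for sample in ("中国", "中國"):
        print(sample, legacy_targets(sample))
```

If a string is not encodable in the desired target, that suggests a Simplified/Traditional content conversion is needed first, or that the file should simply stay in UTF-8.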

Vern Sun's concern is this: with GB2312/BIG5, the encoding itself tells
you whether a document is in Simplified or Traditional Chinese, and
therefore whether a Simplified/Traditional conversion is needed before
the encoding conversion. UTF-8, by contrast, can hold both at once.

However, with UTF-8 it should still be possible to tell whether the
Chinese characters at hand are Simplified or Traditional, and the
encoding conversion step can be skipped, so this should not be a problem.

Moreover, when posting to international mailing lists, please prefer
English so that non-Chinese speakers can follow.

To non-Chinese speakers: the above discussion is about the concern that
using UTF-8 might lead to premature encoding conversion, where converting
Traditional Chinese characters to GB2312 fails, since UTF-8 can hold both
character sets. This is a valid concern, but it shouldn't be a problem if
the character set is checked beforehand. Where possible, no encoding
conversion is needed at all, as UTF-8 handles both character sets
uniformly.

Regards,
Deng Xiyue