Hi,

From: Frank Lichtenheld <[EMAIL PROTECTED]>
Subject: Bug#207455: acknowledged by developer (Re: Bug#207455: packages.debian.org: HTML-encodes multi-byte characters as single bytes)
Date: Thu, 4 Sep 2003 18:46:36 +0200

> 1. Take them literally and specify a charset=ascii for the page
> 2. Ditto, but charset=iso-8859-1
> 3. Ditto, but charset=utf-8
> 4. Use one of the three charsets but make a list of broken descriptions
>    that have to be converted
>
> Currently we do (2) but I would prefer to go to (3). As long as policy
> doesn't mandate one encoding for the description it's our decision
> anyway and I would prefer to give everyone the same chance to break
> something ;)

Yes, ISO-8859-1 is a *local* character encoding which is useful only
to some speakers of European languages. Currently, ASCII is the only
character range that is common throughout the world.

Though migration to UTF-8 is welcome, please note that U+0020 - U+007E
will remain the only common character range for a while. If policy
mandates the use of UTF-8, then policy will also have to note that the
contents must be comprehensible even when read in an ASCII environment,
i.e., even when non-ASCII characters are removed.

Indeed, in the multibyte locales which are popular in East Asia, an
8-bit character (for example, one from ISO-8859-1) will break not only
that character itself but also the next character, because it is read
as the first byte of a multibyte character.

Even though we have to be careful when using UTF-8, it is much better
than the current situation, in which Debian is biased toward one part
of the world (i.e., ISO-8859-1 usage).

Note that I think UTF-8 environments will not become popular until
several basic features (like manpages) are UTF-8-ready.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
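
As a rough illustration of the two points above (8-bit bytes being misread in a multibyte locale, and text having to survive the removal of non-ASCII characters), here is a minimal Python sketch. The helper name `strip_non_ascii` and the sample strings are assumptions made up for this example; they do not come from packages.debian.org or from any Debian tool. EUC-JP stands in here for "a multibyte locale popular in East Asia".

```python
# Illustrative sketch only: the helper name and sample strings are
# hypothetical and not taken from any Debian tool.

def strip_non_ascii(text: str) -> str:
    """Keep only the common range U+0020 - U+007E."""
    return "".join(ch for ch in text if "\u0020" <= ch <= "\u007e")

# Two ISO-8859-1 characters, each a single 8-bit byte: 0xA4 and 0xA2.
latin1_bytes = "¤¢".encode("iso-8859-1")

# Read in an EUC-JP environment, the first byte is taken as a lead byte
# and it consumes the following byte, so both characters are lost and
# one unrelated Japanese character appears instead.
as_euc_jp = latin1_bytes.decode("euc_jp")
print(len(latin1_bytes), "bytes ->", len(as_euc_jp), "character:", as_euc_jp)

# What a reader in an ASCII-only environment would be left with.
print(strip_non_ascii("naïve café"))
```

A description passes the "comprehensible in ASCII" test only if the output of something like `strip_non_ascii` is still readable. Note that a terminal in a multibyte locale may also swallow a following ASCII byte, which a strict decoder such as Python's would report as an error instead.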