Hi,

I found that the page http://www.debian.org/devel/people.ja.html is badly broken. ALL characters after a certain point are displayed in boldface (i.e., in <strong> format).
This happens because of 8-bit (i.e., non-ASCII) characters in developers' names. When such characters (I guess most of them are intended to be ISO-8859-1) are used in a developer's name, they appear as-is in the web page. In multibyte encodings, 8-bit bytes (0x80 - 0xff) are regarded as the first byte of a multibyte character, and the following byte is then regarded as the second byte of THAT multibyte character.

Imagine that such an 8-bit character is the last character of a developer's name. The next character is the "<" of "</strong>". That "<" is then taken as the second byte of the multibyte character, so the "<" itself is lost. Thus "</strong>" becomes "/strong>", a broken tag. This is what makes the page so messy. Please look at the page in a few browsers ... because of the broken "</strong>", everything that follows is displayed in <strong> format!

I think the solution would be one of the following:

1. Regard all 8-bit characters as ISO-8859-1 and replace them with &foobar; expressions. For example, 0xfc would be replaced with "&uuml;". The problem with this solution is that we have to assume that 8-bit characters are ISO-8859-1. This means that it would discourage developers from switching from ISO-8859-1 to UTF-8, which is a very bad thing.

2. Force all developers to use ASCII or UTF-8 in their names, and have the script that generates people.name assume that all 8-bit characters are UTF-8. All other encodings, such as ISO-8859-1 or EUC-JP, would be forbidden. The problem with this solution is that ISO-8859-1(15) people will complain. However, IMHO, the current situation is an unfair privilege for ISO-8859-1(15) people, and moreover, such inequality hinders the promotion of i18n. Anyway, it would take a huge amount of energy to persuade ISO-8859-1(15) people.

3. Without forcing developers to switch to UTF-8, have the script that generates people.name regard all 8-bit characters as UTF-8. Since few of the 8-bit characters in developers' names are UTF-8 so far, most non-ASCII characters in people.html would be lost. (Anyway, all non-ASCII characters ARE already lost in the people.<multibyte languages>.html pages.) However, the broken-tag problem would be solved. If developers switch to UTF-8, their names will be displayed correctly.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
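
P.S. To make the mechanism and the options above a bit more concrete, here is a minimal Python sketch. It is NOT the actual script that generates the page. All function names (naive_multibyte_view, name_as_latin1, name_as_utf8) are made up for illustration, and naive_multibyte_view is only a simplified model of a lenient multibyte parser as described above, not the behaviour of any particular browser. Numeric character references (&#252;) are used instead of named ones (&uuml;) only because they are easier to generate.

```python
import html


def naive_multibyte_view(data: bytes) -> str:
    """Simplified model (an assumption, not a real browser): every byte in
    0x80-0xff is taken as a lead byte and swallows the byte after it."""
    out = []
    i = 0
    while i < len(data):
        if data[i] >= 0x80:
            out.append("\ufffd")  # the two bytes show up as one unknown character
            i += 2                # the following byte (e.g. "<") is lost
        else:
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


def to_ascii_html(text: str) -> str:
    """Escape markup characters and turn every non-ASCII character into a
    numeric character reference, so the output is pure ASCII."""
    return html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


def name_as_latin1(raw: bytes) -> str:
    """Option 1: assume every 8-bit byte is ISO-8859-1."""
    return to_ascii_html(raw.decode("iso-8859-1"))


def name_as_utf8(raw: bytes) -> str:
    """Options 2 and 3: assume UTF-8; bytes that are not valid UTF-8
    are replaced by U+FFFD, i.e. the original character is lost."""
    return to_ascii_html(raw.decode("utf-8", errors="replace"))


if __name__ == "__main__":
    latin1_name = b"Jos\xe9"  # hypothetical name ending in an ISO-8859-1 byte

    # The current situation: the raw byte is written next to "</strong>".
    print(naive_multibyte_view(b"<strong>" + latin1_name + b"</strong>"))

    # Option 1 and option 3: the output is ASCII only, so no byte can
    # swallow the "<" of the closing tag.
    for render in (name_as_latin1, name_as_utf8):
        page = b"<strong>" + render(latin1_name).encode("ascii") + b"</strong>"
        print(naive_multibyte_view(page))
```

In this sketch the first line printed contains "/strong>" without its "<", while the other two keep "</strong>" intact. With name_as_utf8 the ISO-8859-1 character itself is replaced by an unknown-character reference, which matches the trade-off of option 3: the name loses the character, but the tag survives.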