Hi,

From: Hideki Yamane <[EMAIL PROTECTED]>
Subject: Bug#227273: www.debian.org: Japanese DDTP files are provided with EUC-JP endoding.
Date: Sun, 25 Jan 2004 01:10:41 +0900

> * tag is OK. That says "content="text/html; charset=ISO-2022-JP"".
> * It looks like contents is not valid ISO-2022-JP. I don't know why.
> Frank, would you tell me the way how did you convert it from EUC-JP
> to ISO-2022-JP ?

I checked http://packages.debian.org/unstable/misc/language-env.ja.html
and found that the closing escape sequences are missing.

ISO-2022-JP is a "stateful" encoding. This means that a string consists
of escape sequences that determine the "state" and ordinary codes whose
meaning (the corresponding characters) depends on the "state".
For example, <Japanese Hiragana A> is:

    1B 24 42 24 22 1B 28 42

where 1B 24 42 (the first three bytes) means "here starts JIS X 0208
Japanese", 24 22 (the following two bytes) is Japanese Hiragana A, and
the final 1B 28 42 means "here starts ASCII". In the Japanese state,
24 22 means Japanese Hiragana A, while in the ASCII state it means
Dollar Sign and Double Quotation Mark.

I said the closing escape sequences are missing. This means that the
"here starts ASCII" part is missing. Thus, all of the following ASCII
characters (including HTML tags) are interpreted as Japanese, which
causes mojibake.

I don't know what algorithm is used to generate the page, so I have
no idea why the page is broken.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
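
The byte sequence described above can be checked with a small Python
sketch. It is only an illustration and not the code that generates the
page, which is unknown. It assumes Python 3.8 or later with the standard
"iso2022_jp" codec; the names ESCAPES and final_state are hypothetical
and introduced here for the example. The function only tracks which
character set is in effect at the end of a byte string, using the four
escape sequences that RFC 1468 defines for ISO-2022-JP.

```python
# Escape sequences defined for ISO-2022-JP in RFC 1468.
ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def final_state(data: bytes) -> str:
    """Return the character set in effect at the end of data.

    Hypothetical helper: it only follows escape sequences and does
    not validate the character codes themselves.
    """
    state = "ASCII"  # ISO-2022-JP text starts in the ASCII state
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in ESCAPES:
            state = ESCAPES[seq]
            i += 3
        elif state.startswith("JIS X 0208"):
            i += 2  # two bytes per character in the Japanese state
        else:
            i += 1  # one byte per character otherwise
    return state


# Hiragana A, encoded with the standard codec.
good = "\u3042".encode("iso2022_jp")
print(good.hex(" "))         # 1b 24 42 24 22 1b 28 42

# Drop the closing "here starts ASCII" sequence and append an HTML tag,
# imitating the kind of breakage described in this mail.
broken = good[:-3] + b"<p>"

print(final_state(good))     # ASCII
print(final_state(broken))   # JIS X 0208-1983
```

For the broken string the state is still Japanese when the HTML tag
begins, so a reader of the stream takes the bytes of "<p>" as parts of
two-byte Japanese codes rather than as ASCII.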