Hi,

From: Tomohiro KUBOTA <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Tue, 07 Jan 2003 21:45:05 +0900 (JST)

> I think more important problem is how to deal with raw 8bit mail
> headers without encoding specification or encodings which are not
> supported by the current set-up but used in Debian mailing lists
> (GB2312, BIG5, and KOI8-R).

I heard that the current development version of MHonArc has a feature
to treat raw 8bit characters as being in a specified encoding.
However, I don't think this can be a solution now, because it will
take a very long time: first that version has to become stable, then
the stable version has to enter the unstable/testing version of the
Debian distribution, then that distribution has to become stable
(released), and finally the stable distribution has to be adopted on
master.debian.org.

Anyway, I can write a KOI8-R -> SGML entity (or "&#xxxx;" expression)
filter very easily. My plan is to assume that raw 8bit characters are
KOI8-R Russian, and I think this can be achieved easily.

The remaining problem is how to handle unsupported encodings such as
GB2312 and Big5. I found that the current MHonArc set-up of
lists.debian.org converts GB2312 and Big5 into raw 8bit streams (or,
one could say, 16bit streams, because these encodings are multibyte),
and that they also cause encoding conflicts and the loss of the
following "<" in "</em>". Thus I'd like these encodings to be
converted into "&#xxxx;" expressions. (Also, debian-esperanto people
may want to use ISO-8859-3 and UTF-8.)

I found

  master.debian.org:/org/lists.debian.org/mhonarc/share/mhonarc/MHonArc/UTF8.pm

but I don't think this will work well, because it depends on the
Unicode::MapUTF8 module, which has been available as the
libunicode-maputf8-perl package only since Woody, whereas
master.debian.org runs Potato. I might be able to write an original
filter using libtext-unicode-perl, but that package has also been
available only since Woody.

I don't know any other ways. Any suggestions?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
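
P.S. To make the KOI8-R idea above concrete, here is a minimal sketch
of the conversion step only. It is written in Python purely for
illustration; a real MHonArc filter would be a Perl routine and would
have to run on Potato's Perl. The function name koi8r_to_ncr is
hypothetical, and the sketch assumes that every byte above 0x7F really
is KOI8-R. It does not escape "&" or "<", which the archive software
would still have to do.

```python
def koi8r_to_ncr(raw: bytes) -> str:
    """Treat raw bytes as KOI8-R and return ASCII-only text.

    ASCII characters are passed through unchanged; every other
    character becomes a decimal "&#xxxx;" numeric character reference.
    """
    # Assumption: the input is KOI8-R. KOI8-R assigns a character to
    # every byte value, so decoding never fails, even on wrong input.
    text = raw.decode("koi8_r")
    return "".join(
        ch if ord(ch) < 128 else "&#{};".format(ord(ch))
        for ch in text
    )


if __name__ == "__main__":
    sample = "Привет, Debian".encode("koi8_r")
    print(koi8r_to_ncr(sample))
```

Because the output is pure ASCII, it should not conflict with whatever
charset the archive page declares. The same approach would not work
for GB2312 or Big5 without a full multibyte mapping table.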