Hi,

From: Josip Rodin <[EMAIL PROTECTED]>
Subject: Re: lists.debian.org de-localization
Date: Sun, 12 Jan 2003 04:14:45 +0100

> This, on the other hand, is a hassle to handle (backporting or installation
> into subdirs). master.d.o is scheduled to be upgraded to woody after samosa.
> That's all I know. <shrug>

This is good news. I will work on support for various encodings later.

Anyway, I don't expect the new master.d.o to have the development
version of MHonArc (with the encoding-assuming feature for raw 8bit
headers), even if MHonArc is installed from a non-Debian-package
version. So I think we will need some method to handle raw 8bit
headers.

Here is a "filter" that converts 8bit characters (assumed to be KOI8-R)
to "&#xxxx;" expressions, which I wrote by imitating iso8859.pl,
CharEnt.pm, and UTF8.pm. This filter is used for raw 7bit/8bit strings.
Since the 7bit part of KOI8-R is identical to ASCII, it does no harm to
legal ASCII headers.

The filter is to be installed as
org/lists.debian.org/mhonarc/share/mhonarc/MHonArc/DEBIAN.pm
and doesn't depend on the version of MHonArc or Debian.

## DEBIAN.pm by Tomohiro KUBOTA <[EMAIL PROTECTED]>
##
## CHARSETCONVERTER module that assumes the input string to be KOI8-R
## and converts it into &#xxx; expressions, where xxx is the decimal
## Unicode codepoint.

# The package name has to match the function named in debian.rc
# (MHonArc::DEBIAN::koi8r2sgml).
package MHonArc::DEBIAN;

%US_ASCII_To_Ent = (
#--------------------------------------------------------------------------
# Hex Code  Entity Ref   # ISO external entity and description
#--------------------------------------------------------------------------
    0x22,   "&quot;",    # ISOnum : Quotation mark
    0x26,   "&amp;",     # ISOnum : Ampersand
    0x3C,   "&lt;",      # ISOnum : Less-than sign
    0x3E,   "&gt;",      # ISOnum : Greater-than sign
);

%KOI8_R_To_Ent = (
#--------------------------------------------------------------------------
# Hex Code  Entity Ref   # Unicode character name
#--------------------------------------------------------------------------
    0x80,   "&#9472;",   # BOX DRAWINGS LIGHT HORIZONTAL
    0x81,   "&#9474;",   # BOX DRAWINGS LIGHT VERTICAL
    0x82,   "&#9484;",   # BOX DRAWINGS LIGHT DOWN AND RIGHT
    0x83,   "&#9488;",   # BOX DRAWINGS LIGHT DOWN AND LEFT
    0x84,   "&#9492;",   # BOX DRAWINGS LIGHT UP AND RIGHT
    0x85,   "&#9496;",   # BOX DRAWINGS LIGHT UP AND LEFT
    0x86,   "&#9500;",   # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
    0x87,   "&#9508;",   # BOX DRAWINGS LIGHT VERTICAL AND LEFT
    0x88,   "&#9516;",   # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
    0x89,   "&#9524;",   # BOX DRAWINGS LIGHT UP AND HORIZONTAL
    0x8a,   "&#9532;",   # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
    0x8b,   "&#9600;",   # UPPER HALF BLOCK
    0x8c,   "&#9604;",   # LOWER HALF BLOCK
    0x8d,   "&#9608;",   # FULL BLOCK
    0x8e,   "&#9612;",   # LEFT HALF BLOCK
    0x8f,   "&#9616;",   # RIGHT HALF BLOCK
    0x90,   "&#9617;",   # LIGHT SHADE
    0x91,   "&#9618;",   # MEDIUM SHADE
    0x92,   "&#9619;",   # DARK SHADE
    0x93,   "&#8992;",   # TOP HALF INTEGRAL
    0x94,   "&#9632;",   # BLACK SQUARE
    0x95,   "&#8729;",   # BULLET OPERATOR
    0x96,   "&#8730;",   # SQUARE ROOT
    0x97,   "&#8776;",   # ALMOST EQUAL TO
    0x98,   "&#8804;",   # LESS-THAN OR EQUAL TO
    0x99,   "&#8805;",   # GREATER-THAN OR EQUAL TO
    0x9a,   "&#160;",    # NO-BREAK SPACE
    0x9b,   "&#8993;",   # BOTTOM HALF INTEGRAL
    0x9c,   "&#176;",    # DEGREE SIGN
    0x9d,   "&#178;",    # SUPERSCRIPT TWO
    0x9e,   "&#183;",    # MIDDLE DOT
    0x9f,   "&#247;",    # DIVISION SIGN
    0xa0,   "&#9552;",   # BOX DRAWINGS DOUBLE HORIZONTAL
    0xa1,   "&#9553;",   # BOX DRAWINGS DOUBLE VERTICAL
    0xa2,   "&#9554;",   # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
    0xa3,   "&#1105;",   # CYRILLIC SMALL LETTER IO
    0xa4,   "&#9555;",   # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
    0xa5,   "&#9556;",   # BOX DRAWINGS DOUBLE DOWN AND RIGHT
    0xa6,   "&#9557;",   # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
    0xa7,   "&#9558;",   # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
    0xa8,   "&#9559;",   # BOX DRAWINGS DOUBLE DOWN AND LEFT
    0xa9,   "&#9560;",   # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
    0xaa,   "&#9561;",   # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
    0xab,   "&#9562;",   # BOX DRAWINGS DOUBLE UP AND RIGHT
    0xac,   "&#9563;",   # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
    0xad,   "&#9564;",   # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
    0xae,   "&#9565;",   # BOX DRAWINGS DOUBLE UP AND LEFT
    0xaf,   "&#9566;",   # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
    0xb0,   "&#9567;",   # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
    0xb1,   "&#9568;",   # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
    0xb2,   "&#9569;",   # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
    0xb3,   "&#1025;",   # CYRILLIC CAPITAL LETTER IO
    0xb4,   "&#9570;",   # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
    0xb5,   "&#9571;",   # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
    0xb6,   "&#9572;",   # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
    0xb7,   "&#9573;",   # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
    0xb8,   "&#9574;",   # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
    0xb9,   "&#9575;",   # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
    0xba,   "&#9576;",   # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
    0xbb,   "&#9577;",   # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
    0xbc,   "&#9578;",   # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
    0xbd,   "&#9579;",   # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
    0xbe,   "&#9580;",   # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
    0xbf,   "&#169;",    # COPYRIGHT SIGN
    0xc0,   "&#1102;",   # CYRILLIC SMALL LETTER YU
    0xc1,   "&#1072;",   # CYRILLIC SMALL LETTER A
    0xc2,   "&#1073;",   # CYRILLIC SMALL LETTER BE
    0xc3,   "&#1094;",   # CYRILLIC SMALL LETTER TSE
    0xc4,   "&#1076;",   # CYRILLIC SMALL LETTER DE
    0xc5,   "&#1077;",   # CYRILLIC SMALL LETTER IE
    0xc6,   "&#1092;",   # CYRILLIC SMALL LETTER EF
    0xc7,   "&#1075;",   # CYRILLIC SMALL LETTER GHE
    0xc8,   "&#1093;",   # CYRILLIC SMALL LETTER HA
    0xc9,   "&#1080;",   # CYRILLIC SMALL LETTER I
    0xca,   "&#1081;",   # CYRILLIC SMALL LETTER SHORT I
    0xcb,   "&#1082;",   # CYRILLIC SMALL LETTER KA
    0xcc,   "&#1083;",   # CYRILLIC SMALL LETTER EL
    0xcd,   "&#1084;",   # CYRILLIC SMALL LETTER EM
    0xce,   "&#1085;",   # CYRILLIC SMALL LETTER EN
    0xcf,   "&#1086;",   # CYRILLIC SMALL LETTER O
    0xd0,   "&#1087;",   # CYRILLIC SMALL LETTER PE
    0xd1,   "&#1103;",   # CYRILLIC SMALL LETTER YA
    0xd2,   "&#1088;",   # CYRILLIC SMALL LETTER ER
    0xd3,   "&#1089;",   # CYRILLIC SMALL LETTER ES
    0xd4,   "&#1090;",   # CYRILLIC SMALL LETTER TE
    0xd5,   "&#1091;",   # CYRILLIC SMALL LETTER U
    0xd6,   "&#1078;",   # CYRILLIC SMALL LETTER ZHE
    0xd7,   "&#1074;",   # CYRILLIC SMALL LETTER VE
    0xd8,   "&#1100;",   # CYRILLIC SMALL LETTER SOFT SIGN
    0xd9,   "&#1099;",   # CYRILLIC SMALL LETTER YERU
    0xda,   "&#1079;",   # CYRILLIC SMALL LETTER ZE
    0xdb,   "&#1096;",   # CYRILLIC SMALL LETTER SHA
    0xdc,   "&#1101;",   # CYRILLIC SMALL LETTER E
    0xdd,   "&#1097;",   # CYRILLIC SMALL LETTER SHCHA
    0xde,   "&#1095;",   # CYRILLIC SMALL LETTER CHE
    0xdf,   "&#1098;",   # CYRILLIC SMALL LETTER HARD SIGN
    0xe0,   "&#1070;",   # CYRILLIC CAPITAL LETTER YU
    0xe1,   "&#1040;",   # CYRILLIC CAPITAL LETTER A
    0xe2,   "&#1041;",   # CYRILLIC CAPITAL LETTER BE
    0xe3,   "&#1062;",   # CYRILLIC CAPITAL LETTER TSE
    0xe4,   "&#1044;",   # CYRILLIC CAPITAL LETTER DE
    0xe5,   "&#1045;",   # CYRILLIC CAPITAL LETTER IE
    0xe6,   "&#1060;",   # CYRILLIC CAPITAL LETTER EF
    0xe7,   "&#1043;",   # CYRILLIC CAPITAL LETTER GHE
    0xe8,   "&#1061;",   # CYRILLIC CAPITAL LETTER HA
    0xe9,   "&#1048;",   # CYRILLIC CAPITAL LETTER I
    0xea,   "&#1049;",   # CYRILLIC CAPITAL LETTER SHORT I
    0xeb,   "&#1050;",   # CYRILLIC CAPITAL LETTER KA
    0xec,   "&#1051;",   # CYRILLIC CAPITAL LETTER EL
    0xed,   "&#1052;",   # CYRILLIC CAPITAL LETTER EM
    0xee,   "&#1053;",   # CYRILLIC CAPITAL LETTER EN
    0xef,   "&#1054;",   # CYRILLIC CAPITAL LETTER O
    0xf0,   "&#1055;",   # CYRILLIC CAPITAL LETTER PE
    0xf1,   "&#1071;",   # CYRILLIC CAPITAL LETTER YA
    0xf2,   "&#1056;",   # CYRILLIC CAPITAL LETTER ER
    0xf3,   "&#1057;",   # CYRILLIC CAPITAL LETTER ES
    0xf4,   "&#1058;",   # CYRILLIC CAPITAL LETTER TE
    0xf5,   "&#1059;",   # CYRILLIC CAPITAL LETTER U
    0xf6,   "&#1046;",   # CYRILLIC CAPITAL LETTER ZHE
    0xf7,   "&#1042;",   # CYRILLIC CAPITAL LETTER VE
    0xf8,   "&#1068;",   # CYRILLIC CAPITAL LETTER SOFT SIGN
    0xf9,   "&#1067;",   # CYRILLIC CAPITAL LETTER YERU
    0xfa,   "&#1047;",   # CYRILLIC CAPITAL LETTER ZE
    0xfb,   "&#1064;",   # CYRILLIC CAPITAL LETTER SHA
    0xfc,   "&#1069;",   # CYRILLIC CAPITAL LETTER E
    0xfd,   "&#1065;",   # CYRILLIC CAPITAL LETTER SHCHA
    0xfe,   "&#1063;",   # CYRILLIC CAPITAL LETTER CHE
    0xff,   "&#1066;",   # CYRILLIC CAPITAL LETTER HARD SIGN
);

# Convert a raw 7bit/8bit string, assumed to be KOI8-R, byte by byte.
sub koi8r2sgml {
    my $data = $_[0];
    my ($len, $ret, $char, $offset);

    $len = length($data);
    $ret = "";
    $offset = 0;
    while ($offset < $len) {
        $char = unpack("C", substr($data, $offset++, 1));
        if ($char < 128) {
            # ASCII: escape the four HTML-special characters, pass the rest.
            $ret .= ($US_ASCII_To_Ent{$char} || pack("C", $char));
        } else {
            # 8bit: replace with the numeric character reference.
            $ret .= ($KOI8_R_To_Ent{$char} || pack("C", $char));
        }
    }
    $ret;
}

1;

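As a cross-check of the table above, here is a minimal sketch that produces the same kind of output by a different route: it decodes the bytes with Perl's Encode module instead of a hand-written hash. This assumes Perl 5.8 or later, where Encode is a core module, so it is meant for verifying the mapping on another machine rather than as a replacement for DEBIAN.pm. The function name koi8r_to_entities and the sample string are only illustrative and are not part of MHonArc.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Named entities for the four HTML-special ASCII characters,
# mirroring %US_ASCII_To_Ent in the module.
my %ascii_ent = (
    '"' => '&quot;',
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
);

# Hypothetical helper: treat the input as KOI8-R bytes and return a string
# in which every non-ASCII character is a decimal "&#NNNN;" reference.
sub koi8r_to_entities {
    my ($bytes) = @_;
    my $text = decode('koi8-r', $bytes);    # bytes -> Perl characters
    my $out  = '';
    for my $ch (split //, $text) {
        my $cp = ord $ch;
        if ($cp < 128) {
            $out .= exists $ascii_ent{$ch} ? $ascii_ent{$ch} : $ch;
        } else {
            $out .= sprintf '&#%d;', $cp;
        }
    }
    return $out;
}

# The Russian word "Privet" as raw KOI8-R bytes, followed by plain ASCII.
my $raw = "\xf0\xd2\xc9\xd7\xc5\xd4 <test>";
print koi8r_to_entities($raw), "\n";
# prints: &#1055;&#1088;&#1080;&#1074;&#1077;&#1090; &lt;test&gt;
```

The ASCII part passes through unchanged apart from the four escaped characters, which is why such a filter does not disturb legal 7bit headers.
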
--- debian.rc 2003-01-12 12:33:02.000000000 +0900
+++ debian.rc.new 2003-01-12 12:35:43.000000000 +0900
@@ -3,7 +3,7 @@
 <!-- Common Resources -------------------------------------------------------->
 <CharsetConverters>
-plain; mhonarc::htmlize;
+plain; MHonArc::DEBIAN::koi8r2sgml; MHonArc/DEBIAN.pm
 us-ascii; mhonarc::htmlize;
 iso-8859-1; iso_8859::str2sgml; iso8859.pl
 iso-8859-2; iso_8859::str2sgml; iso8859.pl