Sorry I couldn't reply to this earlier.

On 12/10/2007, at 11:27 PM, Adam Borowski wrote:
> On Thu, Oct 11, 2007 at 12:22:18PM +0100, Colin Watson wrote:
>> I have uploaded man-db 2.5.0-1, which includes the following changes of
>> note:
>> [...]
>> groff does not yet support UTF-8 input, so at the moment this is
>> implemented by recoding in man. For the time being, the implementation
>> requires that the page be convertible to the legacy encoding for the
>> language using iconv (it uses //TRANSLIT so that it will make an
>> attempt at characters that aren't directly convertible, but that isn't
>> perfect); so a German manual page should avoid using UTF-8 characters
>> without an equivalent in ISO-8859-1. I do not expect this to be
>> particularly onerous for the time being, though there are a few cases
>> (particularly proper names) where it may be a problem. I ask for your
>> patience in those cases. If you need to use a character not in the
>> corresponding legacy encoding, then I recommend using named character
>> escapes as documented in groff_char(7).
>
> Actually, groff is _almost_ capable of supporting UTF-8. It understands
> it internally, and has problems just on input and output. For input, a
> minimal patch can be as simple as:
>
> --- src/libs/libgroff/encoding.cc (revision 6)
> +++ src/libs/libgroff/encoding.cc (revision 8)
> @@ -369,6 +369,9 @@
>  // groff 1 defines ISO-8859-1 as the input encoding, so this is required
>  // for compatibility. groff 2 will define UTF-8 (or possibly officially
>  // allow it to be switchable?)
> +  select_input_encoding_handler("UTF-8");
> +  select_output_encoding_handler("UTF-8");
> +  return;
>    select_input_encoding_handler("ISO-8859-1");
>    select_output_encoding_handler("C");
>
> (no longer relevant special cases for CJK follow)
>
> and then instead of:
>
> source -[?]-manconv-[ISO-8859-1]-> groff -[ISO-8859-1]-iconv-[$LOCALE]-> less
>
> man-db could do:
>
> source -[?]-manconv-[UTF-8]-> groff -[UTF-8]-iconv-[$LOCALE]-> less
>
> Too bad, output is harder. By adjusting char widths
> (http://angband.pl/deb/man/groff-devutf8.diff) I've got terminal output
> working neatly for everything but arabic/hebrew (not a regression), but I
> have neither the time nor knowledge to fix PostScript and such.
>
> Yet, since the current groff supports only ISO-8859-? and CJK, I guess at
> least a no-regression change could be easy to do.

Can we package groff-utf8 [1] instead?

Another note about Colin's original, and very well-thought-out, post: I think Yelp _does_ support UTF-8. I'm pretty sure I tested my pilot Vietnamese manpage in it (as well as in groff-utf8) a year or two back. There was quite a lot of discussion about UTF-8 display on the GNOME i18n list at the time.
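
On Colin's point that a page must be convertible to its legacy encoding: a translator might want to know in advance which characters in a UTF-8 page fall outside ISO-8859-1. The following is only a rough sketch of such a check, not anything taken from man-db or groff. The function names decode_utf8 and outside_latin1 are made up for illustration, the decoder is simplified (it does not reject overlong forms or surrogates), and it does not call iconv or model what //TRANSLIT would substitute.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode one UTF-8 sequence starting at s[i] and
// advance i past the bytes consumed. Malformed input yields U+FFFD.
char32_t decode_utf8(const std::string& s, std::size_t& i) {
    assert(i < s.size());
    unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80) return lead;  // plain ASCII byte

    int extra;      // number of continuation bytes expected
    char32_t cp;    // code point being assembled
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
    else return 0xFFFD;  // stray continuation byte or invalid lead byte

    for (int k = 0; k < extra; ++k) {
        if (i >= s.size()) return 0xFFFD;  // truncated sequence
        unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) return 0xFFFD;  // leave c for the next call
        cp = (cp << 6) | (c & 0x3F);
        ++i;
    }
    return cp;
}

// Hypothetical helper: collect every code point in a UTF-8 page that
// ISO-8859-1 cannot represent, i.e. anything above U+00FF.
std::vector<char32_t> outside_latin1(const std::string& page) {
    std::vector<char32_t> bad;
    std::size_t i = 0;
    while (i < page.size()) {
        char32_t cp = decode_utf8(page, i);
        if (cp > 0xFF) bad.push_back(cp);
    }
    return bad;
}
```

Under these assumptions, a German word such as "Straße" produces an empty list, while the Vietnamese "Việt" reports U+1EC7, which is the kind of character that would need either a groff_char(7) escape or real UTF-8 support in groff.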

Thank you for all your efforts to get UTF-8 manpages supported and encouraged. Manpages have notoriously lagged behind other translations in this regard. It's time they caught up.

from Clytie

Vietnamese Free Software Translation Team
http://vnoss.net/dokuwiki/doku.php?id=projects:l10n

[1] http://www.haible.de/bruno/packages-groff-utf8.html