On Thu, Oct 11, 2007 at 12:22:18PM +0100, Colin Watson wrote:
> I have uploaded man-db 2.5.0-1, which includes the following changes of
> note:
[...]
>
> groff does not yet support UTF-8 input, so at the moment this is
> implemented by recoding in man. For the time being, the implementation
> requires that the page be convertible to the legacy encoding for the
> language using iconv (it uses //TRANSLIT so that it will make an attempt
> at characters that aren't directly convertible, but that isn't perfect);
> so a German manual page should avoid using UTF-8 characters without an
> equivalent in ISO-8859-1. I do not expect this to be particularly
> onerous for the time being, though there are a few cases (particularly
> proper names) where it may be a problem. I ask for your patience in
> those cases. If you need to use a character not in the corresponding
> legacy encoding, then I recommend using named character escapes as
> documented in groff_char(7).
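The constraint described above can be checked mechanically before upload. What follows is a minimal sketch in C++ (the language groff itself is written in), not something man-db or groff ships. It assumes the page source is well-formed UTF-8, that the target legacy encoding is ISO-8859-1, and that the installed groff accepts the \[uXXXX] Unicode escapes described in groff_char(7). The helpers decode_utf8 and escape_non_latin1 are hypothetical names. Every code point above U+00FF is rewritten as an escape and everything else is copied through unchanged, so the result recodes cleanly to ISO-8859-1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Minimal decoder (an assumption, not man-db code): rejects bad lead or
// continuation bytes, truncation and values above U+10FFFF, but does not
// check for overlong forms or surrogates.
static std::uint32_t decode_utf8(const std::string& s, std::size_t& i) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    std::uint32_t cp;
    int extra;  // number of continuation bytes that follow
    if (c < 0x80)                { cp = c;        extra = 0; }
    else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
    else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
    else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
    else throw std::runtime_error("invalid UTF-8 lead byte");

    if (i + extra >= s.size())
        throw std::runtime_error("truncated UTF-8 sequence");
    for (int k = 1; k <= extra; ++k) {
        unsigned char cc = static_cast<unsigned char>(s[i + k]);
        if ((cc & 0xC0) != 0x80)
            throw std::runtime_error("invalid UTF-8 continuation byte");
        cp = (cp << 6) | (cc & 0x3F);
    }
    if (cp > 0x10FFFF)
        throw std::runtime_error("code point out of range");
    i += extra + 1;
    return cp;
}

// Hypothetical helper: copy a UTF-8 page through unchanged, except that
// every code point with no ISO-8859-1 equivalent (above U+00FF) becomes
// a groff \[uXXXX] escape.
std::string escape_non_latin1(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        std::size_t start = i;
        std::uint32_t cp = decode_utf8(utf8, i);
        if (cp <= 0xFF) {
            out.append(utf8, start, i - start);  // keep the original bytes
        } else {
            char buf[16];
            // Uppercase hex, at least four digits, per groff_char(7).
            int n = std::snprintf(buf, sizeof buf, "\\[u%04X]",
                                  static_cast<unsigned>(cp));
            assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
            out.append(buf, static_cast<std::size_t>(n));
        }
    }
    return out;
}
```

For example, the UTF-8 bytes of the name "Łódź" come out as \[u0141]ód\[u017A]: "ó" exists in ISO-8859-1 and is kept, while "Ł" and "ź" do not and are escaped. Unlike an approximation from //TRANSLIT, this keeps the intended glyph, at the cost of escapes in the page source.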
Actually, groff is _almost_ capable of supporting UTF-8. It understands it
internally and has problems only on input and output. For input, a
minimal patch can be as simple as:

--- src/libs/libgroff/encoding.cc (revision 6)
+++ src/libs/libgroff/encoding.cc (revision 8)
@@ -369,6 +369,9 @@
 // groff 1 defines ISO-8859-1 as the input encoding, so this is required
 // for compatibility. groff 2 will define UTF-8 (or possibly officially
 // allow it to be switchable?)
+select_input_encoding_handler("UTF-8");
+select_output_encoding_handler("UTF-8");
+return;
 select_input_encoding_handler("ISO-8859-1");
 select_output_encoding_handler("C");
(no longer relevant special cases for CJK follow)

and then, instead of:

  source -[?]-manconv-[ISO-8859-1]-> groff -[ISO-8859-1]-iconv-[$LOCALE]-> less

man-db could do:

  source -[?]-manconv-[UTF-8]-> groff -[UTF-8]-iconv-[$LOCALE]-> less

Unfortunately, output is harder. By adjusting character widths
(http://angband.pl/deb/man/groff-devutf8.diff) I have terminal output
working neatly for everything except Arabic and Hebrew (which is not a
regression), but I have neither the time nor the knowledge to fix
PostScript and the other output devices.

Still, since the current groff supports only the ISO-8859 family and CJK
anyway, I'd guess that at least a change with no regressions would be easy
to do.

-- 
1KB // Microsoft corollary to Hanlon's razor:
    // Never attribute to stupidity what can be
    // adequately explained by malice.