David Given <[EMAIL PROTECTED]> writes:

> Weeeell... unfortunately man-db uses ISO-8859-1 for C and POSIX locales,
> so transcoding would be required.

You do get lintian warnings if you try to use ISO 8859-1 characters in man
pages currently.  Unfortunately, a lot of people just ignore those
warnings.  (One of the reasons for them is precisely to ease this
transition.)

> Further investigation reveals that man-db seems to transcode UTF-8 to
> ISO-8859-1 before passing it to groff.

Oh, so we lose if we have characters in UTF-8 that can't be represented in
ISO 8859-1.  Bleh.  That explains why we are where we are.

Thank you very much for the analysis!  It hadn't occurred to me that
man-db would be transcoding things on the way in, and now I understand
much better what's going on.

> It's all a bit of a maze, unfortunately, and I could have misunderstood
> things. But that MULTIBYTE_GROFF #define looks interesting. It *might*
> be possible to crudely hack it to work by using the nippon device and
> the EUC-JP encoding for man pages written in UTF-8. I don't know what
> the coverage of EUC-JP is like compared to UTF-8, but there might be
> mileage there. Alternatively, ascii8 is supposed to be eight-bit clean,
> and might suffice...

I'm pretty sure that the MULTIBYTE_GROFF stuff is what didn't work quite
right and what upstream wasn't entirely happy with.  I think it was
developed for some specific Asian encodings and works okay for them, but
possibly not for arbitrary UTF-8.

I wonder if that's what Red Hat uses or if they transcode as well and just
lose on man pages that contain non-European characters.

-- 
Russ Allbery ([EMAIL PROTECTED])               <http://www.eyrie.org/~eagle/>
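
To illustrate the loss discussed above (UTF-8 input being transcoded to
ISO 8859-1 before formatting), here is a minimal sketch in Python.  It is
not man-db's code and does not claim to reproduce what man-db does with
characters it cannot convert; the function names lost_in_latin1 and
transcode_lossy are hypothetical, and the "?" substitution is simply
Python's "replace" error handler, chosen here to make the loss visible.

```python
def lost_in_latin1(text):
    """Return the characters of text that ISO 8859-1 cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode("iso-8859-1")
        except UnicodeEncodeError:
            # No ISO 8859-1 code point exists for this character.
            lost.append(ch)
    return lost


def transcode_lossy(utf8_bytes):
    """Decode UTF-8 bytes and re-encode them as ISO 8859-1.

    Unrepresentable characters become "?" (Python's "replace" handler).
    This is an assumption made for illustration, not man-db's behaviour.
    """
    text = utf8_bytes.decode("utf-8")
    return text.encode("iso-8859-1", errors="replace")


if __name__ == "__main__":
    sample = "caf\u00e9 \u65e5\u672c\u8a9e"  # "café" followed by three kanji
    print(lost_in_latin1(sample))
    print(transcode_lossy(sample.encode("utf-8")))
```

Western European text such as "café" survives the round trip, while the
Japanese characters do not, which matches the concern about man pages
that contain non-European characters.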