On Thu, Jun 01, 2006 at 02:41:27PM +0000, Wiktor Wandachowicz wrote:
> Respectful Gentoo developers,
>
> I would like to ask what do you think about UTF-8 encoded manual pages?
> I mean, the files like ls.1.gz, which are used by honorable "man" program.
> Recently I attacked the problem a little and before submitting any
> patches/proposals to Gentoo bugzilla I'd like to know your opinions first.
>
> Disclaimer: for daily use I have LANG="pl_PL.UTF-8" and LC_ALL="pl_PL.UTF-8",
> but the original issue is of a more universal nature.
>
> Back on subject. ISO-8859-* 8-bit encodings are fine and most localized
> manuals use them. However, there are some examples where UTF-8 manuals are
> installed as well. Namely, newest portage uses "linguas_pl" by this means:
>
> $ emerge -pv portage
> [ebuild R ] sys-apps/portage-2.1_rc3-r3 USE="-build -doc" LINGUAS="pl"
>
> In effect, translated manual pages are added to the system. The problem
> is that they use UTF-8 encoding. Having both man-pages-pl and this version
> of portage installed gives unexpected results. This way "man ls" prints all
> the letters with correct encoding, but "man emerge" does not. On the other
> hand, if "man" is configured to display UTF-8 encoded manuals correctly,
> all the other manuals print funny characters instead of desired output.
>
> I wrote a simple script [1] which checks all installed Polish manuals by
> using "file" program. For "pl" locale it produces currently about ~70kB
> of text, and for default locale it's about 458kB. After grepping for all
> occurrences of "UTF" I've found out that only the newest portage's manuals
> are in UTF-8 ("pl"), plus: flow.1, gnome-keyring-manager.1, ImageMagick.1,
> Encode::Unicode::UTF7.3pm (but I think they are false positives, anyway).
>
> While it's easy to contact Polish translators of the portage's manuals so
> they could correct them, the problem will have to be solved sooner or later.
> UTF-8 encoded manuals will probably occur with higher frequency, and some
> general resolution should be made.
>
> After some discussion on the Polish forum [2] I've learnt about groff
> deficiencies with UTF-8 handling. However, a wrapper exists [3] that helps
> somewhat in that matter. But it also requires that all manuals be unified
> wrt. encoding: *all* ISO-8859-* or *all* UTF-8, no compromise.
> So I don't know what course to take.
>
> Summing up:
> * UTF-8 manuals: good or bad?
Bad if they're the only option. It means manpages will no longer be
available for non-UTF-8 users. Also, forcing everything in
/usr/share/man/pl to be UTF-8 will require users to emerge -e world.

> * how to handle mixed encodings of manuals?

The same way it's done now: install latin2 pl manpages in
/usr/share/man/pl and utf8 pl manpages in /usr/share/man/pl.UTF-8

If anything installs utf8 manpages in /usr/share/man/pl, fix the ebuild.

> * should man and/or groff handle UTF-8 better?

Yes, but it's not required to get this problem sorted out.

> * should an eclass function be created to aid in correcting the encoding
>   of manual pages while installing them?

Maybe, but it's not required to get this problem sorted out.

-- 
gentoo-dev@gentoo.org mailing list
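
To illustrate the split layout described above (legacy-encoded pages in /usr/share/man/pl, UTF-8 pages in /usr/share/man/pl.UTF-8), here is a rough Python sketch of how a page could be sorted into one directory or the other. It is not taken from any ebuild or eclass: the names classify, read_page and target_dir are made up for this example, the man<section> subdirectory and the choice to keep pure-ASCII pages in the plain language directory are assumptions, and the detection is only a heuristic (a legacy 8-bit page could in rare cases happen to be valid UTF-8).

```python
import gzip
from pathlib import Path

# Root of the manual page hierarchy, as used in the discussion above.
MAN_ROOT = Path("/usr/share/man")


def classify(data: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'legacy' for the raw bytes of a man page.

    Heuristic only: bytes that fail to decode as UTF-8 are assumed to be
    in some 8-bit legacy encoding such as ISO-8859-2.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "legacy"


def read_page(path: Path) -> bytes:
    """Read a man page from disk, transparently handling .gz files."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as fh:
        return fh.read()


def target_dir(lang: str, section: str, data: bytes) -> Path:
    """Pick the install directory for a translated page.

    UTF-8 pages go to <lang>.UTF-8; ASCII and legacy pages stay in <lang>
    (assumption: pure ASCII renders the same either way).
    """
    kind = classify(data)
    locale_dir = f"{lang}.UTF-8" if kind == "utf-8" else lang
    return MAN_ROOT / locale_dir / f"man{section}"
```

Something like target_dir("pl", "1", read_page(Path("emerge.1.gz"))) would then name the directory a given page belongs in; a check of this kind could also flag pages that an ebuild has installed into the wrong one.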