On Thu, Jun 01, 2006 at 02:41:27PM +0000, Wiktor Wandachowicz wrote:
> Respectful Gentoo developers,
> 
> I would like to ask what you think about UTF-8 encoded manual pages.
> I mean files like ls.1.gz, which are used by the venerable "man" program.
> I recently looked into the problem a little, and before submitting any
> patches/proposals to Gentoo bugzilla I'd like to hear your opinions.
> 
> Disclaimer: for daily use I have LANG="pl_PL.UTF-8" and LC_ALL="pl_PL.UTF-8",
> but the original issue is of a more universal nature.
> 
> Back on subject. ISO-8859-* 8-bit encodings are fine and most localized
> manuals use them. However, there are some examples where UTF-8 manuals are
> installed as well. Namely, the newest portage uses "linguas_pl" like this:
> 
> $ emerge -pv portage
> [ebuild   R   ] sys-apps/portage-2.1_rc3-r3  USE="-build -doc" LINGUAS="pl"
> 
> As a result, translated manual pages are added to the system. The problem
> is that they use UTF-8 encoding. Having both man-pages-pl and this version
> of portage installed gives unexpected results: "man ls" prints all
> the letters with correct encoding, but "man emerge" does not. On the other
> hand, if "man" is configured to display UTF-8 encoded manuals correctly,
> all the other manuals print garbage characters instead of the desired output.
> 
> I wrote a simple script [1] which checks all installed Polish manuals
> using the "file" program. For the "pl" locale it currently produces about
> 70kB of text, and for the default locale about 458kB. After grepping for all
> occurrences of "UTF" I found that only the newest portage's manuals
> are in UTF-8 ("pl"), plus: flow.1, gnome-keyring-manager.1, ImageMagick.1,
> Encode::Unicode::UTF7.3pm (but I think those are false positives anyway).
> 
> While it's easy to contact the Polish translators of the portage manuals so
> they can correct them, the problem will have to be solved sooner or later.
> UTF-8 encoded manuals will probably become more common, and some
> general policy should be decided.
> 
> After some discussion on the Polish forum [2] I've learnt about groff
> deficiencies with UTF-8 handling. However, a wrapper exists [3] that helps
> somewhat in that matter. But it also requires that all manuals be unified
> wrt. encoding: *all* ISO-8859-* or *all* UTF-8, no compromise.
> So I don't know what course to take.
> 
> Summing up:
> * UTF-8 manuals: good or bad?

Bad if they're the only option. It means manpages will no longer be
available for non-UTF-8 users. Also, forcing everything in
/usr/share/man/pl to be UTF-8 will require users to emerge -e world.

> * how to handle mixed encodings of manuals?

The same way it's done now: install latin2 pl manpages in
 /usr/share/man/pl
and utf8 pl manpages in
 /usr/share/man/pl.UTF-8
If anything installs utf8 manpages in /usr/share/man/pl, fix the ebuild.
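
To find the offenders, something along these lines might do. It is only a
rough sketch in Python, not an existing tool: the function names
(is_utf8, read_page, misplaced_pages) are made up for illustration, and it
assumes a page counts as "UTF-8" when it decodes cleanly as UTF-8 and
contains at least one non-ASCII byte. Pure ASCII pages are left alone
because they are valid in either directory.

```python
import gzip
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """True if data is valid UTF-8 and contains at least one non-ASCII byte."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return any(b >= 0x80 for b in data)


def read_page(path: Path) -> bytes:
    """Read a man page, transparently handling .gz compression."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as fh:
        return fh.read()


def misplaced_pages(man_root: Path, lang: str = "pl"):
    """Yield pages under man_root/lang that look UTF-8 encoded.

    Such pages would belong in man_root/<lang>.UTF-8 instead.
    """
    for page in sorted((man_root / lang).rglob("*")):
        if page.is_file() and is_utf8(read_page(page)):
            yield page


if __name__ == "__main__":
    for page in misplaced_pages(Path("/usr/share/man")):
        print(page)
```

A latin2 page that happens to be valid UTF-8 would be misreported, but
that should be very rare for real Polish text.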

> * should man and/or groff handle UTF-8 better?

Yes, but it's not required to get this problem sorted out.

> * should an eclass function be created to aid in correcting the encoding
>   of manual pages while installing them?

Maybe, but it's not required to get this problem sorted out.
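
If someone does want such a helper, the core of it is just a recode step.
A minimal sketch of the logic in Python follows (a real eclass would be
bash, probably calling iconv). The name recode_page and the default of
ISO-8859-2 as the legacy encoding are assumptions for the Polish case.

```python
def recode_page(data: bytes, target: str, legacy: str = "iso-8859-2") -> bytes:
    """Return the page bytes re-encoded as target.

    The input is treated as UTF-8 if it decodes cleanly, otherwise as the
    legacy 8-bit encoding. Raises UnicodeEncodeError if the page contains
    characters the target encoding cannot represent.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode(legacy)
    return text.encode(target)
```

With that, a page headed for /usr/share/man/pl would be recoded to
"iso-8859-2" and one headed for /usr/share/man/pl.UTF-8 to "utf-8".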
-- 
gentoo-dev@gentoo.org mailing list
