Jim Gifford wrote:

(many thanks to Jim for contacting the authors)

Here is a response from the Man maintainer on UTF-8

Federico Lucifredi wrote:

Hello Jim,
 Man will not be UTF-8 problematic soon - the timeframe is March. So,
hold on to your horses, we are working on it and it is targeted to be in
release 1.6d.

 Hope that will solve your problem -- entirely =)

Yes, this is in the TODO list in Man-1.6b. Please make sure, however, that the maintainer fully understands the problem. And, unfortunately, I can't verify the progress because I don't have read access to the development sources of Man.

Currently, Man has the following problems:

1) It doesn't output error messages (such as translations of "no such manual page") properly in UTF-8 locales. Currently, it just copies the sequence of bytes supplied by the translator to the terminal. This is correct only if the translator and the user use the same locale. This assumption was reasonable until 2003, but it fails now and leads to just a sequence of "invalid character" squares if, e.g., the translator uses ru_RU.KOI8-R and the user uses ru_RU.UTF-8. A sequence of empty squares is not a helpful error message.

Man should either convert messages between the translator's and the user's character sets on the fly (this is best done by switching from the obsolete catgets family of translation functions to gettext), or not attempt to translate error messages from English at all. Oh, we can always implement the second solution by passing "+lang none" to Man's ./configure line.
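To illustrate the first option: once the translator's charset is known, the conversion itself is a single iconv call. Here is a minimal sketch, assuming that charset is recorded somewhere alongside the message catalog. The name convert_msg is made up for this mail and is not anything in Man.

```sh
# Hypothetical helper, not part of Man.
# Reads a translated message on stdin and re-encodes it for the user's terminal.
#   $1 = charset the translator used (assumed to be known, e.g. KOI8-R)
#   $2 = charset of the user's locale (defaults to what `locale charmap` reports)
convert_msg() {
    iconv -f "$1" -t "${2:-$(locale charmap)}"
}

# Example: the KOI8-R bytes CE C5 D4 spell the Russian word "net" ("no").
# Without the conversion, a UTF-8 terminal would show three invalid characters.
printf '\316\305\324\n' | convert_msg KOI8-R UTF-8
```

With gettext this step is not needed in the program at all, because gettext reads the charset declared in the catalog header and converts to the locale's charset itself.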

2) The language special-casing logic (strncmp(lang, "ja", 2) == 0) doesn't match today's reality and should be either implemented properly (as in man-db) or omitted completely and left to distro patches. With today's groff and Debian's policy on storing manual pages on the filesystem, the reality is as follows:

A) CVS version of Groff: see the bottom of this mail.
B) Released versions of Groff-1.19.x: In 8-bit locales, use -Tlatin1. In UTF-8 locales which have a corresponding non-UTF-8 locale, use the -Tlatin1 device and convert the output from the encoding in which the manual page is stored on disk to UTF-8. In essentially-multibyte languages (i.e., Chinese, Japanese, and Korean) there is no way to format manual pages correctly with this version of groff.
C) Debian-patched Groff-1.18.1.1: same as Groff-1.19.x, but has special rules for CJK: the manual page should be converted from the encoding in which it is stored on disk to the locale charset, and then fed to groff with the -Tnippon or -Tutf8 parameter depending on the locale character set.
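Rule B can be sketched as a small decision function. This is only a dry run that prints the pipeline it would use. The function name and the list of CJK encodings are my assumptions for illustration, not something taken from Man or man-db.

```sh
# Dry-run sketch of rule B (released Groff-1.19.x); nothing is executed.
#   $1 = encoding in which the manual page is stored on disk
#   $2 = charset of the user's locale
groff119_pipeline() {
    page_enc=$1
    user_enc=$2
    case $page_enc in
        EUC-JP|EUC-KR|GB2312|BIG5)
            # Assumed examples of CJK encodings: rule B says such pages
            # cannot be formatted correctly with this version of groff.
            echo "unsupported"
            return 1
            ;;
    esac
    if [ "$page_enc" = "$user_enc" ]; then
        # 8-bit locale matching the page: -Tlatin1 output is usable as-is.
        echo "groff -Tlatin1 -mandoc"
    else
        # Otherwise the -Tlatin1 output is still in the page's encoding
        # and has to be converted to the user's charset afterwards.
        echo "groff -Tlatin1 -mandoc | iconv -f $page_enc -t $user_enc"
    fi
}

groff119_pipeline KOI8-R UTF-8
```

The point of writing it this way is that the only language-dependent inputs are the two charsets; there is no need for anything like a check on the first two letters of the locale name.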

3) Feature request: it would be very nice if LFS had a way to tell Man to ignore /usr/share/man/ja/* even in Japanese locales, because the system's Groff-1.19.{2,3cvs} can't format those manuals. This also applies to other languages, and maybe it is better to implement this as a whitelist, not a blacklist. This whitelist should be different for printing and display purposes.
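The whitelist idea could look roughly like this. It is a sketch only: the variable names, the function names, and the language codes in the lists are placeholders I made up, not a tested set and not an existing Man option.

```sh
# Hypothetical whitelists; the language codes are placeholders, not a tested set.
MAN_DISPLAY_LANGS="de fr ru"
MAN_PRINT_LANGS="de fr"

# lang_allowed PURPOSE LANG
#   PURPOSE is "display" or "print"; succeeds only if LANG is on that list.
lang_allowed() {
    case $1 in
        display) list=$MAN_DISPLAY_LANGS ;;
        print)   list=$MAN_PRINT_LANGS ;;
        *)       return 2 ;;
    esac
    case " $list " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Pick the tree to search for on-screen display: the translated one
# if the language is whitelisted, otherwise the English one.
man_dir_for() {
    if lang_allowed display "$1"; then
        echo "/usr/share/man/$1"
    else
        echo "/usr/share/man"
    fi
}

man_dir_for ja
```

With these placeholder lists a Japanese user gets the English pages instead of unreadable output, and a language can be allowed for display without being allowed for printing.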

Here is a response from the Groff maintainer on UTF-8

Werner LEMBERG wrote:

Is the current CVS of groff UTF-8 friendly?


Yes.  It doesn't have the final form (the preconv preprocessor will
get folded into soelim) which means that files included with .so
aren't handled yet automatically, but the interface won't change, that
is, options `-k' and `-K <enc>' will stay to convert the input file
encoding to something groff can understand.

Note that you still need fonts which actually have those Unicode
characters.

I have checked out today's Groff from Savannah CVS. The test results are below.

1) It depends upon netpbm programs. This dependency can be circumvented for LFS purposes by issuing "make -k" for compilation.

2) The relocation stuff segfaults, so I had to disable it by editing src/libs/libgroff/Makefile.sub.

3) After that, groff can correctly format the Russian manual page for the /etc/passwd file in both ru_RU.KOI8-R and ru_RU.UTF-8 with the following command:

groff -K KOI8-R -Tutf8 -mandoc /usr/share/man/ru/man5/passwd.5 | iconv -f UTF-8 -t //TRANSLIT

This, of course, scales to more (but not all) languages by changing KOI8-R to the character set in which the manual pages for that language are stored. Also, if one stores manual pages on disk in RedHat fashion, this works if KOI8-R is changed to UTF-8. So, the new architecture is good and general. Thanks to the authors.
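To show what "changing KOI8-R" means in practice, here is a sketch of a per-language lookup wrapped around the command from item 3. Only the ru entry comes from my test; the other entries are assumptions based on the traditional 8-bit charsets and would have to be checked against how a given distro actually stores its pages. Both function names are made up.

```sh
# Hypothetical lookup: language code -> encoding of the pages on disk.
# Only the ru entry was tested; the rest are assumptions.
page_encoding() {
    case $1 in
        ru)       echo KOI8-R ;;
        pl|cs|hu) echo ISO-8859-2 ;;
        *)        echo ISO-8859-1 ;;
    esac
}

# format_page LANG FILE
#   Runs the command from item 3 with the looked-up encoding passed to -K.
format_page() {
    groff -K "$(page_encoding "$1")" -Tutf8 -mandoc "$2" | iconv -f UTF-8 -t //TRANSLIT
}

# Usage (needs CVS groff with -K support):
#   format_page ru /usr/share/man/ru/man5/passwd.5
page_encoding ru
```

For the RedHat-style layout the lookup would simply return UTF-8 for every language.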

The old pre-1.19.3cvs method (used by man-db) still works:

groff -Tlatin1 -mandoc /usr/share/man/ru/man5/passwd.5 | iconv -f KOI8-R

4) The "-k" option and encoding autoguessing are a bad idea because not every manual page is tagged properly (e.g., the passwd(5) manual page is not tagged). Everyone will end up using -K with the explicit encoding specified (and, in fact, that's Man's responsibility, not the user's).

5) New Groff is still not able to format Japanese manuals. Is there any timeline for this?

So, since the set of languages supported by new groff is a strict subset of those supported by Debian-patched groff-1.18.1.1 and Man-DB, there is no direct merit in upgrading now or in March. This does not, however, make testing of new versions of Man and Groff irrelevant.

Both also expressed an interest in having us assist in testing.

Thanks to both of them. With their help, Man-DB will certainly not be needed in the future.

Jim: If you really want to drop Man-DB right now in favour of Man, I will (on your request) make a proof-of-concept patch for the current LFS book that installs Man and a safe subset of manual pages without confusing instructions. But that would be a huge functionality drop (but no "unreadable manual page" bugs similar to those found in RedHat 8), so I really don't want this to be applied.

--
Alexander E. Patrakov