Jim Gifford wrote:
(many thanks to Jim for contacting the authors)
Here is a response from the Man maintainer on UTF-8
Federico Lucifredi wrote:
Hello Jim,
Man will not be UTF-8 problematic soon - the timeframe is March. So,
hold on to your horses, we are working on it and it is targeted to be in
release 1.6d.
Hope that will solve your problem -- entirely =)
Yes, this is in the TODO list in Man-1.6b. Please make sure, however,
that the maintainer fully understands the problem. And, unfortunately, I
can't verify the progress because I don't have read access to the
development sources of Man.
Currently, Man has the following problems:
1) It doesn't output error messages (such as translations of "no such
manual page") properly in UTF-8 locales. Currently, it just copies the
sequence of bytes supplied by the translator to the terminal. This is
correct only if the translator and the user use the same locale. This
assumption was reasonable until 2003, but it fails now, and leads
to just a sequence of "invalid character" squares if, e.g., the
translator uses ru_RU.KOI8-R and the user uses ru_RU.UTF-8. A sequence
of empty squares is not a helpful error message.
Man should either convert messages between the translator's and the
user's character sets on the fly (this is best done by switching from
the obsolete catgets family of translation functions to gettext), or not
attempt to translate error messages from English at all. We can always
implement the second solution by passing "+lang none" to Man's
./configure line.
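The on-the-fly conversion in the first option can be illustrated from
the shell. This is only a sketch: convert_message is a hypothetical
helper, and it assumes an iconv(1) that knows both KOI8-R and UTF-8. It
shows what a KOI8-R catalog entry needs before it reaches a UTF-8
terminal; it is not how Man or gettext do the conversion internally.

```shell
# Hypothetical helper: re-encode a message read from stdin.
# $1 = charset used by the translator, $2 = charset of the user's locale
convert_message() {
    iconv -f "$1" -t "$2"
}

# The Russian word "нет" as a KOI8-R translator would store it:
# bytes 0xCE 0xC5 0xD4, written here as octal escapes.
# Sent unconverted to a UTF-8 terminal, these bytes are invalid.
printf '\316\305\324\n' | convert_message KOI8-R UTF-8
```

With gettext, the equivalent conversion is expected to happen inside the
library, so Man itself would not need to call iconv for its messages.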
2) The language special-casing logic (strncmp(lang, "ja", 2) == 0)
doesn't match today's reality and should be either implemented properly
(as in man-db) or omitted completely and left to distro patches. With
today's groff and Debian's policy on storing manual pages on the
filesystem, the reality is as follows:
A) CVS version of Groff: see the bottom of this mail.
B) Released versions of Groff-1.19.x: In 8-bit locales, use -Tlatin1. In
UTF-8 locales which have a corresponding non-UTF-8 locale, use the
-Tlatin1 device and convert the output from the encoding in which the
manual page is stored on disk to UTF-8. In essentially-multibyte
languages (i.e., Chinese, Japanese, and Korean) there is no way to
format manual pages correctly with this version of groff.
C) Debian-patched Groff-1.18.1.1: same as Groff-1.19.x, but has special
rules for CJK: the manual page should be converted from the encoding in
which it is stored on disk to the locale charset, and then fed to groff
with the -Tnippon or -Tutf8 parameter depending on the locale character set.
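Cases B and C can be summarized as a small decision function. This is a
rough sketch, not Man's or man-db's actual logic: man_pipeline and its
arguments are hypothetical, it only prints the pipeline instead of
running it, and the choice of -Tutf8 for UTF-8 locales and -Tnippon
otherwise is my assumption about what "depending on the locale
character set" means.

```shell
# Print (do not run) the formatting pipeline for one manual page.
# $1 = groff flavour: "released" (1.19.x) or "debian" (patched 1.18.1.1)
# $2 = encoding of the page on disk   $3 = locale character set
# $4 = path to the page               $5 = "cjk" for CJK pages (optional)
man_pipeline() {
    flavour=$1 page_enc=$2 locale_cs=$3 page=$4 kind=${5:-plain}
    if [ "$kind" = cjk ]; then
        if [ "$flavour" != debian ]; then
            # Case B: released groff cannot format CJK pages at all.
            echo "cannot format CJK pages with this groff" >&2
            return 1
        fi
        # Case C: convert first, then pick the device (assumed mapping).
        if [ "$locale_cs" = UTF-8 ]; then dev=utf8; else dev=nippon; fi
        echo "iconv -f $page_enc -t $locale_cs $page | groff -T$dev -mandoc"
    elif [ "$page_enc" = "$locale_cs" ]; then
        # 8-bit locale matching the on-disk encoding: no conversion.
        echo "groff -Tlatin1 -mandoc $page"
    else
        # Locale charset differs from the on-disk one: convert the output.
        echo "groff -Tlatin1 -mandoc $page | iconv -f $page_enc -t $locale_cs"
    fi
}

man_pipeline released KOI8-R UTF-8 /usr/share/man/ru/man5/passwd.5
```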
3) Feature request: it would be very nice if LFS had a way to tell
Man to ignore /usr/share/man/ja/* even in Japanese locales, because the
system's Groff-1.19.{2,3cvs} can't format those manuals. This also
applies to other languages, and maybe it is better to implement this as
a whitelist, not a blacklist. This whitelist should be different for
printing and display purposes.
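To make the whitelist idea concrete, here is a minimal sketch. The
variable names, the function names and the language lists are all
hypothetical and purely illustrative; they do not say which languages
a given groff can really format. Man has no such option today.

```shell
# Hypothetical per-purpose whitelists (illustrative contents only).
DISPLAY_LANGS="de es fr it pl ru"
PRINT_LANGS="de es fr it"

# Succeeds if language $2 is whitelisted for purpose $1 (display|print).
lang_allowed() {
    case $1 in
        display) list=$DISPLAY_LANGS ;;
        print)   list=$PRINT_LANGS ;;
        *)       return 2 ;;
    esac
    # Pad with spaces so that only whole language codes match.
    case " $list " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Print the directory to search: the localized tree if the language is
# whitelisted, otherwise the English pages in /usr/share/man.
man_dir_for() {
    if lang_allowed "$1" "$2"; then
        echo "/usr/share/man/$2"
    else
        echo "/usr/share/man"
    fi
}

man_dir_for display ja
```

A whitelist fails safe: a language nobody has tested falls back to the
English pages instead of producing unreadable output.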
Here is a response from the Groff maintainer on UTF-8
Werner LEMBERG wrote:
Is the current CVS of groff UTF-8 friendly?
Yes. It doesn't have the final form (the preconv preprocessor will
get folded into soelim) which means that files included with .so
aren't handled yet automatically, but the interface won't change, that
is, options `-k' and `-K <enc>' will stay to convert the input file
encoding to something groff can understand.
Note that you still need fonts which actually have those Unicode
characters.
I have checked out today's Groff from Savannah CVS. The test results are
below.
1) It depends upon netpbm programs. This dependency can be circumvented
for LFS purposes by issuing "make -k" for compilation.
2) The relocation stuff segfaults, so I had to disable it by editing
src/libs/libgroff/Makefile.sub.
3) After that, groff can correctly format the Russian manual page for
the /etc/passwd file in both ru_RU.KOI8-R and ru_RU.UTF-8 with the
following command:
groff -K KOI8-R -Tutf8 -mandoc /usr/share/man/ru/man5/passwd.5 | iconv -f UTF-8 -t //TRANSLIT
This, of course, scales to more (but not all) languages by changing
KOI8-R to the character set in which the manual pages for that language
are stored. Also, if one stores manual pages on disk in RedHat fashion,
this works if KOI8-R is changed to UTF-8. So, the new architecture is
good and general. Thanks to the authors.
The old pre-1.19.3cvs method (used by man-db) still works:
groff -Tlatin1 -mandoc /usr/share/man/ru/man5/passwd.5 | iconv -f KOI8-R
4) The "-k" option with its encoding autoguessing is a bad idea because
not every manual page is tagged properly (e.g., the passwd(5) manual
page is not tagged). Everyone will end up using -K with the explicit
encoding specified (and, in fact, that's Man's responsibility, not the
user's).
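What "Man's responsibility" could look like, as a hedged sketch: a
lookup from the language directory to the on-disk encoding, used to
build the -K pipeline from point 3. Both function names are
hypothetical. Only the ru -> KOI8-R entry comes from the test above;
the ISO-8859-1 fallback is an assumption, and a real table would need
one entry per language.

```shell
# Hypothetical lookup: language code -> encoding of the pages on disk.
page_encoding() {
    case $1 in
        ru) echo KOI8-R ;;       # as used in the passwd.5 test above
        *)  echo ISO-8859-1 ;;   # assumed default, not verified
    esac
}

# Print (do not run) the pipeline with an explicit -K encoding.
# $1 = language code, $2 = path to the manual page
format_cmd() {
    lang=$1 page=$2
    echo "groff -K $(page_encoding "$lang") -Tutf8 -mandoc $page | iconv -f UTF-8 -t //TRANSLIT"
}

format_cmd ru /usr/share/man/ru/man5/passwd.5
```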
5) New Groff is still not able to format Japanese manuals. Is there any
timeline for this?
So, since the set of languages supported by new groff is a strict subset
of those supported by Debian-patched groff-1.18.1.1 and Man-DB, there is
no direct merit in upgrading now or in March. This does not, however,
make testing of new versions of Man and Groff irrelevant.
Both also expressed an interest for us to assist in testing.
Thanks to both of them. With their help, Man-DB will certainly not be
needed in the future.
Jim: If you really want to drop Man-DB right now in favour of Man, I
will (on your request) make a proof-of-concept patch for the current LFS
book that installs Man and a safe subset of manual pages without
confusing instructions. That would be a huge drop in functionality
(although without "unreadable manual page" bugs similar to those found
in RedHat 8), so I really don't want this to be applied.
--
Alexander E. Patrakov