I have uploaded man-db 2.5.0-1, which includes the following changes of note:

  o Per-locale directory handling has been improved. Directories such as "fr.UTF-8" may be used for occasions when it is appropriate to specify the character set but not the country, and so a full locale name is inconvenient.

  o There is a new "manconv" program which can try multiple possible encodings for a file, thus allowing UTF-8 manual pages to be installed in any directory even without an explicit encoding declaration.

I would like to recommend that package maintainers, particularly maintainers of manpages-* packages, begin to install manual pages encoded in UTF-8, so that we can shake down the details before encapsulating this in the policy manual.

My original plan was that UTF-8 manual pages should be installed in /usr/share/man/<language>.UTF-8/ (unless your language is Chinese or Portuguese, just use the language code, not the country code, so for example French manual pages would go in /usr/share/man/fr.UTF-8/).

I had a long discussion with Adam Borowski on debian-mentors recently in which he persuaded me that it was both possible and worth it to implement compatibility with the scheme used by e.g. Red Hat, in which manual pages installed in unadorned directories such as /usr/share/man/fr/ are assumed to be in UTF-8. To avoid the obvious transitional nightmare, the "manconv" program mentioned above guesses the file encoding on the fly, so both UTF-8 and legacy encodings are permitted. For reasons that will be obvious to those familiar with the details of character encodings, it is usually only possible to guess between UTF-8 and a single other encoding this way, but that's good enough for us.

This means that we now have a choice of putting UTF-8 manual pages in /usr/share/man/<language>/ or /usr/share/man/<language>.UTF-8/.
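
For anyone wondering why guessing only works between UTF-8 and one other encoding, here is a minimal Python sketch of the general idea. It is not manconv's actual code: the function name guess_decode and the candidate lists are made up for this example, and it assumes the legacy encoding is a single-byte one such as ISO-8859-1.

```python
def guess_decode(data, encodings=("utf-8", "iso-8859-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper for illustration only; not part of man-db.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            # Not valid in this encoding; try the next candidate.
            continue
    raise ValueError("none of the candidate encodings matched")


# Three sample "pages": French in UTF-8, French in ISO-8859-1,
# and Russian in KOI8-R.
samples = {
    "UTF-8 page": "café".encode("utf-8"),
    "ISO-8859-1 page": "café".encode("iso-8859-1"),
    "KOI8-R page": "привет".encode("koi8-r"),
}

for label, data in samples.items():
    text, encoding = guess_decode(data, ("utf-8", "iso-8859-1", "koi8-r"))
    print(label, "->", encoding)
```

UTF-8 is strict enough that legacy-encoded text almost never validates as UTF-8, so trying it first is safe. ISO-8859-1, on the other hand, assigns a character to every byte value and therefore never fails, so the KOI8-R sample is reported as ISO-8859-1 and any candidate listed after the first single-byte encoding is never reached.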

Although Adam made a valiant effort to persuade me otherwise, I still favour the .UTF-8 suffix; it's explicit, and it means that if your man program doesn't support UTF-8 (xman and yelp probably don't, for instance) then you will get the English manual page rather than a pile of misencoded garbage. Whether you think this is desirable probably depends on your language; a misencoded French page would be mostly readable anyway, while a misencoded Japanese page is entirely unusable.

However, I'm posting to debian-devel and debian-i18n about this to give people the opportunity to advocate the other position. At this point, neither choice will present major technical difficulties as far as man-db is concerned. I would like to ask that people consider the practicalities of other man implementations as well as pure aesthetic concerns.

groff does not yet support UTF-8 input, so at the moment this is implemented by recoding in man. For the time being, the implementation requires that the page be convertible to the legacy encoding for the language using iconv (it uses //TRANSLIT so that it will make an attempt at characters that aren't directly convertible, but that isn't perfect); so a German manual page should avoid using UTF-8 characters without an equivalent in ISO-8859-1. I do not expect this to be particularly onerous for the time being, though there are a few cases (particularly proper names) where it may be a problem. I ask for your patience in those cases. If you need to use a character not in the corresponding legacy encoding, then I recommend using named character escapes as documented in groff_char(7).

Once we have a consensus on install locations, dh_installman should IMO be changed to do the recoding automatically; to do this, it needs to be told the source encoding. Joey, what do you think is the best way to do this?
Options that come to mind are:

  * --language=<ll>.ISO-8859-1
  * --source-encoding=ISO-8859-1
  * manpage:ISO-8859-1 on the command line or in debian/package.manpages

It's worth noting that packages may well have manual pages in a number of languages with a variety of encodings, so I'm not sure how well a global --source-encoding option would work.

Of course the other option would be for dh_installman to DWIM and guess the encoding in the same way man does. :-) The transition to UTF-8 would happen much faster if maintainers didn't have to specify the encoding by hand. If you'd like to take this approach I can add code to man-db as necessary to help out.

Cheers,

--
Colin Watson [EMAIL PROTECTED]