Package: debian-policy
Severity: wishlist

[CCs: debian-i18n and debian-doc for obvious reasons, and the debhelper maintainer since there's a dh_installman change mentioned in the transition plan further down.]
Recently I have encountered some confusion as to the proper encoding of manual pages (which is entirely understandable given that this subsystem is lagging somewhat behind the rest of the world in terms of UTF-8 support). As the man-db maintainer, I would like to clarify this in policy.

Note that, while there are one or two instances of deviation which prompted this proposal, this documents current practice in that it is what has been implemented in man-db for some time and it is already followed by the vast majority of packages. I don't believe that I'm making large swathes of packages instantly buggy here; if they did not follow this policy, they would already be buggy in that pages would be displayed with visible encoding damage. Accordingly, I've tentatively used a "must" for the encoding rules. I'm prepared to back off to a "should" if consensus on the list is against me here.

I have used the language "not yet recommended" regarding installation of UTF-8 manual pages. My intent here was not so much to normatively state that this is a bug as to discourage it for the time being. As I noted in a footnote, I do expect this to be supported properly in man-db 2.5.0, which I've been working on for a while now (and in earnest for about the last week).

I thus propose the following amendment, generated against [EMAIL PROTECTED]/debian-policy--devel--3.7--base-0. I am seeking comments on and seconds for this proposal.

--- orig/policy.sgml
+++ mod/policy.sgml
@@ -8450,6 +8450,39 @@
       be present in the future.
     </footnote>
   </p>
+
+  <p>
+    Manual pages that are installed under
+    <file>/usr/share/man/</file><var>ll</var>, where <var>ll</var>
+    is an ISO-639 language code, must be encoded with the usual
+    legacy (non-UTF-8) character set for that language, as shown
+    by:
+    <example compact="compact">
+egrep -v '\.|@|UTF-8' /usr/share/i18n/SUPPORTED
+    </example>
+    <footnote>
+      This is necessary because many packages have historically
+      included manual pages encoded thus, and changing the
+      encoding of the whole hierarchy would involve a difficult
+      transitional period.
+    </footnote>
+    Manual pages that are installed under
+    <file>/usr/share/man/</file><var>locale</var>, where
+    <var>locale</var> is a full locale name listed in
+    <file>/usr/share/i18n/SUPPORTED</file>, must be encoded with
+    the character set implied by that locale.
+  </p>
+
+  <p>
+    At present, it is not generally possible to install a manual
+    page encoded in UTF-8 such that it will be used in all locales
+    for that language (for example, a page installed under
+    <file>/usr/share/man/fr_FR.UTF-8</file> will not be used in
+    the <tt>fr_BE.UTF-8</tt> locale). It is therefore not yet
+    recommended to install pages encoded in UTF-8, but rather to
+    continue using the legacy encoding.<footnote>This is expected
+    to change as of man-db 2.5.0.</footnote>
+  </p>
 </sect>
 
 <sect>

It will perhaps be helpful if I describe my transition plan for getting manual pages into UTF-8. Contrary to what occasionally seems to be popular belief, a newer version of groff is not necessary here (which is just as well as repeated attempts to merge in the CJK patch have been exceedingly painful, though I still hold out hope to get it done eventually). man-db is capable of shoving in iconv pipes as necessary.

 1. Status at time of writing: packages should use only /usr/share/man/<ll>/ (although some packages have anticipated an approximation of the transition plan; we ignore these for the moment as there is little point in changing them only to change them back later), and must use the legacy encoding for pages installed there.

 2. man-db 2.5.0-1 uploaded, including support for installing pages in /usr/share/man/<ll>.<codeset>/ (e.g. /usr/share/man/fr.UTF-8). The basename of this directory is not typically a well-formed locale, but it is appropriate because it allows a clear specification of the hierarchy's encoding while applying to all countries using that language.

 3. man-db 2.5.0-1 moves into testing.

 4. Packages encouraged (via debian-devel-announce) to begin using /usr/share/man/<ll>.UTF-8/; installation in other hierarchies will not be necessary as man-db will recode as needed. Packages using these hierarchies will be encouraged to declare Conflicts: man-db (<< 2.5.0-1) (or will Breaks: be allowed by that point? is either one just overkill?).

 5. Update dh_installman to recode manual pages to UTF-8 automatically and install them under /usr/share/man/<ll>.UTF-8/. Getting the Conflicts:/Breaks: in here might be difficult, plus I'm not sure I'm wild about creating several thousand more arcs in our dependency graph. Maybe it's better just to wait for a stable release before changing debhelper, and not worry too much about the Conflicts:/Breaks: as it's not like the whole system will break as a result.

 6. Policy updated once this has been shaken down and confirmed to work properly.

 7. Distant future: deprecate /usr/share/man/<ll>/. This will only be for consistency, so there's no need to rush.

This shouldn't be too difficult from where I am now, and at the moment I see no obstacles to landing UTF-8 manual page support for lenny.
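To make step 5 of the plan concrete, here is a rough Python sketch of the kind of recoding it describes: take a page installed under /usr/share/man/<ll>/ in its legacy encoding, convert it to UTF-8, and compute the matching path under /usr/share/man/<ll>.UTF-8/. This is not dh_installman's code. The names `LEGACY_CHARSET`, `utf8_target` and `recode_page` are hypothetical, and the two-entry charset table is an illustrative assumption rather than the full mapping from /usr/share/i18n/SUPPORTED.

```python
import posixpath

MAN_ROOT = "/usr/share/man"

# Assumption: a tiny illustrative subset. The real mapping is whatever
# /usr/share/i18n/SUPPORTED lists as the legacy charset for each language.
LEGACY_CHARSET = {"fr": "ISO-8859-1", "pl": "ISO-8859-2"}


def utf8_target(path):
    """Map /usr/share/man/<ll>/... to /usr/share/man/<ll>.UTF-8/..."""
    rel = posixpath.relpath(path, MAN_ROOT)
    lang, _, rest = rel.partition("/")
    if lang not in LEGACY_CHARSET or not rest:
        raise ValueError("not a known legacy-encoded localized page: %s" % path)
    return posixpath.join(MAN_ROOT, lang + ".UTF-8", rest)


def recode_page(path, data):
    """Return (new_path, utf8_bytes) for page bytes stored in the legacy charset."""
    new_path = utf8_target(path)  # also validates the language directory
    lang = posixpath.relpath(path, MAN_ROOT).split("/", 1)[0]
    text = data.decode(LEGACY_CHARSET[lang])
    return new_path, text.encode("utf-8")


if __name__ == "__main__":
    legacy_bytes = "r\u00e9sum\u00e9".encode("ISO-8859-1")
    print(recode_page("/usr/share/man/fr/man1/ls.1", legacy_bytes))
```

The sketch ignores compressed pages and pages already installed under a full locale directory. A real tool would need to handle both.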
Note that the implementation using iconv will mean that any characters used that are not recodable to the corresponding legacy encoding will be discarded; this is difficult to avoid without upgrading groff, but I don't anticipate it being a substantial problem. Likewise, we'll probably still be unable to handle Arabic and Indic scripts properly, and CJK will probably still be a massive hack; but it'll be an improvement.

Thanks,

-- 
Colin Watson [EMAIL PROTECTED]
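As an illustration of the discarding behaviour noted above, the following Python sketch converts UTF-8 page text to a legacy charset and silently drops whatever that charset cannot represent. It only approximates the effect. Python's `errors="ignore"` stands in for iconv here (roughly comparable to running iconv with -c), and the helper name `to_legacy` is hypothetical, not anything in man-db.

```python
def to_legacy(text, charset):
    """Encode text in a legacy charset, dropping characters it cannot represent."""
    return text.encode(charset, errors="ignore")


# "naive" with a diaeresis, a rightwards arrow, and "cafe" with an acute accent.
page = "na\u00efve \u2192 caf\u00e9"
shown = to_legacy(page, "ISO-8859-1").decode("ISO-8859-1")

if __name__ == "__main__":
    # The arrow (U+2192) has no ISO-8859-1 equivalent and disappears;
    # the accented Latin letters survive.
    print(repr(shown))
```

Characters within the legacy repertoire pass through unchanged, so for most Western European pages the loss would be limited to typographic extras.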