Here is the proposal with typos and mistakes fixed, and with an added paragraph about the possible use of other encodings. I left out the requirement to specify a font needed to view the documentation, since IMHO that is an unnecessary overcomplication.
--- policy.sgml-old Fri Jun 1 11:40:16 2001
+++ policy.sgml Thu Jun 7 13:31:09 2001
@@ -1653,6 +1653,15 @@
 </sect>
+
+ <sect id="controlencoding"><heading>Encoding of control files</heading>
+ <p>
+ If, for whatever reason (such as upstream author's or maintainer's
+ names, foreign language package description and similar), you need to
+ use characters outside 7-bit ASCII range in control files, these
+ characters should be encoded using UTF-8 encoding.
+ </p>
+ </sect>
 </chapt>
 <chapt id="versions"><heading>Version numbering</heading>
@@ -2276,8 +2285,16 @@
 all.
 </p>
 </sect1>
+
+ <sect1><heading>Character set of <tt>debian/changelog</tt></heading>
+
+ <p>
+ Character set of <tt>debian/changelog</tt> should be either pure ASCII, or UTF-8.
+ </p>
+ </sect1>
 </sect>
+
 <sect id="srcsubstvars"><heading><tt>debian/substvars</tt> and variable substitutions </heading>
@@ -7370,6 +7387,26 @@
 from <tt>/usr/share/doc/<var>package</var>/</tt>.
 </p>
+ <p>
+ Documentation of Debian packages in text format, if written in
+ language requiring characters outside of 7-bit ASCII range,
+ should use either well-established encoding for the given
+ language <footnote>such as ISO-8859-2 for some Central and Eastern
+ European languages, KOI8-R for Russian, etc.</footnote>, or UTF-8
+ encoding.
+ Maintainers are encouraged to use UTF-8, having in mind
+ the general Debian migration toward unified character encoding.
+ </p>
+
+ <p>
+ Original upstream documentation, if in encoding other than UTF-8
+ or the well-established encoding for the particular language,
+ should be converted either to UTF-8 or to the well-established
+ encoding. Choice between UTF-8 and other encoding is left to the
+ maintainer's discretion, however, in a single package, all the
+ documents written in a particular language should share the same encoding.
+ </p>
+
+ <p>
+ A package may (at the discretion of the maintainer) include documentation
+ files in other encodings, if they are present also in canonical encoding,
+ and if the encodings used are clearly marked.
+ </p>
+
 </sect>
 <sect id="usrdoc">
@@ -7440,6 +7477,18 @@
 Other formats such as PostScript may be provided at the package maintainer's discretion.
 </p>
+
+ <p>
+ HTML documents, if in encoding other than <tt>us-ascii</tt>, should
+ have in their header an appropriate META tag describing
+ the used encoding.
+
+ Example:
+ <example>
+ <META HTTP-Equiv="Content-Type" CONTENT="text/html; charset=UTF-8">
+ </example>
+ </p>
+
 </sect>
 <sect id="copyrightfile">
@@ -7555,6 +7604,24 @@
 changelog, then the Debian changelog should still be called <tt>changelog.Debian.gz</tt>.</p>
 </sect>
+
+ <sect id="charset">
+ <heading>Default character set</heading>
+
+ <p>
+ Names of maintainers, upstream authors and other data in
+ packages' descriptions and related Debian data files (such as
+ <tt>debian/changelog</tt>, <tt>debian/copyright</tt>,
+ <tt>debian/control</tt>), as well as in English language
+ documentation, should be either transliterated or
+ transcribed to ASCII, or used in UTF-8 encoding at the
+ discretion of the maintainer. However, for names
+ in scripts based on non-Latin alphabets, ASCII (or suitable
+ Latin-script) version should be provided along with original
+ name.
+ </p>
+ </sect>
+
 </chapt>
 <appendix id="pkg-scope">
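
For illustration only (this is not part of the patch): a minimal Python 3 sketch of how a maintainer might check a file against the "pure ASCII or UTF-8" rule proposed above, and convert a document from a known legacy encoding such as ISO-8859-2 or KOI8-R. The function names `classify_encoding` and `to_utf8` are hypothetical, and the sketch assumes the source encoding of a legacy file is already known rather than guessed.

```python
def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for the given raw bytes.

    Hypothetical helper: 'other' only means the bytes are not valid
    UTF-8; it does not identify which legacy encoding was used.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known legacy encoding to UTF-8.

    source_encoding is an assumption supplied by the caller,
    for example 'iso-8859-2' or 'koi8-r'.
    """
    return data.decode(source_encoding).encode("utf-8")
```

A file such as `debian/changelog` could be read in binary mode and passed to `classify_encoding`; a result of `'other'` would indicate that the file needs conversion, for instance with `to_utf8(data, "iso-8859-2")`.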