Given the recent discussion about UTF-8 support in Debian, I would like to put forward the following proposal. Any comments, suggestions, and grammar corrections are welcome.
* Addition to section 3, Control files and their fields:

3.3 Default charset of control files

If, for whatever reason (such as the names of upstream authors or maintainers, package descriptions in a foreign language, and similar), you need to use characters outside the 7-bit ASCII range in control files, these characters must be encoded in UTF-8.

[Rationale: currently there is no default charset for these fields. As a result, everybody uses his own national encoding. A quick glance at /var/lib/dpkg/available shows some ISO-8859-1 encoded characters, some words in KOI8-R, and an unknown Japanese encoding.]

* Addition to 5.3 debian/changelog:

The character set of debian/changelog must be either pure ASCII or UTF-8.

[Explanation: it would be sufficient to mention UTF-8, since it is a superset of ASCII, but we want to stay logically consistent with the next addition, which allows the rest of the documentation to use an encoding other than UTF-8 (and just hope nobody will use EBCDIC :-))]

* Addition to 13.3 Additional documentation:

Documentation of Debian packages in text format, if written in a language requiring characters outside the 7-bit ASCII range, should use either the well-established encoding for that language (such as ISO-8859-2 for some Central and Eastern European languages, KOI8-R for Russian, etc.) or UTF-8. Maintainers are encouraged to use UTF-8, bearing in mind Debian's general migration toward a unified character encoding. Original upstream documentation in an encoding other than UTF-8 or the well-established encoding for its language should be converted either to UTF-8 or to the well-established encoding. The choice between UTF-8 and the other encoding is left to the maintainer's discretion; however, all documentation within a single package should use one consistent encoding.

* Addition to 13.5 Preferred documentation formats:

HTML documents in an encoding other than US-ASCII must have an appropriate META tag in their header declaring the encoding used.
[example: <META HTTP-Equiv="Content-Type" CONTENT="text/html; charset=iso-8859-2">]

* No good section to put this into, perhaps 13.9?

Names of maintainers, upstream authors and other data in package descriptions and related data files (such as debian/changelog, debian/copyright and debian/control), as well as in English-language documentation, should either be transliterated or transcribed to ASCII, or be written in UTF-8, at the maintainer's discretion. However, for names written in non-Latin scripts, an ASCII (or suitable Latin-script) version should be provided along with the original name.

-- 
Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/
garabik @ melkor.dnp.fmph.uniba.sk
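To illustrate how the proposed rules for control files, debian/changelog and HTML documents could be checked mechanically, here is a minimal Python sketch. The function names are hypothetical and not part of dpkg or any existing Debian tool. A successful UTF-8 decode is only a heuristic: short byte sequences in ISO-8859-x or KOI8-R can occasionally happen to be valid UTF-8, so this can flag bad files but cannot prove a file is correctly encoded.

```python
import re


def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for a raw byte string (heuristic)."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # e.g. ISO-8859-1 or KOI8-R bytes that do not form valid UTF-8
        return "other"


def control_file_complies(data: bytes) -> bool:
    # Proposed 3.3 and 5.3: control files and debian/changelog must be
    # pure ASCII or UTF-8.
    return classify_encoding(data) in ("ascii", "utf-8")


# Finds charset=... inside a META tag, case-insensitively, as in
# <META HTTP-Equiv="Content-Type" CONTENT="text/html; charset=iso-8859-2">
_META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def declared_html_charset(data: bytes):
    """Return the lowercased charset declared in a META tag, or None."""
    match = _META_CHARSET.search(data)
    return match.group(1).decode("ascii").lower() if match else None


def html_complies(data: bytes) -> bool:
    # Proposed 13.5: HTML that is not pure US-ASCII must declare its charset.
    return classify_encoding(data) == "ascii" or declared_html_charset(data) is not None


if __name__ == "__main__":
    name = "Garab\u00edk"
    print(classify_encoding(name.encode("utf-8")))    # utf-8
    print(classify_encoding(name.encode("latin-1")))  # other
```

A Latin-1 encoded name such as `Garab\xedk` is rejected because the byte 0xED starts a three-byte UTF-8 sequence but is followed by a plain ASCII letter.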