On Thu, 2003-01-02 at 13:57, Colin Walters wrote:
> #99933 goes a lot farther than #174982.

I have attached a counter-proposal to #99933. I believe it fixes the problems I raised with your proposal, and it should also cover some new areas (like filenames). I hope it also fixes James' issue with the RFC link. This patch supersedes the one in #174982. It is more ambitious than #174982, but still does not introduce any "must"s, only "should"s or weaker. Opinions?

--- policy.sgml	2003-01-01 21:59:26.000000000 -0500
+++ policy.sgml.new	2003-01-02 17:14:56.000000000 -0500
@@ -2258,10 +2258,8 @@
 	  </p>
 
 	  <p>
-	    The entire changelog must be encoded in the
-	    <url id="http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2279.html" name="UTF-8">
-	    encoding of
-	    <url id="http://www.unicode.org/" name="Unicode">.
+	    The entire changelog should be encoded UTF-8; see <ref
+	    id="unicode"> for more information.
 	  </p>
 
 	<sect1><heading>Defining alternative changelog formats</heading>
@@ -4190,6 +4188,31 @@
       <sect>
 	<heading>Filesystem hierarchy</heading>
 
+	<sect1>
+	  <heading>File Names</heading>
+
+	  <p>
+	    Files included in Debian packages or created by maintainer
+	    scripts must have names which are valid UTF-8.  Since
+	    UTF-8 is fully backwards compatible with ASCII, few
+	    packages will encounter trouble with this.
+	  </p>
+
+	  <p>
+	    Programs should expect filenames in general (whether from
+	    a Debian package or created by the user) to be encoded
+	    with UTF-8, although it is recommended for programs to try
+	    gracefully falling back to the current locale's encoding
+	    if this fails.  Programs included in Debian packages
+	    should, when creating new files, encode their names in
+	    UTF-8 by default.
+	  </p>
+
+	  <p>
+	    See <ref id="unicode"> for more information on Debian and
+	    Unicode.
+	  </p>
+	</sect1>
 	<sect1>
 	  <heading>Filesystem Structure</heading>
 
@@ -5414,6 +5437,32 @@
 	</p>
       </sect>
 
+      <sect id="unicode">
+	<heading>Unicode</heading>
+
+	<p>
+	  Debian is moving towards
+	  <url id="http://www.unicode.org/" name="Unicode">,
+	  and specifically the <url id="http://www.ietf.org/rfc/rfc2279.txt" name="UTF-8">
+	  encoding of Unicode, for representation of character data.
+	  Unicode is a universal character set, able to encode all the
+	  world's languages.  Using Unicode makes internationalization
+	  much easier, since programs will have to deal with only one
+	  character set, instead of many different incompatible
+	  national variants.
+	</p>
+
+	<p>
+	  The UTF-8 encoding of Unicode is designed for Unix-like
+	  systems such as Debian.  It is fully backwards compatible
+	  with US-ASCII, and is also safe for use in filenames, since
+	  no ASCII character appears as part of a multibyte character.
+	  It is highly recommended, although not yet required, for
+	  programs included in Debian to support Unicode and
+	  specifically UTF-8.
+	</p>
+      </sect>
+
       <sect>
 	<heading>Environment variables</heading>
 
@@ -7647,6 +7696,42 @@
 	  </p>
 
 	  <p>
+	    All documentation included in a package should be encoded in
+	    UTF-8 (see <ref id="unicode"> for more information).  If
+	    upstream documentation is in another character set, the data
+	    should be converted during the package build process.
+	    <footnote>
+	      <p>
+		One good way to do this is to use <prgn>iconv</prgn>, like:
+<example>
+  for file in ChangeLog doc/README doc/INSTALL; do
+    iconv -f ISO-8859-1 -t UTF-8 $file > $file.new && mv $file.new $file
+  done
+</example>
+	      </p>
+	    </footnote>
+	  </p>
+
+	  <p>
+	    Documentation formats which include a standard means of
+	    specifying the character set of the data (such as
+	    XML's <tt>encoding</tt> tag), may at their option use
+	    another character set, although UTF-8 is still preferred.
+	    Additionally, it is recommended for document formats which
+	    are capable of specifying the character set of their data,
+	    and do not have a default (like HTML), to do so.
+	    <footnote>
+	      <p>
+		As an example, for HTML documents, the <tt>head</tt>
+		section should include a header like:
+<example>
+  <META content='text/html; charset=UTF-8' http-equiv='Content-Type'/>
+</example>
+	      </p>
+	    </footnote>
+	  </p>
+
+	  <p>
 	    Other formats such as PostScript may be provided at the
 	    package maintainer's discretion.
 	  </p>
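
To make the "File Names" wording above more concrete, here is a minimal sketch of the behaviour it recommends for programs: treat a filename as UTF-8 first, and fall back to the current locale's encoding if that fails. This is only an illustration and not part of the patch. The names decode_filename and encode_filename and the fallback parameter are hypothetical, and replacing undecodable bytes rather than raising is an assumption about what "gracefully" might mean.

```python
import locale
from typing import Optional


def decode_filename(raw: bytes, fallback: Optional[str] = None) -> str:
    """Decode raw filename bytes, preferring UTF-8.

    Hypothetical helper: 'fallback' names the encoding to try when the
    bytes are not valid UTF-8. If it is omitted, the current locale's
    preferred encoding is used.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    encoding = fallback or locale.getpreferredencoding(False)
    # Assumption: never raise on a legacy name; substitute U+FFFD instead.
    return raw.decode(encoding, errors="replace")


def encode_filename(name: str) -> bytes:
    """Encode the name of a newly created file in UTF-8 by default."""
    return name.encode("utf-8")
```

For example, decode_filename(b"caf\xe9.txt", fallback="iso-8859-1") takes the fallback path, because a lone 0xE9 byte is not valid UTF-8, while a name that is already valid UTF-8 is returned unchanged.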