On Sat, 2008-07-05 at 18:40:38 -0700, Russ Allbery wrote:
> Russ Allbery <[EMAIL PROTECTED]> writes:
>
> > The reason why I dropped the RFC reference is that there are multiple
> > references to UTF-8 all over Policy these days and I don't really want
> > to footnote all of them. I'm not sure the best way to handle this.
> > Maybe we need some sort of introductory mention of UTF-8 somewhere?
>
> Here's a revised patch that adds a definition section, currently only
> defining ASCII and UTF-8. I haven't included the iconv trick; I think
> that the Developers Reference or Lintian are better places for that. But
> I don't feel strongly that way and am willing to change my mind of people
> disagree.
>
> diff --git a/policy.sgml b/policy.sgml
> index 24c9072..219664d 100644
> --- a/policy.sgml
> +++ b/policy.sgml
> @@ -273,6 +273,32 @@
>          </p>
>        </sect>
>
> +      <sect id="definitions">
> +        <heading>Definitions</heading>
> +
> +        <p>
> +          The following terms are used in this Policy Manual:
> +          <taglist>
> +            <tag>ASCII</tag>
> +            <item>
> +              The character encoding specified by ANSI X3.4-1986 and its
> +              predecessor standards, referred to in MIME as US-ASCII, and
> +              corresponding to an encoding in eight bits per character of
> +              the first 128 <url id="http://www.unicode.org/"
> +              name="Unicode"> characters, with the eighth bit always zero.
> +            </item>
> +            <tag>UTF-8</tag>
> +            <item>
> +              The transformation format (sometimes called encoding) of
> +              <url id="http://www.unicode.org/" name="Unicode"> defined by
> +              <url id="http://www.rfc-editor.org/rfc/rfc3629.txt"
> +              name="RFC 3629">. UTF-8 has the useful property of having
> +              ASCII as a subset, so any text encoded in ASCII is trivially
> +              also valid UTF-8.
> +            </item>
> +          </taglist>
> +        </p>
> +      </sect>
>      </chapt>
>
>
> @@ -1473,10 +1499,6 @@
>          </p>
>
>          <p>
> -
> -        </p>
> -
> -        <p>
>            The format of the <file>debian/changelog</file> allows the
>            package building tools to discover which version of the package
>            is being built and find out other release-specific information.
> @@ -1582,6 +1604,10 @@
>          </p>
>
>          <p>
> +          The entire changelog must be encoded in UTF-8.
> +        </p>
> +
> +        <p>
>            For more information on placement of the changelog files
>            within binary packages, please see <ref id="changelogs">.
>          </p>
> @@ -9822,36 +9848,6 @@ install-info --quiet --remove /usr/share/info/foobar.info
>            See <ref id="dpkgchangelog">.
>          </p>
>
> -        <p>
> -          It is recommended that the entire changelog be encoded in the
> -          <url id="http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2279.html" name="UTF-8">
> -          encoding of
> -          <url id="http://www.unicode.org/"
> -               name="Unicode">.<footnote>
> -            <p>
> -              I think it is fairly obvious that we need to
> -              eventually transition to UTF-8 for our package
> -              infrastructure; it is really the only sane char-set in
> -              an international environment. Now, we can't switch to
> -              using UTF-8 for package control fields and the like
> -              until dpkg has better support, but one thing we can
> -              start doing today is requesting that Debian changelogs
> -              are UTF-8 encoded. At some point in time, we can start
> -              requiring them to do so.
> -            </p>
> -            <p>
> -              Checking for non-UTF8 characters in a changelog is
> -              trivial. Dump the file through
> -              <example>iconv -f utf-8 -t ucs-4</example>
> -              discard the output, and check the return
> -              value. If there are any characters in the stream
> -              which are invalid UTF-8 sequences, iconv will exit
> -              with an error code; and this will be the case for the
> -              vast majority of other character sets.
> -            </p>
> -          </footnote>
> -        </p>
> -
>          <sect2><heading>Defining alternative changelog formats
>          </heading>
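
On the iconv check from the removed footnote: the idea is just "try to decode the byte stream as strict UTF-8 and look at whether that fails". A rough Python sketch of the same idea follows, in case it is useful wherever the trick ends up being documented. This is not the iconv command itself, and the helper name is_valid_utf8 is made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8.

    Hypothetical helper, analogous in spirit to running the bytes
    through `iconv -f utf-8 -t ucs-4` and checking the exit status.
    """
    try:
        # Strict decoding raises on any invalid UTF-8 byte sequence.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII is a subset of UTF-8, so it always passes.
ascii_ok = is_valid_utf8(b"  * Initial release.")

# The same word encoded as UTF-8 passes, encoded as Latin-1 it does not.
utf8_ok = is_valid_utf8("née".encode("utf-8"))
latin1_ok = is_valid_utf8("née".encode("latin-1"))
```

To check a changelog one would read it in binary mode, for example open("debian/changelog", "rb").read(), and pass the bytes to the helper. As with iconv, a file in a legacy 8-bit charset will usually fail as soon as it contains a non-ASCII byte, though a pure-ASCII file passes regardless of which charset the author had in mind.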

Seconded.

regards,
guillem