Hi,

Seconded.
Though I think that a charset must be given in HTML unless the document is ASCII or, if the DTD is actually given, latin1 or UTF-8. A quick search on w3.org reveals:

### http://www.w3.org/International/O-HTML-charset.html
#
# The base character set or document character set of HTML 4.0 and
# XML is ISO/IEC 10646 (aka. Unicode, the Universal Character
# Set). This does not mean that all HTML and XML documents have
# to be encoded in Unicode (and there would still be various
# encodings to choose from, such as e.g. UTF-8 and UTF-16). But it
# means that the logical model describing how HTML and XML are
# processed is described in terms of the UCS. The most important
# consequence is that numeric character references (&#dddd; and
# &#xhhhh;) are interpreted as Unicode.
#
# HTML 2.0 defined that all characters in an HTML document are to
# be interpreted relative to ISO 8859-1 (aka. ISO Latin 1), but
# also announced that all future versions of HTML will use a
# superset of that, viz. Unicode (or ISO 10646), which means that
# some 34000 of the world's characters are available (provided the
# software that reads/writes HTML is upgraded).

This whole matter is so confusing that IMO everything but pure ASCII must have a charset meta.

ciao,
2ri

Radovan Garabik wrote:
> --- policy.sgml-old Fri Jun 1 11:40:16 2001
> +++ policy.sgml Thu Jun 7 13:31:09 2001
> @@ -1653,6 +1653,15 @@
>
>
> </sect>
> +
> + <sect id="controlencoding"><heading>Encoding of control files</heading>
> + <p>
> + If, for whatever reason (such as upstream author's or maintainer's
> + names, foreign language package description and similar), you need to
> + use characters outside 7 bit ASCII range in control files, these
> + characters should be encoded using UTF-8 encoding.
> + </p>
> + </sect>
> </chapt>
>
> <chapt id="versions"><heading>Version numbering</heading>
> @@ -2276,8 +2285,16 @@
> all.
> </p>
> </sect1>
> +
> + <sect1><heading>Character set of <tt>debian/changelog</tt></heading>
> +
> + <p>
> + Character set of <tt>debian/changelog</tt> should be either pure ASCII, or UTF-8.
> + </p>
> + </sect1>
> </sect>
>
> +
> <sect id="srcsubstvars"><heading><tt>debian/substvars</tt>
> and variable substitutions </heading>
>
> @@ -7370,6 +7387,26 @@
> from <tt>/usr/share/doc/<var>package</var>/</tt>.
> </p>
>
> + <p>
> + Documentation of debian packages in text format, if written in
> + language requiring characters outside of 7-bit ASCII range,
> + should use either well-established encoding for the given
> + language <footnote>such as ISO-8859-2 for some central- and easter
> + europian languages, KOI8-R for Russian, etc.</footnote>, or UTF-8
> + encoding.
> + Maintainers are being encouraged to use UTF-8, having in mind
> + the general debian migration toward unified character encoding.
> + </p>
> +
> + <p>
> + Original upstream documentation, if in encoding other than UTF-8
> + or the well-established encoding for the particular language,
> + should be converted either to UTF-8 or to the well-established
> + encoding. Choice between UTF-8 and other encoding is left to the
> + maintainer discretion, however, one package should have all the
> + documentation in one consistent encoding for one language.
> + </p>
> +
> </sect>
>
> <sect id="usrdoc">
> @@ -7440,6 +7477,18 @@
> Other formats such as PostScript may be provided at the
> package maintainer's discretion.
> </p>
> +
> + <p>
> + HTML documents, if in encoding other than <tt>us-ascii</tt>, should
> + have in their header an appropriate META tag describing
> + the used encoding.
> +
> + Example:
> + <example>
> + <META HTTP-Equiv="Content-Type" CONTENT="text/html; charset=UTF-8">
> + </example>
> + </p>
> +
> </sect>
>
> <sect id="copyrightfile">
> @@ -7555,6 +7604,24 @@
> changelog, then the Debian changelog should still be called
> <tt>changelog.Debian.gz</tt>.</p>
> </sect>
> +
> + <sect id="charset">
> + <heading>Deafult character set</heading>
> +
> + <p>
> + Names of maintainers, upstream authors and other data in
> + packages' descriptions and related debian data files (such as
> + <tt>debian/changelog</tt>, <tt>debian/copyright</tt>,
> + <tt>debian/control</tt>), as well as in English language
> + documentation, should be either transliterated or
> + transcribed to ASCII, or used in UTF-8 encoding at the
> + discretion of the maintainer. However, for names
> + in scripts based on non-latin alphabets, ASCII (or suitable
> + latin-script) version should be provided along with original
> + name.
> + </p>
> + </sect>
> +
> </chapt>
>
> <appendix id="pkg-scope">

-- 
All constants are variables.
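
For illustration only, here is a rough Python sketch of how the rules in the quoted patch could be checked mechanically: is a file pure ASCII or UTF-8, and does an HTML document with non-ASCII bytes carry a charset META tag? The helper names (classify_encoding, declared_charset, html_needs_charset_meta) are made up for this sketch and are not part of any Debian tool. The UTF-8 test is only a heuristic, since text in a legacy 8-bit encoding can occasionally happen to be valid UTF-8, and the META lookup is a simple regular expression, not an HTML parser.

```python
import re

# Hypothetical helpers for illustration; not taken from any Debian tooling.
# Matches charset=... inside a META tag, with or without quotes around the value.
META_CHARSET = re.compile(
    rb"<meta\b[^>]*\bcharset\s*=\s*[\"']?([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)


def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for a file's raw bytes."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def declared_charset(html: bytes):
    """Return the lower-cased charset named in a META tag, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).decode("ascii").lower() if match else None


def html_needs_charset_meta(html: bytes) -> bool:
    """True if the document has non-ASCII bytes but declares no charset."""
    return classify_encoding(html) != "ascii" and declared_charset(html) is None
```

Under these assumptions, a debian/changelog would pass the proposed rule when classify_encoding() returns 'ascii' or 'utf-8', and an HTML file would need fixing when html_needs_charset_meta() returns True.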