On Fri, Jun 06, 2003 at 06:37:11PM +0200, Bill Allombert wrote:
> On Fri, Jun 06, 2003 at 01:17:00PM +0200, Jérôme Marant wrote:
> > > I don't see all those (7|8)-bit-charset-using people requiring the
> > > same...
> > Policy would mean all of them in the same charset, UTF-8 that is.

> The issue call for two comments:

> 1) Changelog are required to be written in english, so non 7bit
> characters should be rare, and use of non latin-1 characters are
> probably not a good idea. For example, writing the name of a
> developer with japanese characters might cause problem to people
> reading the changelog understanding who is referred to. This is
> unfortunate.

> 2) People write changelog with whatever locales they use for development.
> Requiring them to use special tool for writing changelog would be a
> pain. I don't know how far lintian can check for UTF-8 encoding.

Of course, these comments give contradictory rationales. The one says
that mandating UTF-8 is bad because people shouldn't use non-ASCII
characters in changelogs; the other says that mandating UTF-8 is bad
because it makes it harder for people to use non-ASCII characters in
changelogs. I argue that the latter is a *good* thing; and where
exceptions are permitted, they should be encoded using a common
character set.

Checking for non-UTF8 characters in a changelog is trivial. Dump the
file through 'iconv -f utf-8 -t ucs-4', discard the output, and check
the return value. If there are any characters in the stream which are
invalid UTF-8 sequences, iconv will exit with an error code; and this
will be the case for the vast majority of other character sets.

-- 
Steve Langasek
postmodern programmer
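
The iconv check described above could be wrapped up roughly as follows.
This is only a sketch: the function name is_valid_utf8 and the
debian/changelog path in the usage note are illustrative assumptions,
not something taken from lintian or from the message itself.

```shell
# Hypothetical helper (name is an assumption): succeed only if the file
# named by $1 decodes cleanly as UTF-8.
is_valid_utf8() {
    # Convert from UTF-8 to UCS-4 and throw away both the converted
    # output and any diagnostics; only iconv's exit status matters.
    # iconv exits non-zero when it meets an invalid UTF-8 sequence.
    iconv -f utf-8 -t ucs-4 < "$1" > /dev/null 2>&1
}
```

It might then be used as
'is_valid_utf8 debian/changelog || echo "changelog is not valid UTF-8"'.
Pure ASCII input passes as well, since ASCII is a subset of UTF-8.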