[Thaddeus H. Black]
> Would Peter permit me a mild dissent? I prefer Latin-1.

Dissents are fine. (: The reason to go with UTF-8 is for consistency. Tools that wish to render text onto the screen ought to be able to depend on knowing the encoding that text is in. See below for why I (and many others) think UTF-8 is the right choice for an encoding to standardize on.

> I do not deny that Latin-1 represents all the languages I can read,
> and that this fact may color my view. Nevertheless to me a source
> written in Chinese is effectively non-free. It might as well be a
> compiled binary blob.

Consider packages intended for speakers of other languages: for example, an Urdu dictionary. The Description field would traditionally describe the package both in English and in Urdu (which uses the Arabic alphabet), and I think that's perfectly fine: the target audience can read its description more easily, and the rest of us can read the English. Now extrapolate to cases involving arbitrary languages, and this is possible only if the Description field uses an encoding of Unicode. (Well, one could invent an extra header to specify the character set, but that seems pointless in the extreme.)

UTF-8 is by far the best encoding of Unicode for our purposes, since it was designed to be compatible with tools that parse ASCII. Other Unicode encodings have null bytes and other ASCII values embedded in non-ASCII characters.

You can argue, and I would agree, that the Maintainer and Uploaders fields (the only fields other than Description where we are likely to see non-ASCII text) ought to be written in roman letters. People involved with Debian development are required to know a certain amount of English in any case, so the roman alphabet is a common denominator. And, unlike the Description field, it's awkward to try and have both native glyphs and a roman transliteration.

However, I see no reason to tell Eastern Europeans that they cannot write their names natively; interpreting Eastern European diacritics is no harder for people who don't speak those languages than interpreting Western European diacritics is for people who don't speak those languages.

Peter
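
P.S. For anyone who wants to check the claim about ASCII-parsing tools, here is a minimal Python sketch. It is only an illustration: the field text, the variable names, and the helper `non_ascii_bytes_are_high` are made up for this example, and UTF-16 (little-endian, no byte order mark) stands in for "other Unicode encodings".

```python
# Hypothetical control-file line; the escapes spell "Urdu" in Arabic script
# (U+0627 U+0631 U+062F U+0648). Not taken from any real package.
field = "Description: Urdu dictionary (\u0627\u0631\u062f\u0648)\n"

utf8 = field.encode("utf-8")
utf16 = field.encode("utf-16-le")


def non_ascii_bytes_are_high(text: str, encoding: str) -> bool:
    """Return True if every byte used to encode a non-ASCII character
    is >= 0x80, i.e. cannot be mistaken for an ASCII byte."""
    for ch in text:
        if ord(ch) >= 0x80:
            if any(b < 0x80 for b in ch.encode(encoding)):
                return False
    return True


# An ASCII-only parser can split the UTF-8 bytes on ": " without knowing
# anything about Unicode, and the value still decodes cleanly afterwards.
name, _, value = utf8.partition(b": ")
decoded_value = value.decode("utf-8")

print(name)
print(non_ascii_bytes_are_high(field, "utf-8"))      # UTF-8 keeps the ranges apart
print(non_ascii_bytes_are_high(field, "utf-16-le"))  # UTF-16 does not
print(b"\x00" in utf8, b"\x00" in utf16)
```

In UTF-16 the Arabic letters above come out as byte pairs whose first byte is an ordinary ASCII value (an apostrophe, a digit, a slash, a capital letter), and every ASCII character picks up a null byte, which is exactly what trips up byte-oriented tools.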