On Tue, Dec 07, 2004 at 05:56:54PM +0000, Thaddeus H. Black wrote:
> > But yes, non-ASCII Latin-1 chars should not be given
> > special status over the national chars found in other
> > languages spoken by project members. Debian should be
> > using either ASCII, or Unicode; standardizing on
> > Latin-1 makes no sense in a global project.
> True. Look, Steve: mild abuse aside, I agree with you
> in every particular. Nevertheless, I would respectfully
> suggest that your criticism underscores my point, which
> regards the monstrous increase in complexity which the
> full Unicode standard represents.

Yet you concluded from this that we should use Latin-1 as the encoding for
these files. All arguments that justify the use of Latin-1 characters in the
control file apply equally to any of a number of other national character
sets used by one or more developers.

> Consider. Is it a bug if Readline cannot echo full bidirectional input?

Er, yes, sure it is, independently of what happens in debian/control.

> If Dselect does not appreciate all the non-spacing
> characters?

IFF dselect has a reason to display such characters, yes. This may well be
the case regardless of whether debian/control ever supports non-ASCII
characters; Debian may start supporting localized Packages files via some
external mechanism, or it may provide a localized UI that requires these
characters.

> If Less does not regard Tibetan subjoined letters? (This is my Tibetan
> straw man.)

Yes, this is also a bug. Not one that's likely to be noticed for a while, but
a bug nevertheless. But your example again overstates the complexity of the
task: the main responsibility of less is to figure out how many characters
fit on a line, and let the *terminal* render the glyphs. This is code that
needs to be implemented only once, and most of the work is already done
centrally for *all* apps by glibc, which keeps track of the display width of
each character.

> Undoubtedly one might observe that the Tibetan problem
> were not really a problem with Less but rather with some
> underlying library, but this misses the point---or
> rather again it underscores the point. Unicode solves
> what for many of us was not a problem by creating an
> entirely new class of problems.
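The division of labour described above (less works out how many terminal columns a line occupies and leaves glyph rendering to the terminal, relying on per-character width data) can be sketched in a few lines. This is a minimal approximation, assuming Python's `unicodedata` tables as a stand-in for glibc's `wcwidth()`: nonspacing and enclosing marks (such as Tibetan subjoined letters) count as zero columns, East Asian wide and fullwidth characters as two, everything else as one. The function names are hypothetical, and the real `wcwidth()` handles more cases (control characters, some format characters).

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate terminal column width of one code point.

    A rough stand-in for glibc's wcwidth(), not a reimplementation of it.
    """
    # Nonspacing (Mn) and enclosing (Me) marks, e.g. combining accents or
    # Tibetan subjoined letters, are drawn over the preceding base
    # character and take no column of their own.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian Wide and Fullwidth characters occupy two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def display_width(text: str) -> int:
    """Total number of columns a string occupies on a terminal."""
    return sum(char_width(ch) for ch in text)


def fit_to_columns(text: str, columns: int) -> str:
    """Longest prefix of `text` that fits in `columns` terminal columns.

    Zero-width marks following a character that still fits are kept, so a
    base letter is never separated from its combining marks.
    """
    used = 0
    for i, ch in enumerate(text):
        w = char_width(ch)
        if used + w > columns:
            return text[:i]
        used += w
    return text


if __name__ == "__main__":
    print(display_width("abc"))           # 3: ASCII, one column each
    print(display_width("e\u0301"))       # 1: e + COMBINING ACUTE ACCENT
    print(display_width("\u0f40\u0fb1"))  # 1: TIBETAN KA + SUBJOINED YA
    print(display_width("日本語"))         # 6: three wide CJK ideographs
    print(fit_to_columns("日本語", 5))      # 日本: a third would need column 6
```

The point of the sketch is that none of this logic is specific to any one application: the width table lives in one place, and a pager only needs to sum widths and stop when the line is full.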
> For example, it requires us to be particular about how we tag our e-mail
> attachments...

Um, no. Being part of a *global Internet* causes this problem for you. The
non-ASCII characters in your email were undefined gibberish according to your
headers; only naive (or "helpful", YMMV) mail readers would render them at
all, and only naive mail readers run by users in a Western European locale
would have rendered them as intended. Actually, perhaps even that is being
too generous, as there are *different* native 8-bit encodings used on each of
Unix, Windows, and MacOS; the Unix and Windows encodings differ on relatively
few codepoints, but the Mac encoding is widely different. And you think it's
OK to inflict this same mess on anyone not using a Latin-1 locale while
trying to read a debian/control file?

> Am I arguing to jettison Unicode? No; to the partial
> extent that I had been arguing it earlier in the thread,
> you, Peter, Daniel and Matthew have changed my mind.
> However, the typical roster of skills one masters in
> contributing broadly to Debian development is already
> awesome: C, C++, CPP, Make, Perl, Python, Autoconf, CVS,
> Shell, Glibc, System calls, /proc, IPC, sockets, Sed,
> Awk, Vi, Emacs, locales, Libdb, GnuPG, Readline,
> Ncurses, TeX, Postscript, Groff, XML, assembly, Flex,
> Bison, ORB, Lisp, Dpkg, PAM, Xlibs, Tk, GTK, SysVInit,
> Debconf, ELF, etc.---not to mention the use of the
> English language at a sophisticated technical level.
> UTF-8 is neat, but I do not really like Unicode (you may
> have noticed this). Seeking essential simplicity, I
> would prefer to keep the full hairy overgrown Unicode
> standard from the typical Debian roster of development
> skills. Wouldn't you?

1) Sorry, modern software is a complex creature. This is because we demand
complex things of it -- including handling all the languages that we speak.

2) Most DDs do not master all of the above skills.
*I* don't have a mastery of all of the above skills; "contributing broadly to
Debian" usually means mastering some of these skills, and knowing where to
find answers for the rest.

3) "Mastering Unicode", for the purposes of almost anyone not working
directly on glibc or implementing a terminal, is roughly equivalent to
"making sure your application implements proper string handling for CJK". If
you do it right, the differences between UTF-8 and ISO-2022 are normally
minimal; if you do it wrong, you get bug reports from Japanese users.
However, for files for which no encoding is specified, there is no right way
to handle non-ASCII data, which is why debian/control is an issue.

4) As suggested above, for 98% of all applications on the system, the
encoding used for debian/control is *entirely irrelevant* to the question of
whether they will need to support UTF-8. UTF-8 is already out there and in
use, for very good (though hardly universal) reasons. Applications that don't
handle UTF-8 get bug reports from all *kinds* of users.

-- 
Steve Langasek
postmodern programmer
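The encoding argument in point 3 (and in the mail example earlier) can be illustrated with Python's standard codecs. This is an illustrative sketch only: the helper names are hypothetical, and it assumes `latin-1`, `cp1252` and `mac_roman` as stand-ins for the native 8-bit encodings on Unix, Windows and MacOS. It shows that an unlabelled byte has no single correct reading, that an application which decodes once at the boundary and then works on characters treats UTF-8 and ISO-2022-JP input the same way, and that slicing raw UTF-8 bytes can cut a character in half.

```python
# Assumed stand-ins for the native 8-bit encodings of each platform.
NATIVE_8BIT = {"Unix": "latin-1", "Windows": "cp1252", "MacOS": "mac_roman"}


def readings(data: bytes) -> dict:
    """How the same unlabelled bytes decode under each native encoding."""
    return {os_name: data.decode(enc, errors="replace")
            for os_name, enc in NATIVE_8BIT.items()}


def differing_bytes(enc_a: str, enc_b: str) -> int:
    """Count single byte values that two 8-bit encodings decode differently."""
    count = 0
    for b in range(256):
        byte = bytes([b])
        if byte.decode(enc_a, errors="replace") != byte.decode(enc_b, errors="replace"):
            count += 1
    return count


def first_chars(data: bytes, encoding: str, n: int) -> str:
    """Decode at the boundary, then operate on characters, not bytes."""
    return data.decode(encoding)[:n]


if __name__ == "__main__":
    # One byte, three platforms: Unix and Windows agree, the Mac does not.
    print(readings(b"\xe9"))
    # UTF-8 bytes read with the wrong (unlabelled) charset: mojibake.
    print("é".encode("utf-8").decode("latin-1"))    # Ã©
    print(differing_bytes("latin-1", "cp1252"))     # 32: only 0x80-0x9F differ
    print(differing_bytes("latin-1", "mac_roman"))  # most of 0x80-0xFF differ

    text = "日本語"
    for enc in ("utf-8", "iso2022_jp"):
        data = text.encode(enc)
        # Different byte lengths, same characters once decoded.
        print(enc, len(data), first_chars(data, enc, 2))

    # Slicing raw UTF-8 bytes can split a character in half.
    try:
        text.encode("utf-8")[:4].decode("utf-8")
    except UnicodeDecodeError as exc:
        print("byte slicing broke the text:", exc.reason)
```

When the encoding is known, every step after `decode()` is the same regardless of the input encoding; when it is not known, as with an unlabelled debian/control, `readings()` shows there is no principled way to pick one answer.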