Hi Marcus,

Yesterday at 5:56, Marcus Brinkmann wrote:
> At 21 Jan 2005 19:31:13 -0800,
> Thomas Bushnell BSG wrote:
>>
>> Marcus Brinkmann <[EMAIL PROTECTED]> writes:
>>
>> > UTF-8 is an insanely complex standard, if you start to look down its
>> > depths.
>>
>> UTF-8 is a complex standard.  It is not insanely so.  It is complex
>> because it is representing a very complex problem.

Now, UTF-8 is an extremely simple standard, but Unicode is not so :)
Proper UTF-8 transformation functions usually take no more than a
couple of dozen lines, and that's including error checking :)

Or, I may be missing what UTF-8 standard you're talking about (RFC
something :).

> Oh, sure.  The insanity starts if you talk about using "UTF-8" for
> things like filenames without being very exact in what you mean by
> that.  The implications of putting the complex system UTF-8 into a
> POSIX-like operating systems as they exist today are not well
> understood, and the resulting lose ends, conflicts, etc are not
> resolved as of today.

POSIX has never used "equivalences" for characters (i.e. case
differences), so I don't see what's so different about using UTF-8
instead of ISO-8859-1 for filenames: after all, one can treat UTF-8 as
ISO-8859-1 without any problem at all, so from the POSIX point of view
it all works, it just displays as gibberish :)

Using normalized forms would then simply be up to the writer and
reader, just as it is up to the writer and reader today to check for
all of "Music", "music", "mUSIC" and similar when a user actually
searches for his music directory.  Of course, going a step further and
doing this in libdiskfs or wherever is nice as well.

Users' expectations are that they can use their own characters.  The
character set is only an implementation detail, and whoever cares about
it is not a regular user, but a technical computer user (most commonly
a programmer).  UTF-8 in that sense simplifies the implementation
instead of complicating it (as you seem to be suggesting), and it
further improves portability.
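To make the "couple of dozen lines" remark concrete, here is a minimal
sketch in Python.  It is an illustration only, not code from the Hurd
or from any library: the names utf8_encode and utf8_decode and the
sample filename are made up for this example.  The last few lines also
show the two filename points above: UTF-8 bytes always survive a round
trip through ISO-8859-1, and two normalization forms of the same name
are different byte strings as far as a byte-oriented system is
concerned.

```python
import unicodedata


def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_decode(data: bytes) -> list:
    """Decode UTF-8 into a list of code points, rejecting malformed input."""
    out, i, n = [], 0, len(data)
    while i < n:
        b = data[i]
        # Lead byte gives the sequence length and the smallest code point
        # that is allowed to use that length (to reject overlong forms).
        if b < 0x80:
            cp, extra, lowest = b, 0, 0
        elif 0xC2 <= b <= 0xDF:
            cp, extra, lowest = b & 0x1F, 1, 0x80
        elif 0xE0 <= b <= 0xEF:
            cp, extra, lowest = b & 0x0F, 2, 0x800
        elif 0xF0 <= b <= 0xF4:
            cp, extra, lowest = b & 0x07, 3, 0x10000
        else:
            raise ValueError(f"invalid lead byte {b:#x} at offset {i}")
        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {i}")
        for c in data[i + 1:i + 1 + extra]:
            if c & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte {c:#x}")
            cp = (cp << 6) | (c & 0x3F)
        if cp < lowest or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"overlong or out-of-range sequence at offset {i}")
        out.append(cp)
        i += extra + 1
    return out


# A filename is just bytes to a byte-oriented system.  Read as ISO-8859-1
# it still round-trips exactly; it only looks wrong on screen.
name_nfc = unicodedata.normalize("NFC", "m\u00fasica").encode("utf-8")
gibberish = name_nfc.decode("latin-1")
assert gibberish.encode("latin-1") == name_nfc

# The same visible name in another normalization form is a different
# byte string, so matching the two is left to the writer and reader.
name_nfd = unicodedata.normalize("NFD", "m\u00fasica").encode("utf-8")
assert name_nfc != name_nfd
```

The decoder rejects invalid lead bytes, truncated sequences, bad
continuation bytes, overlong forms, surrogates and values above
U+10FFFF.  Everything beyond that (normalization, case folding,
collation) belongs to Unicode rather than to UTF-8, which is why the
example has to reach for unicodedata as soon as equivalent names come
up.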
Of course, UTF-8 is no hammer for every nail, as you put it, but it's
clearly an improvement over any 8-bit character set in the POSIX world.

Well, this is just my opinion at least :)

Cheers,
Danilo

_______________________________________________
Bug-hurd mailing list
Bug-hurd@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-hurd