Hi Samuel,

Today at 14:04, Samuel Thibault wrote:

> Normalized form takes care of glyphs that really can be coded in several
> different ways: for instance, latin e with acute accent may be directly
> coded as 'é', but in Unicode it may also be coded as 'e' followed by the
> combining acute accent. These are really *two* ways to code *exactly*
> the same thing (from the display point of view: an 'e' with an acute
> accent above it). Hence normalization is needed to match both.

You missed my point: POSIX is not concerned with this, and the two are not the same if you ask POSIX (try doing a stat() on two such files, and let me know the results :). It's up to the "handler" to choose one normalisation form and make the most of it (i.e. optimise around it). They are the same from the ISO-10646 and Unicode point of view, but just as "A" and "a" are the same from the ISO-14651 point of view (in *certain* contexts, e.g. when you're doing case-insensitive collation), that's irrelevant here.

What are you going to do when you come across a filesystem containing two files whose names differ only in the normalisation form used (i.e. fully decomposed vs. fully composed)? Yes, you can ensure that no filesystem created via GNU/Hurd will ever contain such instances, but what about filesystems created elsewhere? Are you going to treat those filesystems as erroneous?

Filenames are 8-bit, ASCII-compatible strings (UTF-FS as in "filesystem-safe", originally), and that's all you need to know to write POSIX-compliant programs.

My example above was simply this: if you go the route of treating several different things as one (from the implementation point of view), you'll end up with the mess Microsoft has on Windows with case-insensitive filenames.

Cheers,
Danilo

_______________________________________________
Bug-hurd mailing list
Bug-hurd@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-hurd
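The stat() argument above can be sketched quickly. This is a minimal Python illustration (my own, not from the thread), assuming a POSIX filesystem that does not normalize names (e.g. Linux ext4 or tmpfs); some filesystems, such as Apple's HFS+, do normalize and would behave differently:

```python
import os
import tempfile
import unicodedata

composed = "\u00e9"        # U+00E9: precomposed 'é' (NFC form)
decomposed = "e\u0301"     # 'e' + U+0301 combining acute accent (NFD form)

# Visually identical, but distinct code-point (and byte) sequences:
print(composed == decomposed)       # False
print(composed.encode("utf-8"))     # b'\xc3\xa9'
print(decomposed.encode("utf-8"))   # b'e\xcc\x81'

# They only compare equal after normalizing both to a common form:
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed

# On a non-normalizing filesystem both names can coexist as separate
# files with separate inodes -- exactly the situation described above:
with tempfile.TemporaryDirectory() as d:
    for name in (composed, decomposed):
        open(os.path.join(d, name), "w").close()
    print(len(os.listdir(d)))  # 2 on ext4/tmpfs; normalizing filesystems may merge them
```

Since the kernel compares filenames byte-for-byte, stat() on the two names reaches two different inodes; any "they are the same glyph" reasoning has to live above the POSIX layer.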
> Normalized form take care of glyphs that really can be coded several > different ways: for instance, latin e with acute accent may be directly > coded as 'ÃÂ', but in unicode, may also be coded as 'e' followed by the > combining acute accent. These are really *two* ways to code *exactly* > the same thing (on the displaying point of view: an 'e' with an acute > accent above it). Hence normalization is needed to match both. You missed my point: POSIX is not concerned with this, and they're not the same if you're asking POSIX (try doing a stat() on two such files, and let me know the results :). It's up to the "handler" to make choice on one normalisation form and make the most of it (i.e. optimise around it). They are same from ISO-10646 and Unicode POV, but just like "A" and "a" are same from ISO-14561 (in *certain* contexts, eg. when you're doing case-insensitive collation) POV, it's irrelevant. What are you going to do when you come across a filesystem where you have two files with such names which only differ in normalisation form used (i.e. fully decomposed or fully composed)? Yeah, you can ensure that no filesystem created via GNU/Hurd is going to have such instances, but what about filesystems created elsewhere? Are you going to treat such filesystems as erroneous? Filenames are 8-bit ASCII compatible strings (UTF-FS as in "filesystem-safe" originally), and that's all you need to know to make POSIX-compliant programs. My example above was simply this: if you go the route of treating several different things as one (from implementation POV), you'll end up with the mess Microsoft has on Windows with case-insensitive filenames. Cheers, Danilo _______________________________________________ Bug-hurd mailing list Bug-hurd@gnu.org http://lists.gnu.org/mailman/listinfo/bug-hurd