Joel Rees writes:
> You can even handle broken UTF-8 and unconverted UTF-16/32 of whatever byte
> order spit into the file name as a sequence of bytes if and only if you
> escape NUL, slash, and your escape character properly, restoring the
> escaped characters when putting the file names on the network.

This is just asking for security issues. It's the same kind of thinking
that caused the designers of Java to allow embedding NUL in strings as
0xc0 0x80, or that produced CESU-8, where you can encode astral characters
with surrogate pairs instead of just writing the character directly. These
are the kinds of things that make people think "Unicode is complex and
prone to security issues," even though neither of them is allowed by the
UTF-8 spec!

> Normalization alone does not know how to restore a potentially normalized
> name. It needs some sort of flag character that says "this name was
> normalized", and a way to choose between de-normalized forms when more than
> one denormalized form maps to one particular normal form.

Once you start stacking multiple accents this becomes unworkable.

> I haven't used Apple OSses since around 10.4, but Mac OS X was doing a
> thing where certain well-known directory names were aliased according to
> the current locale. For instance, the user's "music" directory was shown
> as 「音楽」 when the locale was set to ja_JP.UTF-8.

IMO this is totally crazy behavior and unrelated to the Unicode issue.

--
Anthony J. Bentley
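
For anyone who wants to check the two technical points above, here is a minimal Python sketch. It assumes CPython 3's built-in strict UTF-8 codec and the standard unicodedata module. The helper name is_strict_utf8, the sample byte strings, and the choice of U+1F600 and U+1EAD as test characters are made up for illustration; nothing here is taken from any particular file system's code.

```python
import unicodedata


def is_strict_utf8(data: bytes) -> bool:
    """Return True only if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


# Java-style "modified UTF-8" NUL: an overlong two-byte encoding of U+0000.
OVERLONG_NUL = b"\xc0\x80"

# CESU-8 style U+1F600: each UTF-16 surrogate (D83D, DE00) encoded
# separately as a three-byte sequence.
CESU8_ASTRAL = b"\xed\xa0\xbd\xed\xb8\x80"

# The same character written directly as one four-byte UTF-8 sequence.
UTF8_ASTRAL = "\U0001F600".encode("utf-8")

# Five distinct code point sequences for "a" with a circumflex and a dot
# below. Stacking a second accent already multiplies the spellings.
SPELLINGS = [
    "\u1ead",         # fully precomposed
    "\u1ea1\u0302",   # a-with-dot-below + combining circumflex
    "\u00e2\u0323",   # a-with-circumflex + combining dot below
    "a\u0323\u0302",  # a + dot below + circumflex
    "a\u0302\u0323",  # a + circumflex + dot below
]

# Normalize every spelling; a set collapses identical results.
NFC_FORMS = {unicodedata.normalize("NFC", s) for s in SPELLINGS}

if __name__ == "__main__":
    print("overlong NUL accepted:", is_strict_utf8(OVERLONG_NUL))
    print("CESU-8 pair accepted: ", is_strict_utf8(CESU8_ASTRAL))
    print("direct UTF-8 accepted:", is_strict_utf8(UTF8_ASTRAL))
    print("distinct inputs:", len(set(SPELLINGS)),
          "distinct NFC results:", len(NFC_FORMS))
```

A strict decoder refuses both the overlong NUL and the surrogate-pair encoding, so a file system that accepts them is accepting something other than UTF-8. The second half shows why a single "this name was normalized" flag cannot be enough: several different inputs collapse to one normal form, and the normal form does not record which one it came from.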