Bart Smaalders <[EMAIL PROTECTED]> wrote:

> > OK, thanks. I still haven't got any answer to my original question,
> > though. I.e., is there some way to know what text the filename is, or
> > do I have to make a more or less wild guess what encoding the program
> > that created the file used?
>
> How do you expect the filesystem to know this? Open(2) takes 3 args;
> none of them have anything to do with the encoding.

A while ago, when discussing things with some filesystem people, I
proposed introducing a new syscall that informs the kernel about the
locale encoding used by a process. If the kernel (or filesystem) then
wants to store file names in a kernel-specific way, and if there is an
in-kernel libiconv, the kernel could convert between its own view and
the userland view. One problem remains: a userland encoding may not be
able to represent all "characters" used in the kernel view.

> There are two characters not allowed in filenames: NULL and '/'. Everything
> else is meaning imparted by the user, just like the contents of text
> documents.

Platforms that insist on UTF-8 encoding for filenames often disallow
octet sequences that are not valid UTF-8.

> The OS doesn't care; the user does. If a user creates a file named
> ?????????????????? in his home directory, but my encoding doesn't contain
> these characters, what should ls -l display? You also assume that knowing
> the encoding will transfer meaning... but a directory containing files named
> ??????????????????, ??????????????? and ?????????????????? may as well be
> line noise for most of us.
>
> The OS doesn't care one whit about language or encodings (save
> the optional upper/lower case accommodation for CIFS). The OS simply
> stores files under names that don't contain either '/' or NULL.
>
> UTF8 is the answer here. If you care about anything more than simple
> ascii and you work in more than a single locale/encoding, use UTF8.
> You may not understand the meaning of a filename, but at least
> you'll see the same characters as the person who wrote it.

UTF-8 may be the answer to many problems, but definitely not to all of
them. In five years, if more people use it by then, UTF-8 may cause
fewer problems than the ones known with it today.
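
To make the two technical points above concrete, here is a minimal
userland sketch in Python. It is not the proposed syscall (no such
interface exists). The function names, and the choice of UTF-8 as the
"kernel view" and ISO-8859-1 as the "userland view", are assumptions
made only for illustration.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name bytes form valid UTF-8.

    This mimics the check a UTF-8-only platform might apply
    before accepting a file name (hypothetical helper).
    """
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_userland(name: bytes,
                kernel_enc: str = "utf-8",
                user_enc: str = "iso-8859-1") -> bytes:
    """Convert a name from the assumed kernel encoding to the
    assumed userland encoding (hypothetical helper).

    Characters the userland encoding cannot represent are
    replaced by '?', so the conversion is lossy.
    """
    return name.decode(kernel_enc).encode(user_enc, errors="replace")


if __name__ == "__main__":
    utf8_name = "Jörg".encode("utf-8")
    latin1_name = "Jörg".encode("iso-8859-1")

    print(is_valid_utf8(utf8_name))    # valid UTF-8
    print(is_valid_utf8(latin1_name))  # a lone 0xF6 byte is not valid UTF-8

    # Representable in ISO-8859-1: converts cleanly.
    print(to_userland(utf8_name))
    # Not representable in ISO-8859-1: each character becomes '?'.
    print(to_userland("日本".encode("utf-8")))
```

The last call shows the remaining problem: once a name has been
squeezed into an encoding that cannot represent it, the original
bytes cannot be recovered from what the process sees.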

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
       [EMAIL PROTECTED] (uni)
       [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily