Bart Smaalders <[EMAIL PROTECTED]> wrote:

> > OK, thanks. I still haven't got any answer to my original question,
> > though. I.e., is there some way to know what text the filename is, or
> > do I have to make a more or less wild guess what encoding the program
> > that created the file used?
>
> How do you expect the filesystem to know this?  Open(2) takes 3 args;
> none of them have anything to do with the encoding.

A while ago, when discussing things with some filesystem guys, I proposed
introducing a new syscall to inform the kernel about the locale coding
used by a process. If the kernel (or filesystem) then likes to store
file names in a kernel-specific way, and if there is an in-kernel libiconv,
the kernel could convert from/to the userland view. A problem that remains
is a userland coding that may not be able to represent all "characters"
used in the kernel view.
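
To illustrate the conversion idea and the remaining problem, here is a
minimal userland sketch. It assumes the "kernel view" is UTF-8 and that the
process locale coding is passed in explicitly; the function names
kernel_to_user and user_to_kernel are hypothetical, and no such syscall or
in-kernel interface is implied to exist.

```python
# Hypothetical sketch: the stored ("kernel view") form is assumed to be
# UTF-8; user_coding stands in for the locale coding a process would
# announce to the kernel. Neither function is a real kernel interface.

def user_to_kernel(name: bytes, user_coding: str) -> bytes:
    """Convert a name from the process's locale coding to the stored form."""
    return name.decode(user_coding).encode("utf-8")


def kernel_to_user(name: bytes, user_coding: str) -> bytes:
    """Convert a stored name back to the process's locale coding."""
    return name.decode("utf-8").encode(user_coding)


# A Latin-1 process creates a name containing o-umlaut (0xF6 in ISO-8859-1).
stored = user_to_kernel(b"J\xf6rg.txt", "iso-8859-1")
round_trip = kernel_to_user(stored, "iso-8859-1")

# The remaining problem: a stored name with Greek letters has no
# representation in ISO-8859-1, so the conversion to that userland fails.
greek = "\u03b1\u03b2\u03b3.txt".encode("utf-8")
try:
    kernel_to_user(greek, "iso-8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False
```

What the kernel should hand back in the failing case (an error, an escaped
form, or a substitute character) is exactly the open question.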


> There are two characters not allowed in filenames: NULL and '/'.  Everything
> else is meaning imparted by the user, just like the contents of text
> documents.

Platforms that insist on UTF-8 coding for filenames often disallow octet
sequences that are not valid inside a UTF-8 character sequence.
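
As a rough sketch of what such a check amounts to (the helper name
is_valid_utf8_name is made up for illustration, and real platforms may apply
additional rules such as normalization), a name is accepted only if it
avoids '/' and NUL and also decodes as strict UTF-8:

```python
def is_valid_utf8_name(name: bytes) -> bool:
    """Return True if name would pass a strict UTF-8 filename check.

    Illustrative only: combines the two classic restrictions ('/' and
    the NUL byte) with a strict UTF-8 validity test.
    """
    if not name or b"/" in name or b"\x00" in name:
        return False
    try:
        name.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


ok_utf8 = is_valid_utf8_name(b"J\xc3\xb6rg.txt")   # o-umlaut encoded as UTF-8
bad_latin1 = is_valid_utf8_name(b"J\xf6rg.txt")    # same name in ISO-8859-1
bad_slash = is_valid_utf8_name(b"a/b")
```

A name written by an ISO-8859-1 program is rejected here even though a
traditional filesystem would store it without complaint.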

>
> The OS doesn't care; the user does.  If a user creates a file named
> ?????????????????? in his home directory, but my encoding doesn't contain
> these characters,
> what should ls -l display?  You also assume that knowing the encoding
> will transfer meaning... but a directory containing files named
> ??????????????????, ??????????????? and ?????????????????? may as well be 
> line noise for most of us.
>
> The OS doesn't care one whit about language or encodings (save
> the optional upper/lower case accommodation for CIFS).  The OS simply
> stores files under names that don't contain either '/' or NULL.
>
> UTF8 is the answer here.  If you care about anything more than simple
> ascii and you work in more than a single locale/encoding, use UTF8.
> You may not understand the meaning of a filename, but at least
> you'll see the same characters as the person who wrote it.

UTF-8 may be the answer for many, but definitely not all, problems.
UTF-8 may cause fewer problems in 5 years (if more people use it by then)
than the problems known with UTF-8 today.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
       [EMAIL PROTECTED]                (uni)  
       [EMAIL PROTECTED]     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
