Marcus Sundman wrote:
> Bart Smaalders <[EMAIL PROTECTED]> wrote:
>>> I'm unable to find more info about this. E.g., what does "reject
>>> file names" mean in practice? E.g., if a program tries to create a
>>> file using an utf8-incompatible filename, what happens? Does the
>>> fopen() fail? Would this normally be a problem? E.g., do tar and
>>> similar programs convert utf8-incompatible filenames to utf8 upon
>>> extraction if my locale (or wherever the fs encoding is taken from)
>>> is set to use utf-8? If they don't, then what happens with archives
>>> containing utf8-incompatible filenames?
>>
>> Note that the normal ZFS behavior is exactly what you'd expect: you
>> get the filenames you wanted; the same ones back you put in.
> 
> OK, thanks. I still haven't got any answer to my original question,
> though. I.e., is there some way to know what text the filename is, or
> do I have to make a more or less wild guess what encoding the program
> that created the file used?

How do you expect the filesystem to know this?  open(2) takes three
arguments (path, flags and mode); none of them has anything to do with
the encoding.
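
To make that concrete, here is a minimal sketch, assuming Python 3 on a
POSIX system.  os.open() mirrors open(2), and the helper name
create_with_raw_name is made up for this illustration.  The name goes in
as bytes and comes back as the same bytes; nothing records which
encoding the creator had in mind.

```python
import os
import tempfile


def create_with_raw_name(directory, raw_name):
    """Create an empty file named by raw bytes and return the directory listing."""
    path = os.path.join(directory, raw_name)
    # Like open(2), os.open takes a path, flags and a mode. There is no
    # parameter through which an encoding could be passed.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    os.close(fd)
    # Passing a bytes path makes os.listdir return the names as bytes.
    return os.listdir(directory)


with tempfile.TemporaryDirectory() as tmp:
    # These two bytes are "ä" if read as UTF-8, but two separate
    # characters if read as ISO-8859-1. The filesystem does not know which.
    print(create_with_raw_name(os.fsencode(tmp), b"\xc3\xa4.txt"))
    # [b'\xc3\xa4.txt']
```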

> OK, if I use utf8only then I know that all filenames can be interpreted
> as UTF-8. However, that's completely unacceptable for me, since I'd
> much rather have an important file with an incomprehensible filename
> than not have that important file at all. Also, what about non-UTF-8
> encodings? E.g., is it possible to know whether 0xe4 is "ä" (as in
> iso-8859-1) or "ф" (as in iso-8859-5)?
> 

There are two bytes not allowed in filenames: NUL ('\0') and '/'.
Everything else is meaning imparted by the user, just like the contents
of text documents.
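
A rough sketch of both points, assuming Python 3; the function names
is_legal_name and readings are invented for this example.  The first
check is all the filesystem applies to a name component, and the second
shows that the single byte 0xe4 decodes cleanly under either encoding
from the question, so the bytes alone cannot settle it.

```python
def is_legal_name(raw):
    """True if the bytes are acceptable as a single path component."""
    # Only NUL and '/' are excluded; an empty name is not usable either.
    return len(raw) > 0 and b"\x00" not in raw and b"/" not in raw


def readings(raw, encodings=("iso-8859-1", "iso-8859-5")):
    """Decode the same bytes under each candidate encoding."""
    return {enc: raw.decode(enc) for enc in encodings}


print(is_legal_name(b"\xe4"))   # True
print(is_legal_name(b"a/b"))    # False
print(readings(b"\xe4"))        # {'iso-8859-1': 'ä', 'iso-8859-5': 'ф'}
```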

>> The trick is that in order to support such things as
>> casesensitivity=false for CIFS, the OS needs to know what characters
>> are uppercase vs lowercase, which means it needs to know about
>> encodings, and reject codepoints which cannot be classified as
>> uppercase vs lowercase.
> 
> I don't see why the OS would care about that. Isn't that the job of the
> CIFS daemon? 

If my program attempts to open file "fred" in a case-insensitive filesystem
and "FRED" exists, I would expect to get a handle to "FRED".  In order for
the filesystem to do this, the OS must be able to perform this comparison.

CIFS is in the kernel; case insensitivity is a property of the
filesystem, not a layer added on by a daemon.  If not, I could create
"fred" and "FRED" locally, and then which one would I get were I to
open "FrEd" via CIFS?
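
As an illustration only (this is not how ZFS implements it), a
case-insensitive lookup can be sketched in Python 3 as below.  The
function name lookup_case_insensitive is hypothetical and str.casefold()
stands in for whatever case tables a real filesystem uses.  The point is
that the comparison has to decode the bytes first, so a name that is not
valid in the assumed encoding cannot be classified at all.

```python
def lookup_case_insensitive(stored_names, wanted, encoding="utf-8"):
    """Return the stored name that matches `wanted` ignoring case, or None.

    Raises UnicodeDecodeError if a name is not valid in `encoding`, which
    is the situation a utf8only-style restriction rules out up front.
    """
    key = wanted.decode(encoding).casefold()
    for name in stored_names:
        if name.decode(encoding).casefold() == key:
            return name
    return None


print(lookup_case_insensitive([b"FRED"], b"fred"))  # b'FRED'
# "Ä.txt" and "ä.txt" in UTF-8 differ in their bytes but match once decoded.
print(lookup_case_insensitive([b"\xc3\x84.txt"], b"\xc3\xa4.txt"))  # b'\xc3\x84.txt'
```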


> As a matter of fact I don't see why the OS would need to
> know how to decode any filename-bytes to text. However, I firmly
> believe that user applications should have that opportunity. If the
> encoding of filenames is not known (explicitly or implicitly) then
> applications don't have that opportunity.

The OS doesn't care; the user does.  If a user creates a file named
წყალსა in his home directory, but my encoding doesn't contain these
characters, what should ls -l display?  You also assume that knowing
the encoding will transfer meaning... but a directory containing files
named ᚠᚱᚩᚠᚢᚱ, ᛞᚩᛗᛖᛋ and ᚻᛚᛇᛏᚪᚾ may as well be line noise for most of us.
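
A small sketch of the display problem, assuming Python 3 and assuming
the name was stored as UTF-8; display_name is a made-up helper, not
what ls actually does.  With a UTF-8 terminal the characters survive,
and with an ISO-8859-1 terminal there is nothing to show but
placeholders.

```python
def display_name(raw, terminal_encoding):
    """Render a UTF-8 filename as a terminal using `terminal_encoding` could show it."""
    text = raw.decode("utf-8")  # assumption: the creator used UTF-8
    # Characters the terminal encoding lacks are replaced with '?'.
    shown = text.encode(terminal_encoding, errors="replace")
    return shown.decode(terminal_encoding)


name = "წყალსა".encode("utf-8")
print(display_name(name, "utf-8"))       # წყალსა
print(display_name(name, "iso-8859-1"))  # ??????
```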

The OS doesn't care one whit about language or encodings (save
the optional upper/lower case accommodation for CIFS).  The OS simply
stores files under names that don't contain either '/' or NUL.

UTF-8 is the answer here.  If you care about anything more than simple
ASCII and you work in more than a single locale/encoding, use UTF-8.
You may not understand the meaning of a filename, but at least
you'll see the same characters as the person who wrote it.
- Bart

-- 
Bart Smaalders                  Solaris Kernel Performance
[EMAIL PROTECTED]               http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
