On Thu, Aug 13, 2009 at 05:57:57PM -0500, Haudy Kazemi wrote:
> >Therefore, if you need to interoperate with MacOS X then you should
> >enable the normalization feature.
> >  
> Thank you for the reply. My goal is to configure the filesystem for the 
> lowest common denominator without knowing up front which clients will be 
> used. OS X and Win XP are listed because they are commonly used as 
> desktop OSes.  Ubuntu Linux is a third potential desktop OS.

Right, so set normalization=formD .

> The normalization property documentation says "this property indicates 
> whether a file system should perform a unicode normalization of file 
> names whenever two file names are compared.  File names are always 
> stored unmodified, names are normalized as part of any comparison 
> process."  Where does the file system use filename comparisons and what 
> does it use them for?  Filename collision checking?  Sorting?

The system does filename comparisons when doing lookups
(open("/foo/bar/baz", ...) does at least three such lookups, for
example), and on create (since that involves a lookup).

Yes, this is about collisions.  Consider a file named "á" (that's "a"
with an acute accent).  There are _two_ possible encodings for that name
in UTF-8.  That means that you could have two files in the same
directory and with the same name, though they'd have different names if
you looked at the bytes that make up the names.  That would be
confusing, at the very least.

To avoid such collisions you can enable normalization.
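
To make the two encodings concrete, here is a small sketch in Python.
It uses the standard unicodedata module as a stand-in for the
normalization code in ZFS, which is an assumption for illustration
only; the variable names are made up.

```python
import unicodedata

# Two spellings of the same visible name, "a" with an acute accent.
precomposed = "\u00e1"    # U+00E1 LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"    # "a" followed by U+0301 COMBINING ACUTE ACCENT

# The UTF-8 byte sequences differ, so a byte-wise comparison sees two names.
print(precomposed.encode("utf-8"))    # b'\xc3\xa1'
print(decomposed.encode("utf-8"))     # b'a\xcc\x81'
bytes_differ = precomposed.encode("utf-8") != decomposed.encode("utf-8")

# After normalizing both to form D, the comparison sees a single name.
same_after_nfd = (unicodedata.normalize("NFD", precomposed)
                  == unicodedata.normalize("NFD", decomposed))
print(bytes_differ, same_after_nfd)
```

Without normalization both names could be created in one directory;
with it, the second create finds the first file.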

You can find more here:

http://blogs.sun.com/nico/entry/filesystem_i18n

> Is it used for any other operation, say when returning a filename to an 
> application?  Would applications reading/writing files to a ZFS 

No, directory listings always return the name that was used when the
file was created, without any normalization.

> filesystem ever notice the difference in normalization settings as long 
> as they produce filenames that do not conflict with existing names or 
> create invalid UTF8?  The documentation says filenames are stored 
> unmodified, which sounds like things should be transparent to applications.

Applications shouldn't notice normalization being enabled.  The only
reasons to disable normalization are: a) you don't want to force the use
of UTF-8, or b) you consistently use a single normalization form and you
don't want to pay a penalty for normalizing on lookup.

(b) is probably not a problem -- the normalization code is fast if you
use all US-ASCII strings, and its cost is linear in the number of
non-ASCII Unicode codepoints in file names.  But I don't have
performance numbers to share.  I think that normalization should be
enabled by default if you enable utf8only, and utf8only should probably
be enabled by default in Solaris, but that's just my personal opinion.

> (In regard to filename collision checking, if non-normalized unmodified 
> filenames are always stored on disk, and they don't conflict in 
> non-normalized form, what would the point be of normalizing the 
> filenames for a comparison?  To verify there isn't conflict in 
> normalized forms, and if there is no conflict with an existing file to 
> allow the filename to be written unmodified?)

Yes.

> The ZFS documentation doesn't list the valid values for the 
> normalization property other than 'none'.  From your reply and from the 

The zfs(1M) manpage lists them:

     normalization = none | formD | formKCf

That's not all existing Unicode normalization forms, no.  The reason for
this is that we only normalize on lookup (the file names returned by
readdir are not normalized), and for that the forms C and D are
semantically equivalent, but K and non-K forms are not semantically
equivalent, so we need one K form and one non-K form.  NFD is faster
than NFC, but the K forms require a trip through form C, so NFKC is
faster than NFKD (at least if I remember correctly).  Which means that
NFD and NFKC were sufficient, and there's no reason to ever want NFC or
NFKD.
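
A rough way to check the "C and D are equivalent for comparison, K and
non-K are not" claim is the Python sketch below.  It uses Python's
unicodedata forms NFC/NFD/NFKC/NFKD, not the ZFS code, and it does not
model whatever else formKCf may do beyond NFKC; same_name() is a
hypothetical helper.

```python
import unicodedata

def same_name(a, b, form):
    """Compare two names after normalizing both to the given Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

pairs = [
    ("\u00e1", "a\u0301"),    # precomposed vs. decomposed (canonical equivalents)
    ("\ufb01le", "file"),     # "fi" ligature U+FB01 vs. "fi" (compatibility only)
    ("a", "b"),               # plainly different names
]

# For comparison purposes C agrees with D, and KC agrees with KD.
for a, b in pairs:
    assert same_name(a, b, "NFC") == same_name(a, b, "NFD")
    assert same_name(a, b, "NFKC") == same_name(a, b, "NFKD")

# The K and non-K forms disagree on the ligature pair.
nfd_ligature = same_name("\ufb01le", "file", "NFD")
nfkc_ligature = same_name("\ufb01le", "file", "NFKC")
print(nfd_ligature, nfkc_ligature)
```

Since only the result of the comparison matters, one form from each
family is enough.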

> suggest they be added to the documentation at
> http://dlc.sun.com/osol/docs/content/ZFSADMIN/gazss.html

Yes, that's a good point.

PS:  ZFS directories are hashed.  When normalization is enabled, the
     hash keys are normalized on create, but the hash contents are not,
     so filenames remain unnormalized.
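
     A toy model of that behaviour, as a Python sketch.  The Directory
     class and its methods are hypothetical and say nothing about the
     actual ZFS on-disk layout; a dict plays the role of the hash, and
     unicodedata stands in for the normalization code.

```python
import unicodedata

class Directory:
    """Toy directory: normalized lookup keys, original names as stored values."""

    def __init__(self, form="NFD"):
        self.form = form
        self.entries = {}    # normalized key -> name exactly as given at create

    def _key(self, name):
        return unicodedata.normalize(self.form, name)

    def create(self, name):
        key = self._key(name)
        if key in self.entries:
            # Same name after normalization: treat it as a collision.
            raise FileExistsError(name)
        self.entries[key] = name

    def lookup(self, name):
        # Returns the stored (unnormalized) name, or None if absent.
        return self.entries.get(self._key(name))

    def readdir(self):
        # Listings return the names as created, never the normalized keys.
        return list(self.entries.values())

d = Directory()
d.create("\u00e1.txt")              # created with the precomposed spelling
found = d.lookup("a\u0301.txt")     # looked up with the decomposed spelling
listing = d.readdir()

try:
    d.create("a\u0301.txt")         # other spelling of the same name
    collided = False
except FileExistsError:
    collided = True
```

     Here the lookup with the other spelling finds the file, the
     listing shows the name as it was created, and the second create
     is rejected as a collision.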
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
