On Thu, Aug 13, 2009 at 05:57:57PM -0500, Haudy Kazemi wrote:
> >Therefore, if you need to interoperate with MacOS X then you should
> >enable the normalization feature.
> >
> Thank you for the reply.  My goal is to configure the filesystem for the
> lowest common denominator without knowing up front which clients will be
> used.  OS X and Win XP are listed because they are commonly used as
> desktop OSes.  Ubuntu Linux is a third potential desktop OS.

Right, so set normalization=formD.

> The normalization property documentation says "this property indicates
> whether a file system should perform a unicode normalization of file
> names whenever two file names are compared.  File names are always
> stored unmodified, names are normalized as part of any comparison
> process."  Where does the file system use filename comparisons and what
> does it use them for?  Filename collision checking?  Sorting?

The system does filename comparisons when doing lookups
(open("/foo/bar/baz", ...) does at least three such lookups, for
example), and on create (since that involves a lookup).

Yes, this is about collisions.  Consider a file named "á" (that's "a"
with an acute accent).  There are _two_ possible encodings for that name
in UTF-8: the precomposed character U+00E1, or "a" followed by the
combining acute accent U+0301.  That means that you could have two files
in the same directory that appear to have the same name, though the
bytes that make up the names differ.  That would be confusing, at the
very least.

To avoid such collisions you can enable normalization.

You can find more here:

http://blogs.sun.com/nico/entry/filesystem_i18n

> Is it used for any other operation, say when returning a filename to an
> application?  Would applications reading/writing files to a ZFS

No, directory listings always return the filename used when the file
was created, without any normalization.

> filesystem ever notice the difference in normalization settings as long
> as they produce filenames that do not conflict with existing names or
> create invalid UTF8?  The documentation says filenames are stored
> unmodified, which sounds like things should be transparent to applications.

Applications shouldn't notice normalization being enabled.  The only
reasons to disable normalization are: a) you don't want to force the use
of UTF-8, or b) you consistently use a single normalization form and you
don't want to pay a penalty for normalizing on lookup.

(b) is probably not a problem -- the normalization code is fast if you
use all US-ASCII strings, and otherwise its cost is linear in the number
of non-ASCII Unicode code points in file names.  But I don't have
performance numbers to share.  I think that normalization should be
enabled by default if you enable utf8only, and utf8only should probably
be enabled by default in Solaris, but that's just my personal opinion.

> (In regard to filename collision checking, if non-normalized unmodified
> filenames are always stored on disk, and they don't conflict in
> non-normalized form, what would the point be of normalizing the
> filenames for a comparison?  To verify there isn't conflict in
> normalized forms, and if there is no conflict with an existing file to
> allow the filename to be written unmodified?)

Yes.

> The ZFS documentation doesn't list the valid values for the
> normalization property other than 'none.  From your reply and from the

The zfs(1M) manpage lists them:

    normalization = none | formD | formKCf

That's not all existing Unicode normalization forms, no.  The reason for
this is that we only normalize on lookup (the file names returned by
readdir are not normalized), and for that the forms C and D are
semantically equivalent, but K and non-K forms are not semantically
equivalent, so we need one K form and one non-K form.  NFD is faster
than NFC, but the K forms require a trip through form C, so NFKC is
faster than NFKD (at least if I remember correctly).  Which means that
NFD and NFKC were sufficient, and there's no reason to ever want NFC or
NFKD.

> suggest they be added to the documentation at
> http://dlc.sun.com/osol/docs/content/ZFSADMIN/gazss.html

Yes, that's a good point.

PS: ZFS directories are hashed.  When normalization is enabled, the hash
    keys are normalized on create, but the hash contents are not, so
    filenames remain unnormalized.

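To make the collision case and the PS concrete, here is a rough sketch in
Python.  It is only an illustration, not how ZFS is implemented:
ToyDirectory is a hypothetical class, a plain dict stands in for the
hashed directory, and Python's "NFD" is assumed to correspond to
normalization=formD.  The lookup key is normalized, while the stored
name is kept exactly as it was given at create time.

```python
import unicodedata

# Two different code point sequences that both display as "á".
composed = "\u00e1"      # U+00E1 LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # "a" followed by U+0301 COMBINING ACUTE ACCENT


class ToyDirectory:
    """Hypothetical in-memory directory, for illustration only."""

    def __init__(self, form=None):
        self.form = form    # e.g. "NFD"; None means no normalization
        self.entries = {}   # lookup key -> name exactly as created

    def _key(self, name):
        # Normalize only the key used for comparison, never the stored name.
        if self.form is None:
            return name
        return unicodedata.normalize(self.form, name)

    def create(self, name):
        key = self._key(name)
        if key in self.entries:   # create implies a lookup
            raise FileExistsError(name)
        self.entries[key] = name

    def lookup(self, name):
        # Returns the name as originally created, or None if absent.
        return self.entries.get(self._key(name))

    def readdir(self):
        # Listings return the names as created, without normalization.
        return list(self.entries.values())
```

Without normalization, creating both `composed` and `decomposed` in one
ToyDirectory succeeds and leaves two entries that look identical.  With
ToyDirectory("NFD"), the second create raises FileExistsError, a lookup
by either spelling finds the file, and readdir still returns the name
in the form used at creation.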
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss