Nicolas Williams wrote:
On Wed, Aug 12, 2009 at 06:17:44PM -0500, Haudy Kazemi wrote:
I'm wondering what are some use cases for ZFS's utf8only and normalization properties. They are off/none by default, and can only be set when the filesystem is created. When should they specifically be enabled and/or disabled? (i.e. Where is using them a really good idea? Where is using them a really bad idea?)

These are for interoperability.

The world is converging on Unicode for filesystem object naming.  If you
want to exclude non-Unicode strings then you should set utf8only (some
non-Unicode strings in some codesets can look like valid UTF-8 though).
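As a rough illustration of the kind of check utf8only implies, and of the caveat in parentheses above, here is a minimal Python sketch. The helper name `is_valid_utf8` and the sample names are made up for this example; it is not how ZFS implements the property.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name bytes decode as strict UTF-8."""
    try:
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample names, chosen only for illustration.
utf8_name = "caf\u00e9".encode("utf-8")       # b'caf\xc3\xa9'
latin1_name = "caf\u00e9".encode("latin-1")   # b'caf\xe9'
# A Latin-1 string ("A-tilde" + "copyright sign") whose bytes happen
# to form a valid UTF-8 sequence, so a validity check cannot reject it.
ambiguous = "\u00c3\u00a9".encode("latin-1")  # b'\xc3\xa9'

for raw in (utf8_name, latin1_name, ambiguous):
    print(raw, is_valid_utf8(raw))
```

The third name shows why a UTF-8 validity check cannot catch every non-Unicode name: it can only reject byte sequences that are not well-formed UTF-8.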

But Unicode has multiple canonical and non-canonical ways of
representing certain characters (e.g., ´).  Solaris and Windows
input methods tend to conform to NFKC, so they will interop even if you
don't enable the normalization feature.  But MacOS X normalizes to NFD.

Therefore, if you need to interoperate with MacOS X then you should
enable the normalization feature.
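To make the interop problem concrete, the following Python sketch uses the standard `unicodedata` module to show one character in composed and decomposed form. The helper `same_name` is a name invented for this example, and the choice of "é" is only an assumption about what a typical problem character looks like.

```python
import unicodedata

composed = "\u00e9"      # "é" as one code point (what NFC produces)
decomposed = "e\u0301"   # "e" + combining acute accent (what NFD produces)


def same_name(a: str, b: str, form: str = "NFD") -> bool:
    """Compare two names after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings (and their UTF-8 bytes) differ ...
print(composed == decomposed)
print(composed.encode("utf-8"), decomposed.encode("utf-8"))
# ... but they are equal once both are normalized.
print(same_name(composed, decomposed))
```

Without normalization, a client that sends the decomposed form will not find a file that another client created with the composed form, even though both names look identical on screen.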
Thank you for the reply. My goal is to configure the filesystem for the lowest common denominator without knowing up front which clients will be used. OS X and Win XP are listed because they are commonly used as desktop OSes. Ubuntu Linux is a third potential desktop OS.

The normalization property documentation says "this property indicates whether a file system should perform a unicode normalization of file names whenever two file names are compared. File names are always stored unmodified, names are normalized as part of any comparison process." Where does the file system use filename comparisons and what does it use them for? Filename collision checking? Sorting?

Is it used for any other operation, say when returning a filename to an application? Would applications reading/writing files to a ZFS filesystem ever notice the difference in normalization settings as long as they produce filenames that do not conflict with existing names or create invalid UTF8? The documentation says filenames are stored unmodified, which sounds like things should be transparent to applications.

(In regard to filename collision checking, if non-normalized unmodified filenames are always stored on disk, and they don't conflict in non-normalized form, what would the point be of normalizing the filenames for a comparison? To verify there isn't conflict in normalized forms, and if there is no conflict with an existing file to allow the filename to be written unmodified?)
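One way to picture the behavior the documentation describes (names stored unmodified, normalized only for comparison) is the toy model below. The `Directory` class and its methods are hypothetical and written only to illustrate the question; this is a sketch of the documented semantics, not of the actual ZFS code.

```python
import unicodedata
from typing import Dict, List, Optional


class Directory:
    """Toy directory: stores names as given, compares them normalized."""

    def __init__(self, form: Optional[str] = None) -> None:
        self.form = form                     # e.g. "NFD", or None for no normalization
        self._entries: Dict[str, str] = {}   # comparison key -> stored name

    def _key(self, name: str) -> str:
        # The key is used only for comparisons; it is never handed back.
        return unicodedata.normalize(self.form, name) if self.form else name

    def create(self, name: str) -> None:
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = name            # stored unmodified

    def lookup(self, name: str) -> Optional[str]:
        return self._entries.get(self._key(name))

    def listdir(self) -> List[str]:
        return list(self._entries.values())


d = Directory("NFD")
d.create("caf\u00e9")                        # composed form
print(d.lookup("cafe\u0301") == "caf\u00e9") # found via the decomposed form
print(d.listdir() == ["caf\u00e9"])          # listing returns the original form
```

In this model an application always gets back the name it stored. The setting is visible in two places only: a lookup with an equivalent spelling succeeds, and a create with an equivalent spelling fails as a collision instead of producing two files that look the same.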


Looking forward, starting with Windows XP and OS X 10.5 clients, is there any reason to change the defaults in order to minimize problems?

You should definitely enable normalization (see above).

It doesn't matter what normalization form you use, but "nfd" runs faster
than "nfc".

The normalization feature doesn't cost much if you use all US-ASCII file
names.  And it doesn't cost much if your file names are mostly US-ASCII.

Nico

The ZFS documentation doesn't list the valid values for the normalization property other than 'none'. From your reply and from the official Unicode docs at http://unicode.org/reports/tr15/ and http://unicode.org/faq/normalization.html would it be correct to conclude that none, NFD, NFC, NFKC, and NFKD are the only valid values for the ZFS normalization property? If so, I suggest they be added to the documentation at
http://dlc.sun.com/osol/docs/content/ZFSADMIN/gazss.html

Thanks,

-hk





_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
