Nicolas Williams wrote:
On Wed, Aug 12, 2009 at 06:17:44PM -0500, Haudy Kazemi wrote:
I'm wondering what are some use cases for ZFS's utf8only and
normalization properties. They are off/none by default, and can only be
set when the filesystem is created. When should they specifically be
enabled and/or disabled? (i.e. Where is using them a really good idea?
Where is using them a really bad idea?)
These are for interoperability.
The world is converging on Unicode for filesystem object naming. If you
want to exclude non-Unicode strings then you should set utf8only (some
non-Unicode strings in some codesets can look like valid UTF-8 though).
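
To illustrate the caveat in the parenthesis above, here is a minimal Python sketch of a strict UTF-8 validity check. It is not ZFS code; the helper name is_valid_utf8 and the sample names are assumptions made up for the example. It shows one ISO 8859-1 name that a UTF-8-only check would reject and another that passes because its bytes happen to form a valid UTF-8 sequence.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "café" encoded in ISO 8859-1: the lone byte 0xE9 is not valid UTF-8.
latin1_cafe = "caf\u00e9".encode("latin-1")

# The two ISO 8859-1 characters U+00C3 U+00A9 encode to the bytes C3 A9,
# which are also the valid UTF-8 encoding of U+00E9.
latin1_lookalike = "\u00c3\u00a9".encode("latin-1")

rejected = is_valid_utf8(latin1_cafe)
accepted = is_valid_utf8(latin1_lookalike)
```

So a check of this kind catches most legacy-codeset names, but not all of them.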
But Unicode has multiple canonical and non-canonical ways of
representing certain characters (e.g., ´). Solaris and Windows
input methods tend to conform to NFKC, so they will interop even if you
don't enable the normalization feature. But MacOS X normalizes to NFD.
Therefore, if you need to interoperate with MacOS X then you should
enable the normalization feature.
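
As a small illustration of the point above, the following Python sketch uses the standard unicodedata module to show the same visible name in precomposed (NFC) and decomposed (NFD) form. The helper same_name is hypothetical and only stands in for the idea of a normalizing comparison; it is not how ZFS implements it.

```python
import unicodedata

# The same visible name in two canonical forms.
nfc_name = unicodedata.normalize("NFC", "caf\u00e9")  # e-acute as one code point
nfd_name = unicodedata.normalize("NFD", "caf\u00e9")  # 'e' + combining acute

# The code points and the UTF-8 bytes differ, so a plain byte compare fails.
raw_equal = nfc_name.encode("utf-8") == nfd_name.encode("utf-8")


def same_name(a: str, b: str, form: str = "NFD") -> bool:
    """Compare two names after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


normalized_equal = same_name(nfc_name, nfd_name)
```

Without a normalizing comparison the two spellings are different names to the filesystem, even though they look identical on screen.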
Thank you for the reply. My goal is to configure the filesystem for the
lowest common denominator without knowing up front which clients will be
used. OS X and Win XP are listed because they are commonly used as
desktop OSes. Ubuntu Linux is a third potential desktop OS.
The normalization property documentation says "this property indicates
whether a file system should perform a unicode normalization of file
names whenever two file names are compared. File names are always
stored unmodified, names are normalized as part of any comparison
process." Where does the file system use filename comparisons and what
does it use them for? Filename collision checking? Sorting?
Is it used for any other operation, say when returning a filename to an
application? Would applications reading/writing files to a ZFS
filesystem ever notice the difference in normalization settings as long
as they produce filenames that do not conflict with existing names or
create invalid UTF8? The documentation says filenames are stored
unmodified, which sounds like things should be transparent to applications.
(In regard to filename collision checking, if non-normalized unmodified
filenames are always stored on disk, and they don't conflict in
non-normalized form, what would the point be of normalizing the
filenames for a comparison? To verify there isn't conflict in
normalized forms, and if there is no conflict with an existing file to
allow the filename to be written unmodified?)
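
One way to picture what the quoted documentation describes is the toy model below, written in Python. It is only a sketch under my reading of that text: the class ToyDirectory and its methods are invented for illustration and say nothing about how ZFS is actually implemented. Names are kept exactly as given, and normalization is applied only when names are compared, that is, on create (collision check) and on lookup.

```python
import unicodedata


class ToyDirectory:
    """Toy directory: names are stored as given, compared in normalized form."""

    def __init__(self, form="NFD"):
        self.form = form
        self._entries = {}  # normalized key -> name exactly as stored

    def _key(self, name):
        return unicodedata.normalize(self.form, name)

    def create(self, name):
        key = self._key(name)
        if key in self._entries:
            # Collides with an existing name once both are normalized.
            raise FileExistsError(name)
        self._entries[key] = name  # stored unmodified

    def lookup(self, name):
        return self._entries[self._key(name)]

    def listing(self):
        return list(self._entries.values())


d = ToyDirectory()
d.create("caf\u00e9")            # precomposed spelling from one client
stored = d.lookup("cafe\u0301")  # decomposed spelling from another client
```

In this model the second client finds the file under its own spelling, a directory listing still returns the name the first client wrote, and an attempt to create the decomposed spelling as a second file fails as a collision.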
Looking forward, starting with Windows XP and OS X 10.5 clients, is
there any reason to change the defaults in order to minimize problems?
You should definitely enable normalization (see above).
It doesn't matter what normalization form you use, but "nfd" runs faster
than "nfc".
The normalization feature doesn't cost much if you use all US-ASCII file
names. And it doesn't cost much if your file names are mostly US-ASCII.
Nico
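
A likely reason the cost is low for US-ASCII names is that ASCII text is unchanged by all four Unicode normalization forms, so a comparison routine can skip the normalization step for such names. The Python sketch below shows that shortcut; the function normalize_name is hypothetical, and whether ZFS takes exactly this fast path is an assumption on my part.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_name(name: str, form: str = "NFD") -> str:
    """Normalize a name, skipping the work entirely for pure ASCII."""
    if name.isascii():
        # ASCII is invariant under every normalization form.
        return name
    return unicodedata.normalize(form, name)


ascii_unchanged = all(
    unicodedata.normalize(form, "README.txt") == "README.txt" for form in FORMS
)
decomposed = normalize_name("caf\u00e9")
```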
The ZFS documentation doesn't list the valid values for the
normalization property other than 'none'. From your reply and from
the official Unicode docs at
http://unicode.org/reports/tr15/ and
http://unicode.org/faq/normalization.html
would it be correct to conclude that none, NFD, NFC, NFKC, and NFKD are
the only valid values for the ZFS normalization property? If so, I
suggest they be added to the documentation at
http://dlc.sun.com/osol/docs/content/ZFSADMIN/gazss.html
Thanks,
-hk
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss