On Thu, Feb 28, 2008 at 05:57:21AM +0100, Roland Mainz wrote:
> Tim Haley wrote:
> > ZFS doesn't muck with names it is sent when storing them on-disk.  The
> > on-disk name is exactly the sequence of bytes provided to the open(),
> > creat(), etc.  If normalization options are chosen, it may do some
> > manipulation of the byte strings *when comparing* names, but the on-disk
> > name should be untouched from what the user requested.
>
> Ok... that was the part which I was _praying_ for... :-)
>
> ... just some background (for those who may be puzzled by the statement
> above): The conversion to Unicode is not always "lossless" (Unicode is
> sometimes marketed as
> "convert-any-encoding-to-unicode-without-losing-any-information") ...
> for example if you have a mixed-language ISO-2022 character sequence the
> conversion to Unicode will use the language information itself and
> converting it back to an ISO-2022 sequence will result in a different
> multibyte sequence than the original input (the issue could be
> worked-around by inserting the "language tag" characters to preserve
> this information but almost every converter doesn't do that (and since
> these "tags" are outside the BMP you have to pray that everything in the
> toolchain works with Unicode characters beyond 65535) ... ;-( ).
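
[A minimal sketch of the "store the bytes as given, normalize only when
comparing" behaviour Tim describes above.  This is not ZFS code: the
Directory class and its create/lookup/listdir methods are hypothetical
names, the choice of NFC is an assumption made for illustration, and the
sketch only handles names that are valid UTF-8.]

```python
import unicodedata


class Directory:
    """Toy directory: keeps names byte-for-byte, compares them normalized."""

    def __init__(self, form="NFC"):
        self.form = form          # assumed normalization form for comparisons
        self._entries = {}        # normalized key -> (stored bytes, value)

    def _key(self, name: bytes) -> str:
        # Normalization is applied to the comparison key only,
        # never to the bytes that are stored.
        return unicodedata.normalize(self.form, name.decode("utf-8"))

    def create(self, name: bytes, value):
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = (name, value)   # original bytes kept untouched

    def lookup(self, name: bytes):
        # Returns (stored bytes, value) for any equivalent spelling.
        return self._entries[self._key(name)]

    def listdir(self):
        return [stored for stored, _ in self._entries.values()]


nfd = "cafe\u0301".encode("utf-8")   # 'e' followed by a combining acute
nfc = "caf\u00e9".encode("utf-8")    # precomposed 'e with acute'

d = Directory()
d.create(nfd, "contents")
stored, value = d.lookup(nfc)        # found via the other spelling
```

Here the lookup with the precomposed spelling finds the entry, while
`stored` and `d.listdir()` still hold the decomposed bytes that were
passed to `create()`.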

Keep in mind that NFSv4 requires use of UTF-8 on the wire.  Most
implementations, including Solaris, just-use-8 (i.e. they pass names
through as uninterpreted 8-bit bytes), but IIRC ZFS has an option to
require/allow only valid UTF-8 byte sequences, and it has support for
normalization-insensitive/preserving behaviour on lookup/create.  So the
Solaris server is approaching compliance with the NFSv4 spec, and the
client can be compliant if you use only UTF-8 locales :)

I.e., we (the industry) are converging on Unicode as the standard
codeset for filesystem object naming.

The upshot of this is that if you really care about lossless conversions
then you'll just have to avoid using problematic sequences in filesystem
object names.

It is important, for reasons like the ones you described, that other
things -- particularly document formats -- support codesets other than
Unicode.  But I just don't see the NFS community adopting a multiplicity
of codesets for NFS (who knows, I might be wrong, and you could bring
this up on the IETF NFSv4 WG).

Nico