Re: [zfs-discuss] path-name encodings

Nicolas Williams Wed, 27 Feb 2008 22:04:29 -0800

On Thu, Feb 28, 2008 at 05:57:21AM +0100, Roland Mainz wrote:
> Tim Haley wrote:
> > ZFS doesn't muck with names it is sent when storing them on-disk.  The
> > on-disk name is exactly the sequence of bytes provided to the open(),
> > creat(), etc.  If normalization options are chosen, it may do some
> > manipulation of the byte strings *when comparing* names, but the on-disk
> > name should be untouched from what the user requested.
> 
> Ok... that was the part which I was _praying_ for... :-)
> 
> ... just some background (for those who may be puzzled by the statement
> above): The conversion to Unicode is not always "lossless" (Unicode is
> sometimes marketed as
> "convert-any-encoding-to-unicode-without-losing-any-information") ...
> for example, if you have a mixed-language ISO-2022 character sequence, the
> conversion to Unicode will consume the language information itself, and
> converting it back to an ISO-2022 sequence will result in a different
> multibyte sequence than the original input. The issue could be
> worked around by inserting the "language tag" characters to preserve
> this information, but almost no converter does that (and since
> these "tags" are outside the BMP you have to pray that everything in the
> toolchain works with Unicode characters beyond 65535) ... ;-(
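
[A small illustration of the round-trip problem described above. This is
a sketch, not the mixed-language case itself: it uses a simpler, related
effect, where two different ISO-2022-JP designations for the same
character decode to the same Unicode text. It assumes Python's built-in
"iso2022_jp" codec accepts the older JIS X 0208-1978 designation
(ESC $ @) on input; the helper name round_trip is made up for this
example.]

```python
def round_trip(data: bytes, codec: str = "iso2022_jp") -> bytes:
    """Decode bytes to Unicode and encode them back with the same codec."""
    return data.decode(codec).encode(codec)


# One two-byte character introduced with the JIS X 0208-1978 designation
# (ESC $ @), followed by a return to ASCII (ESC ( B).
original = b"\x1b$@0!\x1b(B"
converted = round_trip(original)

# Both byte strings denote the same Unicode text...
print(original.decode("iso2022_jp") == converted.decode("iso2022_jp"))
# ...but the bytes themselves need not survive the round trip.
print(original == converted)
print(original, converted)
```

[A filesystem that converted names to Unicode and back would hand the
second byte string to a program that had created the first.]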


Keep in mind that NFSv4 requires use of UTF-8 on the wire.  Most
implementations just-use-8, including Solaris, but IIRC ZFS has an
option to require/allow only valid UTF-8 byte sequences, and it has
support for normalization-insensitive/preserving behaviour on
lookup/create, so the Solaris server is approaching compliance with the
NFSv4 spec, and the client can be compliant if you use only UTF-8
locales :)
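
[To make "require only valid UTF-8" and "normalization-insensitive,
normalization-preserving" concrete, here is a minimal sketch in Python.
It is not how ZFS implements any of this; the class and method names are
hypothetical, and NFC is chosen arbitrarily as the comparison form.]

```python
import unicodedata


class Directory:
    """Toy directory: names are compared normalized but stored untouched."""

    def __init__(self, form: str = "NFC"):
        self.form = form          # normalization form used only for comparison
        self._entries = {}        # normalized key -> bytes exactly as created

    def _key(self, name: bytes) -> str:
        try:
            text = name.decode("utf-8")   # rejects invalid UTF-8 sequences
        except UnicodeDecodeError:
            raise ValueError("name is not valid UTF-8") from None
        return unicodedata.normalize(self.form, text)

    def create(self, name: bytes) -> None:
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = name         # preserve the caller's bytes

    def lookup(self, name: bytes) -> bytes:
        return self._entries[self._key(name)]


d = Directory()
decomposed = b"e\xcc\x81"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
composed = b"\xc3\xa9"      # U+00E9, the precomposed form of the same text

d.create(decomposed)
# Lookup by the other form finds the entry; the stored bytes are unchanged.
print(d.lookup(composed) == decomposed)
```

[Creating the composed name in the same directory would fail with
FileExistsError, since both spellings compare equal.]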

I.e., we (the industry) are converging on Unicode as the standard
codeset for filesystem object naming.

The upshot of this is that if you really care about lossless conversions
then you'll just have to avoid using problematic sequences in filesystem
object names.

It is important, for reasons like the ones you described, that other things
-- particularly document formats -- support codesets other than Unicode.
But I just don't see the NFS community adopting a multiplicity of
codesets for NFS (who knows, I might be wrong, and you could bring this
up on the IETF NFSv4 WG).

Nico
-- 
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
