On Mon, 2010-11-01 at 10:36 +0100, Joerg Schilling wrote:
> "Garrett D'Amore" <garr...@damore.org> wrote:
> 
> > On Mon, 2010-10-25 at 00:43 -0700, shilpa wrote:
> > > How would glob feature work, in case multibyte filenames are allowed? 
> > > Because a multibyte character is a combination of more than one 
> > > character, which includes glob characters like "{", "["....
> >
> > I'm not sure how globbing would work, frankly.  However, I believe UTF8
> > and other common multibyte character schemes always have bytes with the
> > high-order bit set, so that there is never a multibyte character that
> > has component bytes that collide with ASCII.  So this problem should be
> > a non-issue.
> 
> The assumption that multi-byte characters use octets with the high order bit 
> set is only correct for so called stateless locales.
> 
> Locales that use shift codes behave different.

Actually, it's a safe assumption for UTF-8, which I think is the main
concern.
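
As a rough check of both claims, here is a minimal sketch. It assumes
Python's built-in "utf-8" and "iso2022_jp" codecs as stand-ins for the
encodings under discussion; the helper name ascii_bytes_in_multibyte is
made up for illustration. It collects any ASCII-range bytes that show up
when non-ASCII characters are encoded.

```python
def ascii_bytes_in_multibyte(text, encoding):
    """Return the set of byte values below 0x80 that appear when the
    non-ASCII characters of `text` are encoded with `encoding`."""
    hits = set()
    for ch in text:
        if ord(ch) < 0x80:
            continue  # plain ASCII characters are not of interest here
        hits.update(b for b in ch.encode(encoding) if b < 0x80)
    return hits


# Hiragana A, e-acute, euro sign: all multibyte in UTF-8.
sample = "\u3042\u00e9\u20ac"

# UTF-8: every byte of a multibyte sequence has the high-order bit set.
utf8_hits = ascii_bytes_in_multibyte(sample, "utf-8")

# ISO-2022-JP (a stateful, shift-code encoding): 7-bit bytes only.
jis_hits = ascii_bytes_in_multibyte("\u3042", "iso2022_jp")

print("utf-8 ASCII-range bytes:", sorted(utf8_hits))
print("iso2022_jp ASCII-range bytes:", sorted(jis_hits))
```

The UTF-8 set comes back empty, while the ISO-2022-JP set does not, so a
byte-wise matcher cannot rely on the high-order bit for the latter.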

The bigger question here is not locales but character encoding schemes,
I think.  Specifically, we're talking about filenames, which do not
inherently carry a locale with them but might be encoded in any one of a
small number of encodings.  For UTF-8, I believe the code is fine.

Also, if the code is using libc's glob interfaces, it's fine too, because
libc's glob code is sensitive to the locale and correctly handles
stateful encodings.
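
To illustrate why matching on decoded characters matters for a stateful
encoding, here is a small sketch. It does not call libc's glob; it uses
Python's fnmatch module purely as a stand-in, and the "iso2022_jp" codec
is an assumed example of a shift-code encoding. The pattern asks whether
the name contains a literal "$".

```python
import fnmatch

name = "\u3042"                   # a single hiragana character, no "$" in it
raw = name.encode("iso2022_jp")   # escape sequences plus 7-bit payload bytes

# Byte-wise matching sees the "$" inside the ESC $ B shift sequence
# and reports a match that the user never intended.
byte_match = fnmatch.fnmatchcase(raw, b"*$*")

# Character-wise matching, after decoding, sees no "$" at all.
char_match = fnmatch.fnmatchcase(raw.decode("iso2022_jp"), "*$*")

print("byte-wise match:", byte_match)
print("character-wise match:", char_match)
```

The byte-wise call matches and the character-wise call does not, which is
the kind of spurious hit a locale-aware matcher avoids by decoding first.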

        - Garrett
