On Mon, 2010-11-01 at 10:36 +0100, Joerg Schilling wrote:
> "Garrett D'Amore" <garr...@damore.org> wrote:
>
> > On Mon, 2010-10-25 at 00:43 -0700, shilpa wrote:
> > > How would glob feature work, in case multibyte filenames are allowed?
> > > Because a multibyte character is a combination of more than one
> > > character, which includes glob characters like "{", "["....
> >
> > I'm not sure how globbing would work, frankly. However, I believe UTF8
> > and other common multibyte character schemes always have bytes with the
> > high-order bit set, so that there is never a multibyte character that
> > has component bytes that collide with ASCII. So this problem should be
> > a non-issue.
>
> The assumption that multi-byte characters use octets with the high order bit
> set is only correct for so called stateless locales.
>
> Locales that use shift codes behave different.

Actually, it's a safe assumption for UTF-8, which I think is the main
concern.

The bigger question here is not locales but character encoding schemes.
Specifically, we're talking about filenames, which do not inherently
carry a locale with them but might be encoded in the encoding of one of
a small number of locales. For UTF-8, I believe the code is fine.

Also, if the code is using libc's glob interfaces, it's fine too,
because libc's glob code is sensitive to the locale and correctly
handles stateful encodings.

- Garrett

_______________________________________________
opensolaris-code mailing list
opensolaris-code@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
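
For anyone who wants to check the UTF-8 point themselves, here is a rough
sketch in Python (not the code under discussion, and not libc's glob). It
assumes Python 3 and its bundled "utf-8" and "iso2022_jp" codecs; the
function names are made up for illustration. It looks for bytes below 0x80
in the encoded form of non-ASCII characters, since glob characters such as
"{" and "[" are all in that range. ISO-2022-JP is used here as one example
of an encoding with shift (escape) sequences.

```python
def ascii_range_bytes(text: str, encoding: str) -> bytes:
    """Return the bytes below 0x80 found in the encoded form of `text`."""
    return bytes(b for b in text.encode(encoding) if b < 0x80)


def count_utf8_collisions() -> int:
    """Count non-ASCII code points whose UTF-8 form has a byte below 0x80."""
    collisions = 0
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        if ascii_range_bytes(chr(cp), "utf-8"):
            collisions += 1
    return collisions


if __name__ == "__main__":
    sample = "\u65e5\u672c"  # two kanji, written as escapes to keep this file ASCII

    # Exhaustive scan of every non-ASCII code point in UTF-8.
    print("UTF-8 collisions:", count_utf8_collisions())

    # Same text in a stateless (UTF-8) and a stateful (ISO-2022-JP) encoding.
    print("utf-8      :", ascii_range_bytes(sample, "utf-8"))
    print("iso2022_jp :", ascii_range_bytes(sample, "iso2022_jp"))
```

The exhaustive scan takes a second or two. A byte-oriented matcher that
sees ISO-2022-JP data would have to track the shift state to know whether
a byte in the ASCII range is really an ASCII character, which is the case
Joerg raised.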