On 3/23/07, Denis Jacquerye <[EMAIL PROTECTED]> wrote:
> On 3/23/07, Sven Neumann <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > On Fri, 2007-03-23 at 03:25 +0100, Denis Jacquerye wrote:
> >
> > > I'm sure there are tons of places where this doesn't work and some
> > > where it does. But it should work everywhere someone does a search or
> > > compares strings unless in some specific cases. What's the best way of
> > > tackling the issue?
> >
> > It should work if all places where strings are compared would use
> > g_utf8_collate(). I am surprised that this doesn't seem to be the case.
> > Perhaps it's an issue that is often overlooked as many developers are
> > not aware of the pitfalls of working with Unicode texts.
>
> g_utf8_collate() uses G_NORMALIZE_ALL_COMPOSE = G_NORMALIZE_NFKC so it
> will find ² and 2 equivalent. Should that be the default for all
> searches?
>
> Which is better? Using g_utf8_collate() instead of strcmp() or a
> combination of g_utf8_normalize() and then strcmp()?
> If g_utf8_normalize() is used, which normalization should be used?
>
> I'm now guessing it should be G_NORMALIZE_NFC =
> G_NORMALIZE_DEFAULT_COMPOSE in most cases because this will match
> canonically equivalent strings (e.g. é and é equivalent) but not
> compatibility ones (e.g. ² and 2 different). It will also not partially
> match things like "Bise" with "Bisé" where the combining diacritic is
> at the end of the string.
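
To make the canonical vs. compatibility distinction above concrete, here is a rough sketch. It is not GLib code: it uses Python's standard unicodedata module as a stand-in, on the assumption that its "NFC" and "NFKC" forms correspond to what g_utf8_normalize() produces with G_NORMALIZE_NFC and G_NORMALIZE_NFKC. The helper name equal_under() is made up for this illustration.

```python
import unicodedata


def equal_under(form, a, b):
    """Compare two strings after normalizing both to the given Unicode form.

    Hypothetical helper: roughly what g_utf8_normalize() followed by
    strcmp() == 0 would do in C (assumption, not GLib's own code).
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# A plain byte-wise comparison (the strcmp() case) sees two different strings.
raw_equal = precomposed == decomposed

# NFC makes canonically equivalent strings compare equal...
canonical_nfc = equal_under("NFC", precomposed, decomposed)

# ...but keeps compatibility characters such as SUPERSCRIPT TWO distinct.
compat_nfc = equal_under("NFC", "\u00b2", "2")

# NFKC additionally folds compatibility characters, so ² matches 2.
compat_nfkc = equal_under("NFKC", "\u00b2", "2")

print(raw_equal, canonical_nfc, compat_nfc, compat_nfkc)
```

So a search built on NFKC would treat ² as 2, while one built on NFC would not, which is the trade-off asked about above.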

Actually, I take back the point about partial matching. Under NFC, partial
matching would be inconsistent between characters that have a precomposed
form and those that don't: e.g. "bise" would not match "bisé", but "bisɛ"
would match "bisɛ́". So unless there's a better option,
G_NORMALIZE_NFD = G_NORMALIZE_DEFAULT should be used.

> I'm also guessing g_utf8_collate() is more appropriate for sorting
> than for searching.

_______________________________________________
gnome-i18n mailing list
gnome-i18n@gnome.org
http://mail.gnome.org/mailman/listinfo/gnome-i18n
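
The prefix-matching inconsistency described in the reply can be checked with a small sketch. Again this is Python's unicodedata standing in for g_utf8_normalize() (an assumption, not GLib code), and prefix_match() is a hypothetical helper that simply normalizes both strings and tests whether one starts with the other.

```python
import unicodedata


def prefix_match(form, needle, haystack):
    """Return True if haystack starts with needle once both are normalized.

    Hypothetical helper for illustration only; a real search would also
    need to consider case folding and locale rules.
    """
    n = unicodedata.normalize(form, needle)
    h = unicodedata.normalize(form, haystack)
    return h.startswith(n)


bise_acute = "bis\u00e9"             # "bisé": é has a precomposed code point
open_e_acute = "bis\u025b\u0301"     # "bisɛ́": ɛ + acute has no precomposed form

# Under NFC the two cases behave differently.
nfc_e = prefix_match("NFC", "bise", bise_acute)
nfc_open_e = prefix_match("NFC", "bis\u025b", open_e_acute)

# Under NFD both base letters are followed by a separate combining mark,
# so both prefixes match.
nfd_e = prefix_match("NFD", "bise", bise_acute)
nfd_open_e = prefix_match("NFD", "bis\u025b", open_e_acute)

print(nfc_e, nfc_open_e, nfd_e, nfd_open_e)
```

With NFC the result depends on whether Unicode happens to define a precomposed character; with NFD the behaviour is at least the same for both words.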