Re: log --search test failures on trunk and 1.8.x

Mattias Engdegård Sun, 21 Apr 2013 13:19:10 -0700

On 21 Apr 2013 at 20:07, Branko Čibej wrote:

> Yes, the obvious ones are German (ß == SS) equivalence and Turkic
> (i == İ) and (ı == I) equivalences (and that's already three
> characters); but then in French, lowercase accented letters are
> equivalent to uppercase unaccented letters, whereas for example in
> Spanish that's not the case. And that's just looking at European and
> West Asian Latin scripts. There are at least 7 distinct Cyrillic
> scripts in roughly the same area that I'm aware of, and I certainly
> don't know the case-folding rules for all of them.

Not only is the above true, but one should also be careful to distinguish case conversion from case-insensitive matching; these follow different rules.

For instance, converting lower-case letters to upper case in French will retain the accents (most of the time; this is locale-dependent), but they are generally expected to be ignored when searching. By contrast, it would be an error to match "a" with "ä" in Swedish when searching, or to drop the dots in a case conversion.
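
The gap between case conversion and case-insensitive matching shows up even in Unicode's locale-independent defaults. The sketch below assumes Python, whose `str.lower()`/`str.upper()` apply the default full case mappings and whose `str.casefold()` applies default case folding. Folding handles the German ß/SS pair that a `lower()`-based comparison misses, but neither knows the Turkic dotted/dotless i rules, and neither addresses the French or Swedish accent expectations above. The helper names `match_via_lower` and `match_via_casefold` are hypothetical; this is an illustration, not Subversion's `log --search` code.

```python
def match_via_lower(a: str, b: str) -> bool:
    """Naive case-insensitive match built on case *conversion*."""
    return a.lower() == b.lower()


def match_via_casefold(a: str, b: str) -> bool:
    """Case-insensitive match using Unicode default case *folding*."""
    return a.casefold() == b.casefold()


# German: upper-casing ß yields "SS", but lower-casing "SS" does not give ß back,
# so a lower()-based comparison fails where case folding succeeds.
print("stra\u00dfe".upper())                          # STRASSE
print(match_via_lower("stra\u00dfe", "STRASSE"))      # False
print(match_via_casefold("stra\u00dfe", "STRASSE"))   # True

# Turkic: the default (locale-independent) rules pair I with i, so dotless ı
# and upper-case I are not treated as equivalent, which is wrong for Turkish.
print(match_via_casefold("\u0131", "I"))               # False
# Lower-casing dotted İ gives "i" + U+0307 COMBINING DOT ABOVE, not a plain "i".
print(len("\u0130".lower()))                           # 2
```

Getting Turkish, French or Swedish behaviour right would need locale-specific tailoring on top of these defaults.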

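Clearly a case- and accent-sensitive search is much easier to implement, but it would still benefit from normalisation, so that canonically equivalent strings (such as a precomposed "é" and an "e" followed by a combining acute accent) compare equal. Bytewise matching is on the lowest rung.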
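
A minimal sketch of the two lowest rungs, assuming Python and NFC as the normalisation form (the message does not say which form would be used): a bytewise substring test on UTF-8, and a case- and accent-sensitive test that normalises both strings first. The function names are hypothetical.

```python
import unicodedata


def _nfc(s: str) -> str:
    """Normalise to NFC (canonical composition)."""
    return unicodedata.normalize("NFC", s)


def bytewise_contains(haystack: str, needle: str) -> bool:
    """Lowest rung: compare raw UTF-8 bytes, with no normalisation."""
    return needle.encode("utf-8") in haystack.encode("utf-8")


def normalized_contains(haystack: str, needle: str) -> bool:
    """Case- and accent-sensitive substring search after NFC normalisation.

    For letters that have a precomposed form, NFC keeps base letter and
    accent as one code point, so "ä" stays distinct from "a".
    """
    return _nfc(needle) in _nfc(haystack)


# The same word stored in two canonically equivalent forms.
composed = "caf\u00e9"      # é as U+00E9
decomposed = "cafe\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

print(bytewise_contains(decomposed, composed))     # False: different bytes
print(normalized_contains(decomposed, composed))   # True: equal after NFC
print(normalized_contains("s\u00e4l", "sal"))      # False: "a" does not match "ä"
```

NFC is chosen here because it keeps "ä" as a single code point; normalising to NFD instead would let a plain "a" match the first half of a decomposed "ä", which is exactly the Swedish false match described above.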
