On Fri, 1 Sep 2017 09:53 am, MRAB wrote:

> What would you expect the result would be for:
>
> "\N{LATIN SMALL LIGATURE FI}".case_insensitive_find("F")
>
> "\N{LATIN SMALL LIGATURE FI}".case_insensitive_find("I")

That's easy. -1 in both cases, since neither "F" nor "I" is found in either string. We can prove this by manually checking:

py> for c in "\N{LATIN SMALL LIGATURE FI}":
...     print(c, 'F' in c, 'f' in c)
...     print(c, 'I' in c, 'i' in c)
...
ﬁ False False
ﬁ False False

If you want some other result, then you're not talking about case sensitivity.

If anyone wants to propose "normalisation-insensitive matching", I'll ask you to please start your own thread rather than derailing this one with an unrelated, and much more difficult, problem. The proposal here is *case insensitive* matching, not Unicode normalisation.

If you want to decompose the strings, you know how to:

py> import unicodedata
py> unicodedata.normalize('NFKD', "\N{LATIN SMALL LIGATURE FI}")
'fi'


-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
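
For anyone who wants to try this at home, here is a rough sketch of the distinction drawn above. The function case_insensitive_find is hypothetical (no such str method exists), and building it on str.lower() is an assumption of this sketch, not part of any proposal. A version built on str.casefold() may well give different answers for this character, since full case folding can expand a single character into several.

```python
import unicodedata

LIGATURE = "\N{LATIN SMALL LIGATURE FI}"


def case_insensitive_find(haystack, needle):
    """Hypothetical helper: str.find() on lowercased copies of both strings.

    Assumption: "case insensitive" here means comparing str.lower() results.
    No Unicode normalisation is applied.
    """
    return haystack.lower().find(needle.lower())


# Case-insensitive matching alone: the ligature is a single code point
# that is neither "f" nor "i", so neither search succeeds.
print(case_insensitive_find(LIGATURE, "F"))
print(case_insensitive_find(LIGATURE, "I"))

# Decomposing first is a separate, explicit step taken by the caller.
decomposed = unicodedata.normalize("NFKD", LIGATURE)
print(case_insensitive_find(decomposed, "F"))
print(case_insensitive_find(decomposed, "I"))
```

Keeping normalisation as a step the caller does explicitly means the search function only ever has to answer one question.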