On 8/31/17, Steve D'Aprano <steve+pyt...@pearwood.info> wrote:
>> Additionally: a proper "case insensitive comparison" should almost
>> certainly start with a Unicode normalization. But should it be NFC/NFD
>> or NFKC/NFKD? IMO that's a good reason to leave it in the hands of the
>> application.
>
> Normalisation is orthogonal to comparisons and searches. Python doesn't
> automatically normalise strings, as people have pointed out a bazillion
> times in the past, and it happily compares
>
> 'ö' LATIN SMALL LETTER O WITH DIAERESIS
>
> 'ö' LATIN SMALL LETTER O + COMBINING DIAERESIS
>
> as unequal. I don't propose to change that just so that we can get 'a'
> equals 'A' :-)

Locale-dependent Case Mappings. The principal example of a case mapping
that depends on the locale is Turkish, where U+0131 “ı” latin small letter
dotless i maps to U+0049 “I” latin capital letter i and U+0069 “i” latin
small letter i maps to U+0130 “İ” latin capital letter i with dot above.
(source: http://www.unicode.org/versions/Unicode10.0.0/ch05.pdf)

So 'SIKISIN'.casefold() could be dangerous ->
https://translate.google.com/#tr/en/sikisin%0As%C4%B1k%C4%B1s%C4%B1n
(although I am not sure if this story is true ->
https://www.theinquirer.net/inquirer/news/1017243/cellphone-localisation-glitch
)

-- 
https://mail.python.org/mailman/listinfo/python-list
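
For anyone who wants to try this at the prompt, here is a rough sketch of both points: the unnormalised comparison quoted above, and the locale-blind case mapping. The helpers nfc_equal() and turkish_lower() are names I made up for illustration; they are not in the stdlib, and turkish_lower() only handles the two dotted/dotless pairs, so treat it as an assumption-laden toy rather than real Turkish locale support.

```python
import unicodedata

composed = "\u00f6"      # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # LATIN SMALL LETTER O + COMBINING DIAERESIS

# Python compares code points and does not normalise first.
print(composed == decomposed)   # False


def nfc_equal(a, b):
    """Hypothetical helper: compare two strings after NFC normalisation."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(nfc_equal(composed, decomposed))   # True

# str.casefold() and str.lower() ignore the locale, so capital I always
# becomes dotted i, which is the wrong letter for Turkish text.
print("SIKISIN".casefold())   # sikisin


def turkish_lower(text):
    """Hypothetical helper: lowercase using the Turkish i mappings.

    Maps U+0130 to 'i' and 'I' to U+0131 before the default lower().
    """
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


print(turkish_lower("SIKISIN"))   # sıkısın
```

Whether to normalise, and which case rules to apply, still has to be decided by the application, since nothing in the string itself says what language it is in.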