Paul McGuire wrote:
> On Apr 6, 8:53 am, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>>>> I know I could use:-
>>>> if lower(string1) in lower(string2):
>>>>     <do something>
>>>> but it somehow feels there ought to be an easier (tidier?) way.
>> Take, for example, U+017F, LATIN SMALL LETTER LONG S. Its .lower() is
>> the same character, as the character is already in lower case.
>> Its .upper() is U+0053, LATIN CAPITAL LETTER S. Notice that the LONG
>> is gone - there is no upper-case version of a "long s".
>> Its .upper().lower() is U+0073, LATIN SMALL LETTER S.
>>
>> So should case-insensitive matching match the small s with the small
>> long s, as they have the same upper-case letter?
[ ... ]
> >>> [i for i in range(65536) if unichr(i).lower().upper() !=
> ... unichr(i).upper()]
> [304, 1012, 8486, 8490, 8491]
>
> Instead of 15 exceptions to the rule, conversion to upper has only 5
> exceptions. So perhaps comparison of uppers is, while not foolproof,
> less likely to encounter these exceptions? Or at least, simpler to
> code explicit tests.
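The quoted session is Python 2 (`unichr`, a 16-bit range). A minimal Python 3 sketch of the same ideas follows, assuming Python 3.3 or later for `str.casefold()`. The helper `contains_ci` is a hypothetical name chosen here, not something from the thread; it does case-insensitive containment via Unicode case folding, which maps the long s to a plain "s" and so answers Martin's question one particular way. The full exception list depends on the Unicode version built into your interpreter, so the sketch only checks that the five code points Paul found are still present rather than claiming an exact list.

```python
import unicodedata


def contains_ci(needle: str, haystack: str) -> bool:
    """Hypothetical helper: case-insensitive substring test via Unicode case folding."""
    return needle.casefold() in haystack.casefold()


LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

# Martin's observations, restated with Python 3 str methods.
print(LONG_S.lower() == LONG_S)   # True: it is already lower case
print(LONG_S.upper())             # S
print(LONG_S.upper().lower())     # s
print(LONG_S.casefold())          # s: case folding also maps it to a plain s

# The original poster's lower() approach versus case folding.
print("s" in LONG_S.lower())      # False: lower() keeps the long s distinct
print(contains_ci("s", LONG_S))   # True

# Paul's check, ported from unichr() to chr(): code points below 0x10000
# whose lower().upper() differs from a direct upper().
exceptions = [i for i in range(0x10000)
              if chr(i).lower().upper() != chr(i).upper()]
print("Unicode", unicodedata.unidata_version, "->", len(exceptions), "exceptions")

for cp in (304, 1012, 8486, 8490, 8491):
    # Report whether each code point from Paul's list is still an exception.
    print(f"U+{cp:04X}", unicodedata.name(chr(cp)), cp in exceptions)
```

Whether merging the long s with s is acceptable is still an application decision; `casefold()` just makes Unicode's default choice explicit instead of leaving it to a mix of `lower()` and `upper()` calls.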
I don't know what meaning is carried by all those differences in lower-case glyphs. Converting to upper seems to fold together a lot of variant pi's and rho's, which I think would be roughly a good thing. I seem to recall that the tiny iota (ypogegrammeni) has or had grammatical significance. The other effect would be conflating the physics Angstrom and Kelvin unit signs with ring-A and K. Application programmers beware.

	Mel.
--
http://mail.python.org/mailman/listinfo/python-list
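Mel's points can be checked directly. The sketch below, again assuming Python 3.3 or later, compares variant Greek letter forms with their plain counterparts, shows what happens to an iota subscript under `upper()`, and tests the unit signs from Paul's list under `upper()`, `lower()` and `casefold()`. The `describe` helper is a hypothetical convenience for spelling out code points; everything else is the standard `str` and `unicodedata` API.

```python
import unicodedata


def describe(s: str) -> str:
    """Hypothetical helper: spell out code points so look-alike glyphs can be told apart."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Variant Greek letter forms paired with their ordinary lower-case letters.
greek_pairs = [
    ("\u03d6", "\u03c0"),  # GREEK PI SYMBOL / GREEK SMALL LETTER PI
    ("\u03f1", "\u03c1"),  # GREEK RHO SYMBOL / GREEK SMALL LETTER RHO
    ("\u03c2", "\u03c3"),  # GREEK SMALL LETTER FINAL SIGMA / GREEK SMALL LETTER SIGMA
]
for variant, plain in greek_pairs:
    print(unicodedata.name(variant),
          "| upper:", variant.upper() == plain.upper(),   # folded together
          "| lower:", variant.lower() == plain.lower())   # still distinct

# Iota subscript (ypogegrammeni): upper() spells it out as a full capital iota,
# so the result equals the upper-cased plain alpha + iota pair.
alpha_ypo = "\u1fb3"  # GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI
print(describe(alpha_ypo.upper()))                   # U+0391 U+0399
print(alpha_ypo.upper() == "\u03b1\u03b9".upper())   # True

# Unit signs paired with the letters they resemble.
unit_pairs = [
    ("\u212a", "K"),       # KELVIN SIGN / LATIN CAPITAL LETTER K
    ("\u212b", "\u00c5"),  # ANGSTROM SIGN / LATIN CAPITAL LETTER A WITH RING ABOVE
    ("\u2126", "\u03a9"),  # OHM SIGN / GREEK CAPITAL LETTER OMEGA
]
for sign, letter in unit_pairs:
    print(unicodedata.name(sign),
          "| upper:", sign.upper() == letter.upper(),
          "| lower:", sign.lower() == letter.lower(),
          "| casefold:", sign.casefold() == letter.casefold())
```

In this sketch the unit signs come out equal to K, Å and Ω under `lower()` and `casefold()`, but not under `upper()` alone, which is consistent with 8490, 8491 and 8486 appearing in the exception list quoted above; the conflation shows up as soon as a lower-casing or folding step is involved.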