On Apr 6, 8:53 am, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> >> I know I could use:-
> >>
> >> if lower(string1) in lower(string2):
> >>     <do something>
> >>
> >> but it somehow feels there ought to be an easier (tidier?) way.
> >
> > Easier? You mean like some kind of mind meld?
>
> Interestingly enough, it shouldn't be (but apparently is) obvious that
>
> a.lower() in b.lower()
>
> is a way of expressing "a is a substring of b, with case-insensitive
> matching". Can we be sure that these are really the same concepts,
> and if so, is
>
> a.upper() in b.upper()
>
> also equivalent?
>
> It's probably a common assumption that, for any character c,
> c.lower()==c.upper().lower(). Yet,
>
> py> [i for i in range(65536) if unichr(i).upper().lower() !=
> unichr(i).lower()]
> [181, 305, 383, 837, 962, 976, 977, 981, 982, 1008, 1009, 1010, 1013,
> 7835, 8126]
>
> Take, for example, U+017F, LATIN SMALL LETTER LONG S. Its .lower() is
> the same character, as the character is already in lower case.
> Its .upper() is U+0053, LATIN CAPITAL LETTER S. Notice that the LONG
> is gone - there is no upper-case version of a "long s".
> Its .upper().lower() is U+0073, LATIN SMALL LETTER S.
>
> So should case-insensitive matching match the small s with the small
> long s, as they have the same upper-case letter?
>
> Regards,
> Martin
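A minimal sketch of Martin's check, rewritten for Python 3 on the assumption that `chr()` stands in for Python 2's `unichr()`. The helper name `lower_roundtrip_exceptions` is hypothetical. Python 3's `str.lower()`/`str.upper()` apply full Unicode case mappings (for example `'ß'.upper()` is `'SS'`) and use a newer Unicode database, so the list it returns will not exactly match the 2008 output quoted above.

```python
# Python 3 sketch: chr() stands in for the Python 2 unichr() used above.

def lower_roundtrip_exceptions(limit=0x10000):
    """Code points below `limit` where c.upper().lower() != c.lower()."""
    return [i for i in range(limit)
            if chr(i).upper().lower() != chr(i).lower()]


long_s = "\u017f"  # LATIN SMALL LETTER LONG S

print(long_s.lower() == long_s)       # True: already lower case
print(long_s.upper() == "S")          # True: upper-cases to LATIN CAPITAL LETTER S
print(long_s.upper().lower() == "s")  # True: the "long" is lost on the round trip

# Consequence for substring tests: the two spellings of the check disagree.
print("s".lower() in long_s.lower())  # False
print("s".upper() in long_s.upper())  # True

print(lower_roundtrip_exceptions())
```

So `a.lower() in b.lower()` and `a.upper() in b.upper()` are not interchangeable: the long s is exactly the kind of character where they give different answers.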
Another surprise (or maybe not so surprising): this "upper != lower" mismatch is not symmetric. Using the inverse of your list comp, I get

>>> [i for i in range(65536) if unichr(i).lower().upper() !=
...     unichr(i).upper()]
[304, 1012, 8486, 8490, 8491]

Instead of 15 exceptions to the rule, the inverse round trip has only 5. So perhaps comparing upper-cased strings is, while not foolproof, less likely to run into these exceptions? Or at least, it makes explicit tests for the exceptions simpler to code.

-- Paul

--
http://mail.python.org/mailman/listinfo/python-list
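Paul's inverse check can be sketched the same way, again in Python 3 with `chr()` in place of `unichr()`; `upper_roundtrip_exceptions` is a hypothetical helper name, and its output under a modern Unicode database may differ from the list above. As an alternative that neither poster mentions, Python 3.3 and later provide `str.casefold()`, which applies Unicode default case folding and is intended for caseless matching. The `casefold_contains` helper below is a hypothetical name for that approach, not something from the thread.

```python
# Python 3 sketch of the inverse check (chr() in place of Python 2's unichr()).

def upper_roundtrip_exceptions(limit=0x10000):
    """Code points below `limit` where c.lower().upper() != c.upper()."""
    return [i for i in range(limit)
            if chr(i).lower().upper() != chr(i).upper()]


def casefold_contains(needle, haystack):
    """Case-insensitive substring test using Unicode case folding (Python 3.3+)."""
    return needle.casefold() in haystack.casefold()


print(upper_roundtrip_exceptions())

# KELVIN SIGN (U+212A) lower-cases to "k", but upper-casing "k" gives plain "K".
print("\u212a".lower().upper() == "\u212a".upper())  # False

# Case folding maps LONG S and "s" to the same string, and "ß" to "ss".
print(casefold_contains("s", "\u017f"))        # True
print(casefold_contains("STRASSE", "straße"))  # True
```

Case folding is locale-independent, so it still does not settle every question raised above (the Turkish dotted and dotless i, for instance), but it gives one defined answer instead of depending on whether lower() or upper() is used.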