> I have a set of strings (all letters are capitalized) at utf-8,

That's the problem. If these are really UTF-8 encoded byte strings, then .lower() likely won't work. It uses the C library's tolower API, which works at the byte level, i.e. it can't work for multi-byte encodings.

What you need to do is operate on Unicode strings, i.e. instead of

    s.lower()

do

    s.decode("utf-8").lower()

or (if you need byte strings back)

    s.decode("utf-8").lower().encode("utf-8")

If you find yourself writing the latter, I recommend that you redesign your application. Don't use byte strings to represent text; use Unicode strings throughout, except at the system boundary (where you decode/encode as appropriate).

There are some limitations with Unicode .lower() as well (specifically, SpecialCasing.txt is not considered), but I don't think they apply to Russian.

HTH,
Martin
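
P.S. In case a concrete version helps, here is a minimal sketch of the difference between lowercasing bytes and lowercasing decoded text. It assumes Python 3, where byte strings are bytes objects and Unicode strings are str; the helper name lower_utf8 and the sample word are made up for illustration and are not from your code.

```python
# Python 3 sketch: bytes is a byte string, str is a Unicode string.
# lower_utf8 and the sample word are hypothetical, for illustration only.

def lower_utf8(raw: bytes) -> bytes:
    """Decode at the boundary, lowercase as text, encode on the way out."""
    text = raw.decode("utf-8")
    return text.lower().encode("utf-8")


raw = "ПРИВЕТ".encode("utf-8")  # UTF-8 byte string, two bytes per letter

# Byte-level lowercasing only touches ASCII letters, so nothing changes here.
assert raw.lower() == raw

# Lowercasing the decoded text behaves as expected for Russian.
assert raw.decode("utf-8").lower() == "привет"

# Round trip, for the case where byte strings are needed back.
assert lower_utf8(raw) == "привет".encode("utf-8")
```

Ideally the decode happens once when the data is read and the encode once when it is written, so a helper like lower_utf8 should rarely be needed in the middle of a program.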