Jeff Senn <s...@users.sourceforge.net> added the comment:

> Feel free to upload it here. I'm fairly skeptical that it is
> possible to implement casing "correctly" in a locale-independent
> way.
Ok. I will try to find time to get it complete enough to be readable.

Unicode (see section 3.13 of the standard) specifies the casing of Unicode strings pretty completely: it gives "Default Casing" rules to be used when no locale-specific "tailoring" is available. The only locale dependencies in the special casing rules are for Turkish, Azeri, and Lithuanian, and for those you only need to know what the language is, no other details. So I'm sure that a complete implementation is possible without resorting to a lot of locale munging, at least for .lower(), .upper() and .title(). .swapcase() is just ...err... dumb^h^h^h^h questionably useful.

However, .capitalize() is a bit weird, and I suspect it is implemented incorrectly now: it UPPERCASES the first character rather than TITLECASING it, which is probably wrong in the very few cases where it makes a difference. For example (using Croatian digraphs):

    >>> u'\u01c5amonjna'.title()
    u'\u01c5amonjna'
    >>> u'\u01c5amonjna'.capitalize()
    u'\u01c4amonjna'

"Capitalization" is not precisely defined by the Unicode standard, and the current Python implementation doesn't even do what the docs say, "makes the first character have upper case" (it also lowercases all other characters!). However, I might argue that a more useful implementation "makes the first character have titlecase...".

Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4610>
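As a rough Python 3 illustration of the uppercase-versus-titlecase distinction discussed in this comment (it is not the patch being referred to): Python 3's built-in str case methods already apply the default, untailored Unicode mappings, so the sketch only adds a few helpers on top. capitalize_upper uppercases the first character, capitalize_title titlecases it as proposed, and upper_lang/lower_lang show that a Turkish or Azeri tailoring needs nothing beyond the language tag. All four names are hypothetical, not a Python API, and the tailoring covers only dotted and dotless i; Lithuanian and combining-dot contexts are deliberately left out.

```python
# Python 3 sketch: built-in str case methods use the default (untailored)
# Unicode mappings, including full ones such as "\u00df".upper() -> "SS".

DZ_UPPER = "\u01c4"  # LATIN CAPITAL LETTER DZ WITH CARON
DZ_TITLE = "\u01c5"  # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
DZ_LOWER = "\u01c6"  # LATIN SMALL LETTER DZ WITH CARON

# Assumption: only the Turkish/Azeri dotted/dotless i tailoring is modelled;
# Lithuanian rules and combining-dot contexts are not handled here.
TURKIC = {"tr", "az"}


def capitalize_upper(s: str) -> str:
    """Hypothetical helper: uppercase the first character, lowercase the rest."""
    return s[:1].upper() + s[1:].lower()


def capitalize_title(s: str) -> str:
    """Hypothetical helper: titlecase the first character, lowercase the rest."""
    return s[:1].title() + s[1:].lower()


def upper_lang(s: str, lang: str = "") -> str:
    """Hypothetical helper: default uppercasing plus Turkish/Azeri tailoring."""
    if lang in TURKIC:
        # Dotted lowercase i uppercases to U+0130 (capital I with dot above).
        s = s.replace("i", "\u0130")
    return s.upper()


def lower_lang(s: str, lang: str = "") -> str:
    """Hypothetical helper: default lowercasing plus Turkish/Azeri tailoring."""
    if lang in TURKIC:
        # U+0130 lowercases to plain i; plain I lowercases to dotless U+0131.
        s = s.replace("\u0130", "i").replace("I", "\u0131")
    return s.lower()


if __name__ == "__main__":
    word = DZ_TITLE + "amonjna"
    print(ascii(word.title()))                         # '\u01c5amonjna'
    print(ascii(capitalize_upper(word)))               # '\u01c4amonjna'
    print(ascii(capitalize_title(word)))               # '\u01c5amonjna'
    print(ascii("\u0130".lower()))                     # 'i\u0307' (default mapping)
    print(ascii(lower_lang("\u0130ZM\u0130R", "tr")))  # 'izmir'
    print(ascii(upper_lang("istanbul", "tr")))         # '\u0130STANBUL'
```

For what it's worth, Python 3.8 later changed str.capitalize() to put the first character into titlecase, so on current versions the built-in behaves like capitalize_title for inputs such as these.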