Jeffrey C. Jacobs <[EMAIL PROTECTED]> added the comment:

I think this is even more complicated when you consider that localization may be an issue. Consider "Á": does it sort before "A" or after "a"? From a character-set point of view it typically comes after "a", but when the locale is taken into account the only thing that changes is the relative ordering, so that "Á" sorts somewhere around "A" and before "B". But when this is done, does that mean that [9-Á] is going to cover ALL uppercase letters, ALL lowercase letters, ALL characters with ord from 91 to 96 and 123 to 127, and all kinds of other Unicode symbols? And how will this affect case-insensitivity?
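To make the [9-Á] question concrete, here is a minimal sketch of what that range covers when it is interpreted purely by code point. It assumes Python 3, where `re` compares `str` characters by code point, and it deliberately ignores locale collation; the variable names are only illustrative.

```python
import re
import unicodedata

# Interpreted by code point, [9-Á] runs from ord("9") == 57 to ord("Á") == 193.
lo, hi = ord("9"), ord("Á")
pattern = re.compile("[9-Á]")

# Collect every character in that span that the class actually matches.
matched = [chr(cp) for cp in range(lo, hi + 1) if pattern.fullmatch(chr(cp))]

# Group the matches by Unicode general category (Lu, Ll, Po, Cc, ...).
by_category = {}
for ch in matched:
    by_category.setdefault(unicodedata.category(ch), []).append(ch)

print(len(matched), "characters matched")
for cat in sorted(by_category):
    print(cat, len(by_category[cat]))
```

The category listing shows uppercase and lowercase letters mixed in with punctuation, symbols and control characters, which is the kind of accidental coverage described above. A locale-aware ordering could give a different answer for the same pattern.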
In a sense, I think it may only be safe to say that character class ranges are ONLY appropriate over alphabetic character ranges or numeric character ranges, since the order of the ASCII symbols between 0 and 47, 58 and 64, 91 and 96, and 123 and 127, though well-defined, is nonetheless implementation dependent. When we bring Unicode into this, things get even more befuddled, with some Latin characters in Latin-1, some in Latin-2, plus Cyrillic, Hebrew, Arabic, Chinese, Japanese and Korean character sets, just to name a few of the most common! And how does a total ordering of characters apply to them?

In the end, I think it's just dangerous to define character group ranges that span the gap BETWEEN numbers and alphabetics. Instead, I think a better solution is simply to implement Emacs / Perl style named character classes as in issue 2636 sub-item 8.

I do agree this is a problem, but as I see it, the solution may not be that simple, especially in a Unicode world.

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3511>
_______________________________________