New submission from Matthew Barnett <[EMAIL PROTECTED]>:

While working on the regex code in sre_compile.py I came across the following code in the handling of charset ranges in _optimize_charset:

    for i in range(fixup(av[0]), fixup(av[1])+1):
        charmap[i] = 1

The function fixup converts the ends of the range to lower case if the ignore-case flag is present. The problem with this approach is illustrated below:

>>> import re
>>> print re.match(r'[9-A]', 'A')
<_sre.SRE_Match object at 0x00A78058>
>>> print re.match(r'[9-A]', 'a')
None
>>> print re.match(r'[9-A]', '_')
None
>>> print re.match(r'[9-A]', 'A', re.IGNORECASE)
<_sre.SRE_Match object at 0x00D0BFA8>
>>> print re.match(r'[9-A]', 'a', re.IGNORECASE)
<_sre.SRE_Match object at 0x00A78058>
>>> print re.match(r'[9-A]', '_', re.IGNORECASE)
<_sre.SRE_Match object at 0x00D0BFA8>

'_' doesn't lie between '9' and 'A', but it does lie between '9' and 'a'. Surely the ignore-case flag should not affect whether non-letters are matched?

----------
components: Regular Expressions
messages: 70799
nosy: mrabarnett
severity: normal
status: open
title: Incorrect charset range handling with ignore case flag?
type: behavior
versions: Python 2.5

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3511>
_______________________________________
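
The effect of lowering only the ends of the range, as opposed to lowering each character in it, can be reproduced outside the regex engine. The following is a minimal sketch, not the sre_compile.py code: the fixup here is a hypothetical stand-in that lower-cases a code point, the helper names are invented for illustration, and it assumes the matcher lower-cases the subject character before looking it up in the map. It is written for Python 3, unlike the Python 2.5 session in the report.

```python
def fixup(code, ignore_case):
    # Hypothetical stand-in for the fixup described in the report:
    # lower-case the code point only when the ignore-case flag is set.
    return ord(chr(code).lower()) if ignore_case else code


def charmap_lowered_ends(lo, hi, ignore_case):
    # Mirrors the quoted loop: only the two ends of the range are lowered,
    # so [9-A] silently becomes [9-a] under ignore-case.
    charmap = [0] * 256
    for i in range(fixup(ord(lo), ignore_case), fixup(ord(hi), ignore_case) + 1):
        charmap[i] = 1
    return charmap


def charmap_lowered_members(lo, hi, ignore_case):
    # One possible alternative (an assumption, not a proposed patch):
    # walk the original range and lower each member individually.
    charmap = [0] * 256
    for i in range(ord(lo), ord(hi) + 1):
        charmap[fixup(i, ignore_case)] = 1
    return charmap


def members(charmap):
    # Helper for inspection: the characters whose bit is set in the map.
    return "".join(chr(i) for i, bit in enumerate(charmap) if bit)
```

With these definitions, charmap_lowered_ends('9', 'A', True) has the entry for '_' set, while charmap_lowered_members('9', 'A', True) does not, yet still has the entry for 'a' set so that 'A' and 'a' both match once the subject character is lowered.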