[issue3511] Incorrect charset range handling with ignore case flag?

Matthew Barnett Wed, 06 Aug 2008 12:42:30 -0700

New submission from Matthew Barnett <[EMAIL PROTECTED]>:

While working on the regex code in sre_compile.py I came across the
following code in the handling of charset ranges in _optimize_charset:


    for i in range(fixup(av[0]), fixup(av[1])+1):
        charmap[i] = 1

The function fixup converts the ends of the range to lower case if the
ignore-case flag is present. The problem with this approach is
illustrated below:

>>> import re
>>> print re.match(r'[9-A]', 'A')
<_sre.SRE_Match object at 0x00A78058>
>>> print re.match(r'[9-A]', 'a')
None
>>> print re.match(r'[9-A]', '_')
None
>>> print re.match(r'[9-A]', 'A', re.IGNORECASE)
<_sre.SRE_Match object at 0x00D0BFA8>
>>> print re.match(r'[9-A]', 'a', re.IGNORECASE)
<_sre.SRE_Match object at 0x00A78058>
>>> print re.match(r'[9-A]', '_', re.IGNORECASE)
<_sre.SRE_Match object at 0x00D0BFA8>
>>> 

'_' (0x5F) doesn't lie between '9' (0x39) and 'A' (0x41), but it does lie
between '9' and 'a' (0x61), which is the range that results once the upper
end is lower-cased.
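
To make the arithmetic concrete, here is a small sketch that imitates the quoted loop. The fixup below is a simplified ASCII-only stand-in, not the actual function in sre_compile.py, and charset_range is a hypothetical helper name used only for illustration.

```python
def fixup(code, ignorecase):
    # Stand-in for the real fixup: lower-case the code point when the
    # ignore-case flag is set (assumption: ASCII input only).
    return ord(chr(code).lower()) if ignorecase else code


def charset_range(lo, hi, ignorecase):
    # Imitates the quoted loop: fix up the two endpoints only, then
    # take every code point in between.
    start = fixup(ord(lo), ignorecase)
    end = fixup(ord(hi), ignorecase)
    return [chr(i) for i in range(start, end + 1)]


plain = charset_range('9', 'A', False)   # '9' (0x39) through 'A' (0x41)
folded = charset_range('9', 'A', True)   # '9' (0x39) through 'a' (0x61)

# Characters that only become members because the upper endpoint moved.
extra = sorted(set(folded) - set(plain))
```

With these definitions, extra holds everything from 'B' up to 'a', which includes the punctuation '[', '\', ']', '^', '_' and '`' as well as the letters.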

Surely the ignore-case flag should not affect whether non-letters are
matched or not?
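
One possible way to get that behaviour, offered only as a sketch and not as a patch to sre_compile.py, is to lower-case each member of the range individually instead of lower-casing just the endpoints. The names folded_members and matches_ignorecase are hypothetical, and the sketch assumes the matcher lower-cases the subject character before looking it up.

```python
def folded_members(lo, hi):
    # Walk the range exactly as written in the pattern and lower-case
    # each member on its own, so no extra code points are pulled in.
    return set(chr(i).lower() for i in range(ord(lo), ord(hi) + 1))


def matches_ignorecase(lo, hi, ch):
    # Assumption: the subject character is lower-cased before lookup.
    return ch.lower() in folded_members(lo, hi)


results = dict((ch, matches_ignorecase('9', 'A', ch)) for ch in 'Aa_9')
```

Under these assumptions, 'A', 'a' and '9' are accepted for [9-A] while '_' is rejected.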

----------
components: Regular Expressions
messages: 70799
nosy: mrabarnett
severity: normal
status: open
title: Incorrect charset range handling with ignore case flag?
type: behavior
versions: Python 2.5

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3511>
_______________________________________