Vlastimil Brom <vlastimil.b...@gmail.com> added the comment:

Just a few more rather marginal findings: differences between regex and re:
>>> regex.findall(r"[\B]", "aBc")
['B']
>>> re.findall(r"[\B]", "aBc")
[]

(Python 2.7 ... on win32; regex - issue2636-20100912.zip)

I believe regex is more correct here, as uppercase \B doesn't have a special meaning within a set (unlike backspace \b), hence it should be treated as B. I wanted to mention it as a difference anyway, just in case it matters.

I also noticed another case where regex is more permissive:

>>> regex.findall(r"[\d-h]", "ab12c-h")
['1', '2', '-', 'h']
>>> re.findall(r"[\d-h]", "ab12c-h")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "re.pyc", line 177, in findall
  File "re.pyc", line 245, in _compile
error: bad character range
>>>

However, there might be an issue in negated sets, where the negation seems to apply to the first shorthand literal only; the rest is taken positively:

>>> regex.findall(r"[^\d-h]", "a^b12c-h")
['-', 'h']

Cf. also a simplified pattern, where re seems to work correctly:

>>> regex.findall(r"[^\dh]", "a^b12c-h")
['h']
>>> re.findall(r"[^\dh]", "a^b12c-h")
['a', '^', 'b', 'c', '-']
>>>

Or maybe, regardless of the order, when a negated set contains both shorthand literals and normal characters, the normal characters are matched positively:

>>> regex.findall(r"[^h\s\db]", "a^b 12c-h")
['b', 'h']
>>> re.findall(r"[^h\s\db]", "a^b 12c-h")
['a', '^', 'c', '-']
>>>

Also related to character sets, but possibly different: adding a (redundant) character that also belongs to the shorthand in a negated set seems to somehow confuse the parser:

>>> regex.findall(r"[^b\w]", "a b")
[]
>>> re.findall(r"[^b\w]", "a b")
[' ']
>>> regex.findall(r"[^b\S]", "a b")
[]
>>> re.findall(r"[^b\S]", "a b")
[' ']
>>> regex.findall(r"[^8\d]", "a 1b2")
[]
>>> re.findall(r"[^8\d]", "a 1b2")
['a', ' ', 'b']
>>>

I didn't find any relevant tracker issues; sorry if I missed some.
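For reference, the negated-set cases above could be collected into a small check along these lines. This is only a sketch: it assumes that the re results shown above are the intended output (i.e. that a negated set matches only characters belonging to none of its members), and the names NEGATED_SET_CASES and failing_cases are made up for illustration. The [\B] and [\d-h] cases are left out, since the intended behaviour there is less clear.

```python
import re

# (pattern, text, expected) triples taken from the re results reported above.
# Assumption: a negated set should match only characters that belong to
# none of its members, which is what re does in these cases.
NEGATED_SET_CASES = [
    (r"[^\dh]", "a^b12c-h", ["a", "^", "b", "c", "-"]),
    (r"[^h\s\db]", "a^b 12c-h", ["a", "^", "c", "-"]),
    (r"[^b\w]", "a b", [" "]),
    (r"[^b\S]", "a b", [" "]),
    (r"[^8\d]", "a 1b2", ["a", " ", "b"]),
]


def failing_cases(findall):
    """Return (pattern, text, expected, actual) for every case that differs."""
    failures = []
    for pattern, text, expected in NEGATED_SET_CASES:
        actual = findall(pattern, text)
        if actual != expected:
            failures.append((pattern, text, expected, actual))
    return failures


if __name__ == "__main__":
    print("re:", failing_cases(re.findall))
    try:
        import regex  # third-party module; optional for this sketch
    except ImportError:
        print("regex is not installed; skipping")
    else:
        print("regex:", failing_cases(regex.findall))
```

Passing the findall function as an argument keeps the same table usable for both modules, so any remaining difference shows up as a non-empty list for one of them.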
I initially wanted to provide test code additions, but as I am not sure about the intended output in all cases, I am leaving the findings in this form.

vbr

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________