On 04/12/10 08:43, andrew cooke wrote:
>
> Is the third case here surprising to anyone else?  It doesn't make
> sense to me...
>
> Python 2.6.2 (r262:71600, Oct 24 2009, 03:15:21)
> [GCC 4.4.1 [gcc-4_4-branch revision 150839]] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> from re import compile
>>>> p1 = compile('a\x62c')
>>>> p1.match('abc')
> <_sre.SRE_Match object at 0x7f4e8f93d578>
>>>> p2 = compile('a\\x62c')
>>>> p2.match('abc')
> <_sre.SRE_Match object at 0x7f4e8f93d920>
>>>> p3 = compile('a\\\x62c')
>>>> p3.match('a\\bc')
>>>> p3.match('abc')
>>>> p3.match('a\\\x62c')
>>>>
It isn't so much about regex as about string literals:

Python 2.6.4 (r264:75706, Mar 18 2010, 01:03:14)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print 'a\x62c'
abc
>>> print 'a\\x62c'
a\x62c
>>> print 'a\\\x62c'
a\bc

In the first case, *Python* unescapes the string-literal escape '\x62' into the letter 'b', so regex receives plain 'abc'.

In the second case, *Python* unescapes the double backslash '\\' into a single backslash '\', so regex receives 'a\x62c', and *regex* then unescapes backslash-x62 into 'b'.

In the third case, *Python* unescapes the double backslash '\\' into a single backslash '\' and the escape '\x62' into the letter 'b', so regex receives 'a\bc'. To regex, '\b' is a special sequence:

"""
\b       Matches the empty string, but only at the start or end of a word.
"""

Since 'a' and 'c' are both word characters, there is never a word boundary between them, so p3 cannot match any string, including 'a\\bc'.

--
http://mail.python.org/mailman/listinfo/python-list
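A minimal sketch (Python 3, not from the thread) that shows the pattern text each compile() call actually receives and checks it against 'abc' and against the four characters a, backslash, b, c. The names CASES, SUBJECTS, match_table and literal are made up for illustration. The last lines show one way to match a literal backslash followed by 'b': a raw string with a doubled backslash, or letting re.escape build the pattern.

```python
import re

# The three pattern literals from the thread. Comments show the text the
# regex engine actually receives after Python has processed the literal.
CASES = [
    ("p1", 'a\x62c'),    # \x62 -> 'b'                 => regex sees: abc
    ("p2", 'a\\x62c'),   # \\ -> '\'                    => regex sees: a\x62c
    ("p3", 'a\\\x62c'),  # \\ -> '\', then \x62 -> 'b'  => regex sees: a\bc
]

# Subjects to test: 'abc', and a, backslash, b, c.
SUBJECTS = ['abc', 'a\\bc']


def match_table():
    """Return (name, [match result for each subject]) for every case."""
    rows = []
    for name, pattern in CASES:
        regex = re.compile(pattern)
        rows.append((name, [regex.match(s) is not None for s in SUBJECTS]))
    return rows


for (name, pattern), (_, hits) in zip(CASES, match_table()):
    # repr() makes the single backslash in p2/p3 visible as '\\'
    print(name, repr(pattern), hits)

# To match a literal backslash, the regex itself needs the escape '\\'.
# A raw string passes both backslashes through to the regex engine unchanged.
literal = re.compile(r'a\\bc')
escaped = re.compile(re.escape('a\\bc'))  # re.escape adds the regex-level escape
print(literal.match('a\\bc') is not None, escaped.match('a\\bc') is not None)
```

p1 and p2 both match 'abc' and neither matches a-backslash-b-c; p3 matches neither subject, because '\b' is a zero-width word-boundary assertion and there is no boundary between two adjacent word characters.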