The regular expression HOWTO (http://docs.python.org/howto/regex.html#more-metacharacters) states the following:

# ------------------------------
zero-width assertions should never be repeated, because if they match once at a given location, they can obviously be matched an infinite number of times.
# ------------------------------


Why is the wording "should never"? Repeating a zero-width assertion is not forbidden, for instance:

>>> import re
>>> re.compile("\\b\\b\w+\\b\\b")
<_sre.SRE_Pattern object at 0xb7831140>
>>>
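
A quick check, as a minimal sketch (the sample string "spam eggs" is my own choice), suggests that the doubled assertions match exactly what the single ones do, since \b only tests a position and consumes no characters:

```python
import re

text = "spam eggs"

# \b matches at a position, not on a character, so the match is empty.
boundary_span = re.search(r"\b", text).span()

# One boundary assertion on each side of the word...
single = re.findall(r"\b\w+\b", text)

# ...and two on each side: the second assertion is tested at the same
# position as the first, so it cannot change the result.
doubled = re.findall(r"\b\b\w+\b\b", text)

print(boundary_span)
print(single)
print(doubled)
```

Both lists come out the same here, which is consistent with the HOWTO's remark that a zero-width assertion matching once at a location matches there any number of times.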

Nevertheless, the following fails to compile:

>>> re.compile("\\b{2}\w+\\b\\b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat
>>>
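
The difference can be reproduced without reading a traceback. The sketch below only records whether each spelling is accepted by re.compile; the helper name `compiles` is my own and not part of the re module:

```python
import re

def compiles(pattern):
    """Return True if re.compile accepts the pattern, False on re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Repetition spelled out by hand: accepted.
literal_repeat = compiles(r"\b\b\w+\b\b")

# Repetition requested with a quantifier on \b: rejected
# ("nothing to repeat").
counted_repeat = compiles(r"\b{2}\w+\b\b")

print(literal_repeat)
print(counted_repeat)
```

So the rejection happens when a quantifier is applied directly to the assertion, not when the same assertion simply appears twice in a row.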


So aren't \\b\\b and \\b{2} equivalent?


Surprisingly, the engine doesn't seem to optimize repeated boundary assertions away. For instance:

# ------------------------------
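import re
import time

# Baseline: merely building the pattern string, without compiling it.
a = time.clock()
len("\\b\\b\\b" * 100000 + "\\w+")
b = time.clock()
print "CPU time : %.2f s" % (b - a)

# The same pattern string, now handed to the regex compiler.
a = time.clock()
re.compile("\\b\\b\\b" * 100000 + "\\w+")
b = time.clock()
print "CPU time : %.2f s" % (b - a)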
# ------------------------------

This outputs:

# ------------------------------
CPU time : 0.00 s
CPU time : 1.33 s
# ------------------------------
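
For anyone who wants to repeat the measurement on Python 3, here is a hedged sketch. It assumes time.process_time (available since Python 3.3) as the CPU-time replacement for time.clock, which was removed in Python 3.8. The helper `timed_compile` and the smaller `REPEATS` value are my own choices to keep the run short, so the figures will not be comparable to the ones above:

```python
import re
import time

# Hypothetical repeat count, smaller than the 100000 used in the post.
REPEATS = 10000

def timed_compile(pattern):
    """Compile pattern and return (compiled pattern, CPU seconds spent)."""
    start = time.process_time()
    compiled = re.compile(pattern)
    return compiled, time.process_time() - start

# A single boundary assertion in front of \w+ ...
plain, plain_secs = timed_compile(r"\b\w+")

# ... versus 3 * REPEATS boundary assertions in front of the same \w+.
stacked, stacked_secs = timed_compile(r"\b\b\b" * REPEATS + r"\w+")

print("single \\b  : %.4f s" % plain_secs)
print("stacked \\b : %.4f s" % stacked_secs)
```

Both compiled patterns should match the same text; only the time spent compiling them is being compared.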


Your comments are welcome!