> Put simply, it doesn't occur often enough to be worth it. The cost
> outweighs the potential benefit.

I don't buy it. You could backtrack instead of failing for \b+ and
\b*, and it would be almost as fast as this optimization.

-- Devin

On Tue, Jan 3, 2012 at 1:57 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 03/01/2012 09:45, Devin Jeanpierre wrote:
>>>
>>> \\b\\b and \\b{2} aren't equivalent ?
>>
>> This sounds suspiciously like a bug!
>>
>>> Why is the wording "should never" ? Repeating a zero-width assertion is
>>> not forbidden, for instance :
>>>
>>> >>> import re
>>> >>> re.compile("\\b\\b\w+\\b\\b")
>>> <_sre.SRE_Pattern object at 0xb7831140>
>>> >>>
>>
>> I believe this is meant to refer to arbitrary-length repetitions, such
>> as r'\b*', not simple concatenations like that. r'\b*' will abort the
>> whole match if it is run on a boundary, because Python detects a
>> repetition of a zero-width match and decides this is an error.
>>
> r"\b+" can be optimised to r"\b", but r"\b*" can be optimised to r"".
> r"\b\b", r"\b\b\b", etc, can be optimised to r"\b".
>
> So why isn't it optimised?
>
> Because every potential optimisation has a cost, which is the time it
> would take to look for it.
>
> That cost needs to be balanced against the potential benefit.
>
> How often do you see repeated r"\b"?
>
> Put simply, it doesn't occur often enough to be worth it. The cost
> outweighs the potential benefit.
> --
> http://mail.python.org/mailman/listinfo/python-list
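
A minimal sketch of the distinction discussed in this thread, assuming a
reasonably recent CPython re module. The sample text and the helper
try_compile are made up for illustration. The first part checks that
concatenated \b assertions behave like a single \b; the second part only
reports what the installed version does with a quantified \b, since that
behaviour is the part under dispute and may differ between versions.

```python
import re

text = "foo bar, baz"  # hypothetical sample input

# Concatenated word-boundary assertions: \b is zero-width, so a second
# \b tests the same position again and cannot change the result.
single = re.findall(r"\b\w+\b", text)
doubled = re.findall(r"\b\b\w+\b\b", text)
print(single == doubled)


def try_compile(pattern):
    """Report whether the installed re module accepts the pattern.

    Hypothetical helper for this example, not part of the re module.
    """
    try:
        re.compile(pattern)
        return "compiles"
    except re.error as exc:
        return "re.error: %s" % exc


# Quantified \b: no outcome is asserted here, the loop just shows what
# this Python version does with each form.
for pattern in (r"\b*", r"\b+", r"\b{2}"):
    print(pattern, "->", try_compile(pattern))
```

Running the loop on the interpreter in question is the quickest way to see
whether a given repetition is rejected up front or handled at match time.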