Serhiy Storchaka added the comment:

Here is a patch that adds more optimizations for searching patterns that start 
with a literal string and groups. In particular, it covers the case where a 
pattern starts with a group containing a single character.
Examples:

$ ./python -m timeit -s "import re; p = re.compile('(\n)'); s = ('a'*100 + '\n')*1000" -- "p.split(s)"
Unpatched: 100 loops, best of 3: 4.58 msec per loop
Patched  : 1000 loops, best of 3: 562 usec per loop

$ ./python -m timeit -s "import re; p = re.compile('(\n\r)'); s = ('a'*100 + '\n\r')*1000" -- "p.split(s)"
Unpatched: 100 loops, best of 3: 3.1 msec per loop
Patched  : 1000 loops, best of 3: 663 usec per loop

For comparison:

$ ./python -m timeit -s "import re; p = re.compile('\n'); s = ('a'*100 + '\n')*1000" -- "p.split(s)"
1000 loops, best of 3: 329 usec per loop
$ ./python -m timeit -s "import re; p = re.compile('\n\r'); s = ('a'*100 + '\n\r')*1000" -- "p.split(s)"
1000 loops, best of 3: 338 usec per loop
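
For anyone who wants to repeat the measurements from a script rather than from 
the shell, here is a minimal sketch using the timeit module. The helper name 
bench and the CASES table are only illustrative and are not part of the patch; 
the loop count is an arbitrary choice, and absolute numbers will differ by 
machine and build.

```python
import re
import timeit

# Same patterns and inputs as the command-line runs above.
CASES = {
    "(\n)": ("a" * 100 + "\n") * 1000,
    "\n": ("a" * 100 + "\n") * 1000,
    "(\n\r)": ("a" * 100 + "\n\r") * 1000,
    "\n\r": ("a" * 100 + "\n\r") * 1000,
}


def bench(pattern, s, number=20, repeat=3):
    """Return the best observed time in seconds for one p.split(s) call."""
    p = re.compile(pattern)
    best = min(timeit.repeat(lambda: p.split(s), number=number, repeat=repeat))
    return best / number


results = {pat: bench(pat, s) for pat, s in CASES.items()}
for pat, seconds in results.items():
    print(f"{pat!r:10} {seconds * 1e6:10.1f} usec per loop")
```

Running it once with an unpatched build and once with a patched build gives 
the same kind of comparison as the figures above.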

More complex but rare cases, such as '\n()\r' or '((\n)(\r))', are also optimized.
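
To make the relationship between these patterns concrete, the following sketch 
(the sample string is made up for illustration) shows that all four variants 
match at the same positions and differ only in which groups they capture. This 
only demonstrates the documented behaviour of re; it says nothing about how the 
patch is implemented internally.

```python
import re

s = "ab\n\rcd\n\ref"
patterns = ["\n\r", "(\n\r)", "\n()\r", "((\n)(\r))"]

# Every variant matches the same two-character literal at the same offsets.
spans = {pat: [m.span() for m in re.finditer(pat, s)] for pat in patterns}

# split() additionally returns the text of each capturing group.
splits = {pat: re.split(pat, s) for pat in patterns}

for pat in patterns:
    print(repr(pat), spans[pat], splits[pat])
```

Since the matched text is the same literal in every case, the grouped forms 
can in principle use the same literal-prefix search as the plain pattern.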

Fast searching can no longer be disabled.

----------
assignee:  -> serhiy.storchaka
keywords: +patch
stage:  -> patch review
versions: +Python 3.6 -Python 2.7, Python 3.4
Added file: http://bugs.python.org/file39684/re_literal_prefix_with_groups.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24426>
_______________________________________