[issue30340] Optimize out non-capturing groups

Serhiy Storchaka Thu, 11 May 2017 02:21:27 -0700

New submission from Serhiy Storchaka:

The proposed patch makes the regular expression parser produce a more optimal
tree, mainly by getting rid of non-capturing groups. This makes it possible to
apply an optimization that was forbidden before and lets the regular expression
compiler produce more efficient code.
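
To see the tree the parser builds for a given pattern, the standard re.DEBUG
flag can be used. The sketch below is only an illustration and not part of the
patch; dump_parse_tree is a hypothetical helper name, and the exact text that
re.DEBUG prints varies between Python versions.

```python
import contextlib
import io
import re


def dump_parse_tree(pattern):
    """Return the debug dump that re.compile() prints for `pattern`.

    re.DEBUG writes to stdout, so the output is captured in a buffer.
    The format of the dump depends on the Python version in use.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


# Compare the tree for a non-capturing group with the tree for the
# equivalent character set.
print(dump_parse_tree(r'(?:x|y)+'))
print(dump_parse_tree(r'[xy]+'))
```

On an interpreter that includes this optimization, the two dumps would be
expected to look alike; on an older one the first dump should show a branch
inside the repeat.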


For example, the following expressions are transformed into a more optimal form:

'(?:x|y)+' -> '[xy]+'
'(?:ab)|(?:ac)' -> 'a[bc]'
r'[a-z]|\d' -> r'[a-z\d]'
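
The rewrites listed above are meant to preserve matching behaviour. A minimal
sanity check, assuming a small hand-picked set of sample strings (the SAMPLES
list and the helper names are made up for illustration), compares the match
spans of each original pattern with those of its rewritten form:

```python
import re

# (original pattern, rewritten pattern) pairs taken from the list above.
PAIRS = [
    (r'(?:x|y)+', r'[xy]+'),
    (r'(?:ab)|(?:ac)', r'a[bc]'),
    (r'[a-z]|\d', r'[a-z\d]'),
]

# Arbitrary sample inputs; not an exhaustive test.
SAMPLES = ['', 'x', 'xyxy', 'ab', 'ac', 'ad', 'a', '7', 'q', 'Q', 'xyz9']


def spans(pattern, text):
    """Return the (start, end) span of every match of `pattern` in `text`."""
    return [m.span() for m in re.finditer(pattern, text)]


def equivalent_on_samples(original, rewritten, samples=SAMPLES):
    """True if both patterns produce identical match spans on all samples."""
    return all(spans(original, s) == spans(rewritten, s) for s in samples)


for original, rewritten in PAIRS:
    print(original, '->', rewritten, equivalent_on_samples(original, rewritten))
```

Agreement on a handful of samples does not prove equivalence; it only guards
against obvious mistakes.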

These transformations can speed up matching by a factor of 10 to 25.

$ ./python -m timeit -s "import re; p = re.compile(r'(?:x|y)+'); s = 'x'*10000" "p.match(s)"
Unpatched:  500 loops, best of 5: 865 usec per loop
Patched:    5000 loops, best of 5: 84.5 usec per loop

$ ./python -m timeit -s "import re; p = re.compile(r'(?:[a-z]|\d)+'); s = 'x'*10000" "p.match(s)"
Unpatched:  100 loops, best of 5: 2.19 msec per loop
Patched:    5000 loops, best of 5: 84.5 usec per loop
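
The same comparison can be approximated on a single interpreter by timing each
pattern against its hand-optimized form. This is a rough sketch using the
standard timeit module; the best_time helper and the loop counts are
assumptions chosen for illustration, and on an interpreter that already
includes the optimization the two timings in each pair should be close.

```python
import re
import timeit

# Same subject string as in the command-line runs above.
SUBJECT = 'x' * 10000


def best_time(pattern, subject=SUBJECT, number=100, repeat=5):
    """Return the best per-call time in seconds for pattern.match(subject)."""
    compiled = re.compile(pattern)
    timings = timeit.repeat(lambda: compiled.match(subject),
                            number=number, repeat=repeat)
    return min(timings) / number


for original, rewritten in [(r'(?:x|y)+', r'[xy]+'),
                            (r'(?:[a-z]|\d)+', r'[a-z\d]+')]:
    t_original = best_time(original)
    t_rewritten = best_time(rewritten)
    print('%-16s %8.1f usec   %-10s %8.1f usec' % (
        original, t_original * 1e6, rewritten, t_rewritten * 1e6))
```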

----------
assignee: serhiy.storchaka
components: Library (Lib), Regular Expressions
messages: 293477
nosy: ezio.melotti, mrabarnett, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Optimize out non-capturing groups
type: performance
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30340>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
