New submission from Serhiy Storchaka:

sre_parse.parse_template uses string concatenation to accumulate characters.

    def literal(literal, p=p, pappend=a):
        if p and p[-1][0] is LITERAL:
            p[-1] = LITERAL, p[-1][1] + literal
        else:
            pappend((LITERAL, literal))

This operation have quadratic calculation complexity for long replacement 
strings.

$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 
'x'*100000"  "parse_template(repl, '')"
1 loops, best of 1: 3.38 sec per loop
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 
'x'*200000"  "parse_template(repl, '')"
1 loops, best of 1: 18.2 sec per loop

The proposed patch change amortized complexity to be linear. It also speeds up 
parsing shorter strings.

$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 
'x'*100000"  "parse_template(repl, '')"
1 loops, best of 1: 915 msec per loop
$ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 
'x'*200000"  "parse_template(repl, '')"
1 loops, best of 1: 1.79 sec per loop

----------
components: Library (Lib), Regular Expressions
files: re_parse_template.patch
keywords: patch
messages: 201032
nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Quadratic complexity in the parsing of re replacement string
type: performance
versions: Python 2.7, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file32315/re_parse_template.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19365>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to