New submission from Serhiy Storchaka: sre_parse.parse_template uses string concatenation to accumulate characters.
def literal(literal, p=p, pappend=a): if p and p[-1][0] is LITERAL: p[-1] = LITERAL, p[-1][1] + literal else: pappend((LITERAL, literal)) This operation have quadratic calculation complexity for long replacement strings. $ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*100000" "parse_template(repl, '')" 1 loops, best of 1: 3.38 sec per loop $ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*200000" "parse_template(repl, '')" 1 loops, best of 1: 18.2 sec per loop The proposed patch change amortized complexity to be linear. It also speeds up parsing shorter strings. $ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*100000" "parse_template(repl, '')" 1 loops, best of 1: 915 msec per loop $ ./python -m timeit -n1 -r1 -s "from sre_parse import parse_template; repl = 'x'*200000" "parse_template(repl, '')" 1 loops, best of 1: 1.79 sec per loop ---------- components: Library (Lib), Regular Expressions files: re_parse_template.patch keywords: patch messages: 201032 nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka priority: normal severity: normal stage: patch review status: open title: Quadratic complexity in the parsing of re replacement string type: performance versions: Python 2.7, Python 3.3, Python 3.4 Added file: http://bugs.python.org/file32315/re_parse_template.patch _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue19365> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com