New submission from Barry A. Warsaw:

It's a very common pattern to see the following at module scope:

cre_a = re.compile('some pattern')
cre_b = re.compile('other pattern')

and so on.  This can cost you at startup because all those regular 
expressions are compiled at import time, even if they're never used in practice 
(e.g. because the code path that needs the compiled regex never gets 
exercised).

It occurred to me that if re.compile() deferred compilation of the regex until 
first use, you could speed up startup.  But by how much?  And at what 
cost?

So I ran a small experiment (pull request to be submitted) using the `perf` 
module on `pip --help`.  I was able to cut the number of compiles from 28 
to 9, and the mean startup time from 245 ms to 213 ms.

% python -m perf compare_to ../base.json ../defer.json 
Mean +- std dev: [base] 245 ms +- 19 ms -> [defer] 213 ms +- 21 ms: 1.15x faster (-13%)

`pip install tox` reduces the compiles from 231 to 75:

(cpython 3.7) 231 0.06945133209228516
(3.7 w/defer)  75 0.03140091896057129
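
For anyone who wants to gather similar counts on their own code, here is a rough sketch of one way to do it.  It is not necessarily how the figures above were produced.  `CompileCounter` is a hypothetical helper, and it only sees calls that go through the `re.compile` attribute while the patch is active; code that did `from re import compile` beforehand, or that passes pattern strings straight to `re.match()` and friends, is not counted.

```python
import re
import time


class CompileCounter:
    """Hypothetical helper: count and time calls made via re.compile."""

    def __init__(self):
        self.count = 0
        self.seconds = 0.0
        self._original = re.compile

    def __enter__(self):
        def counting_compile(pattern, flags=0):
            start = time.perf_counter()
            try:
                return self._original(pattern, flags)
            finally:
                # Record the call even if compilation raised re.error.
                self.count += 1
                self.seconds += time.perf_counter() - start

        re.compile = counting_compile
        return self

    def __exit__(self, *exc_info):
        # Always restore the real function.
        re.compile = self._original
        return False


with CompileCounter() as counter:
    re.compile('some pattern')
    re.compile('other pattern')

print(counter.count, counter.seconds)
```

Wrapping the imports of the program under test in the `with` block gives a count of module-scope compiles comparable in spirit to the table above.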

So what's the cost?  Backward compatibility.  With this change, `re.compile()` 
no longer returns a compiled regular expression object, but instead a 
"deferred" proxy.  The actual compilation happens when the proxy is first 
used.  This can break compatibility by deferring any exceptions that compile() 
might raise.  This happens a fair bit in the test suite, but I'm not sure it's 
all that common in practice.  In any case, I've also added a re.IMMEDIATE 
(re.N -- for "now") flag to force immediate compilation.
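
To make the proxy idea concrete, here is a minimal sketch.  It is not the code on the branch: `DeferredPattern` and `deferred_compile` are hypothetical names, and the `immediate` keyword merely stands in for the proposed re.IMMEDIATE flag, which does not exist in the stock `re` module.

```python
import re


class DeferredPattern:
    """Hypothetical proxy that compiles its pattern on first use."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None  # filled in lazily by _force()

    def _force(self):
        # Compile once.  A bad pattern raises re.error here,
        # not when the proxy was created.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself,
        # e.g. match, search, sub, pattern, flags.
        if name.startswith('_'):
            raise AttributeError(name)
        return getattr(self._force(), name)


def deferred_compile(pattern, flags=0, immediate=False):
    """Return a proxy; compile right away only if immediate is true."""
    proxy = DeferredPattern(pattern, flags)
    if immediate:
        proxy._force()
    return proxy


cre_ok = deferred_compile('a+')      # nothing compiled yet
cre_bad = deferred_compile('(')      # invalid, but no exception yet
print(cre_ok.match('aaa').group())   # first use triggers compilation
```

The compatibility cost shows up with `cre_bad`: the `re.error` is raised at the first `cre_bad.search(...)` call instead of at the `deferred_compile('(')` line, unless `immediate=True` is passed.  A proxy like this also fails `isinstance(obj, re.Pattern)` checks, which is another place existing code could notice the difference.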

I also modified the compilation cache to use an actual functools.lru_cache.  
This way, when the cache hits its maximum size (maxcache), only the least 
recently used entries are evicted and the entire cache won't get blown away.
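
As an illustration of the eviction behaviour, here is a small sketch with a deliberately tiny cache.  `cached_compile` is a hypothetical wrapper rather than the function in the branch, and `maxsize=2` is chosen only to make eviction easy to see.

```python
import functools
import re


@functools.lru_cache(maxsize=2)
def cached_compile(pattern, flags=0):
    """Hypothetical wrapper: memoize compiled patterns with LRU eviction."""
    return re.compile(pattern, flags)


cached_compile('a')   # miss
cached_compile('b')   # miss
cached_compile('c')   # miss; evicts only 'a', the least recently used
cached_compile('c')   # hit
cached_compile('b')   # hit; 'b' survived the eviction

print(cached_compile.cache_info())
```

With a clear-everything policy, the third call would have discarded 'b' along with 'a'; with lru_cache only the oldest entry goes.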

So, whether this is a good idea or not, I'm opening this issue and pushing the 
branch for further discussion.

----------
assignee: barry
components: Library (Lib)
messages: 302995
nosy: barry
priority: normal
severity: normal
status: open
title: Defer compiling regular expressions
type: performance
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31580>
_______________________________________