[EMAIL PROTECTED] (Ilpo Nyyssönen) wrote:
> Of course it caches those when running. The point is that it needs to
> recompile every time you have restarted the program. With short lived
> command line programs this really can be a problem.
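For what it's worth, the in-process caching is easy to see: in CPython, re.compile() hands back the same cached pattern object on a repeated call within one run, but that cache starts empty on every fresh interpreter start, which is exactly the cost a short-lived command-line program keeps paying. A minimal sketch (the identity check relies on CPython's internal regex cache, which is an implementation detail, not a documented guarantee):

```python
import re

# Within a single process, re caches compiled patterns, so compiling
# the same expression twice returns the same cached object.
p1 = re.compile(r"a.*ardvark[0-9]{34}")
p2 = re.compile(r"a.*ardvark[0-9]{34}")
print(p1 is p2)  # True in CPython (implementation detail)

# The cache lives only for the life of the process: restart the
# program and every pattern gets compiled again from scratch.
```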
Are you speculating that it might be a problem, or saying that you have seen it be a problem in a real-life program?

I just generated a bunch of moderately simple regexes from a dictionary wordlist. It looks something like:

Roy-Smiths-Computer:play$ head exps
a.*a[0-9]{34}
a.*ah[0-9]{34}
a.*ahed[0-9]{34}
a.*ahing[0-9]{34}
a.*ahs[0-9]{34}
a.*al[0-9]{34}
a.*alii[0-9]{34}
a.*aliis[0-9]{34}
a.*als[0-9]{34}
a.*ardvark[0-9]{34}

Then I ran them through a little script that does:

for exp in sys.stdin.readlines():
    regex = re.compile(exp)

and timed it for various numbers of lines. On my G4 Powerbook (1 GHz PowerPC), I'm compiling about 1000 regexes per second:

Roy-Smiths-Computer:play$ time head -5000 < exps | ./regex.py

real    0m5.208s
user    0m4.690s
sys     0m0.090s

So, my guess is that unless you're compiling hundreds of regexes each time you start up, the one-time compilation costs are probably not significant.

> And yes, I have read the source of sre.py and I have made an ugly
> module that digs the compiled data and pickles it to a file and then
> in next startup it reads that file and puts the stuff back to the
> cache.

That's exactly what I would have done if I really needed to improve startup speed. In fact, I did something like that many moons ago, in a previous life. See R. Smith, "A finite state machine algorithm for finding restriction sites and other pattern matching applications", CABIOS, Vol. 4, No. 4, 1988. In that case, I had about 1200 patterns I was searching for (and was doing it on hardware running at about 1% of the speed of my current laptop).

BTW, why did you have to dig out the compiled data before pickling it? Couldn't you have just pickled whatever re.compile() returned?

--
http://mail.python.org/mailman/listinfo/python-list
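On that closing question: compiled pattern objects are picklable, but the re module registers them with the pickle machinery so that only the pattern string and flags get stored, and unpickling simply re-runs re.compile(). That would explain wanting to dig out and save the compiled data itself, since a plain pickle round-trip saves no compilation time. A quick sketch demonstrating this:

```python
import pickle
import re

p = re.compile(r"a.*ardvark[0-9]{34}")
data = pickle.dumps(p)

# The payload carries the source pattern and flags, not the compiled
# form, so loading it compiles the pattern from scratch again.
q = pickle.loads(data)
print(q.pattern)  # a.*ardvark[0-9]{34}
print(bool(q.match("aardvark" + "1" * 34)))  # True
```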