[EMAIL PROTECTED] (Ilpo Nyyssönen) wrote:
> Of course it caches those when running. The point is that it needs to
> recompile every time you have restarted the program. With short lived
> command line programs this really can be a problem.
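For what it's worth, the in-process caching is easy to see: in CPython, re.compile() hands back the same cached pattern object on a repeated call within one run, but that cache starts empty on every fresh interpreter start, which is exactly the cost a short-lived command-line program keeps paying. A minimal sketch (the identity check relies on CPython's internal regex cache, which is an implementation detail, not a documented guarantee):

```python
import re

# Within a single process, re caches compiled patterns, so compiling
# the same expression twice returns the same cached object.
p1 = re.compile(r"a.*ardvark[0-9]{34}")
p2 = re.compile(r"a.*ardvark[0-9]{34}")
print(p1 is p2)  # True in CPython (implementation detail)

# The cache lives only for the life of the process: restart the
# program and every pattern gets compiled again from scratch.
```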
Are you speculating that it might be a problem, or saying that you have seen it be a problem in a real-life program?

I just generated a bunch of moderately simple regexes from a dictionary wordlist. It looks something like:

Roy-Smiths-Computer:play$ head exps
a.*a[0-9]{34}
a.*ah[0-9]{34}
a.*ahed[0-9]{34}
a.*ahing[0-9]{34}
a.*ahs[0-9]{34}
a.*al[0-9]{34}
a.*alii[0-9]{34}
a.*aliis[0-9]{34}
a.*als[0-9]{34}
a.*ardvark[0-9]{34}

Then I ran them through a little script that does:

for exp in sys.stdin.readlines():
    regex = re.compile(exp)

and timed it for various numbers of lines. On my G4 Powerbook (1 GHz PowerPC), I'm compiling about 1000 regexes per second:

Roy-Smiths-Computer:play$ time head -5000 < exps | ./regex.py

real    0m5.208s
user    0m4.690s
sys     0m0.090s

So, my guess is that unless you're compiling hundreds of regexes each time you start up, the one-time compilation costs are probably not significant.

> And yes, I have read the source of sre.py and I have made an ugly
> module that digs the compiled data and pickles it to a file and then
> in next startup it reads that file and puts the stuff back to the
> cache.

That's exactly what I would have done if I really needed to improve startup speed. In fact, I did something like that many moons ago, in a previous life. See R. Smith, "A finite state machine algorithm for finding restriction sites and other pattern matching applications", CABIOS, Vol. 4, No. 4, 1988. In that case, I had about 1200 patterns I was searching for (and was doing it on hardware running at about 1% of the speed of my current laptop).

BTW, why did you have to dig out the compiled data before pickling it? Couldn't you have just pickled whatever re.compile() returned?

--
http://mail.python.org/mailman/listinfo/python-list
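On that closing question: compiled pattern objects are picklable, but the re module registers them with the pickle machinery so that only the pattern string and flags get stored, and unpickling simply re-runs re.compile(). That would explain wanting to dig out and save the compiled data itself, since a plain pickle round-trip saves no compilation time. A quick sketch demonstrating this:

```python
import pickle
import re

p = re.compile(r"a.*ardvark[0-9]{34}")
data = pickle.dumps(p)

# The payload carries the source pattern and flags, not the compiled
# form, so loading it compiles the pattern from scratch again.
q = pickle.loads(data)
print(q.pattern)  # a.*ardvark[0-9]{34}
print(bool(q.match("aardvark" + "1" * 34)))  # True
```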