Re: Relative performance of comparable regular expressions

John Machin Tue, 13 Jan 2009 01:20:52 -0800

On Jan 13, 7:24 pm, "Barak, Ron" <[email protected]> wrote:
> Hi,
>
> I have a question about relative performance of comparable regular 
> expressions.
>
> I have large log files that start with three-letter month names
> (non-unicode).
>
> Which would give better performance, matching with  "^[a-zA-Z]{3}", or with 
> "^\S{3}" ?


(1) If you want to match at the start of a line, use re.match()
*without* the pointless "^". Don't use re.search with a pattern
starting with "^" -- it won't be any faster, and it could be a lot
worse: re.search doesn't know to stop when the match fails at the
first position, so it goes on to try every later position as well:

command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.match(text)"
1000000 loops, best of 3: 1.15 usec per loop

command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.search(text)"
100000 loops, best of 3: 4.47 usec per loop

command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*1000" "rx.search(text)"
10000 loops, best of 3: 34.1 usec per loop
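
The same comparison can be run from a script with the timeit module
instead of the command line. This is only a rough sketch: the helper
name time_call and the loop counts are my own choices, and the numbers
will vary with the machine and the Python version (other interpreter
versions may not show the same gap between match and search).

```python
import re
import timeit

rx = re.compile('^AB')

def time_call(func, text, number=10000):
    """Return the best-of-3 time per call, in microseconds."""
    best = min(timeit.repeat(lambda: func(text), number=number, repeat=3))
    return best / number * 1e6

results = {}
for size in (100, 1000):
    text = 'Z' * size
    # Neither call can succeed: the text never starts with 'AB'.
    assert rx.match(text) is None and rx.search(text) is None
    results[size] = (time_call(rx.match, text), time_call(rx.search, text))
    print("len=%d  match: %.2f usec  search: %.2f usec"
          % (size, results[size][0], results[size][1]))
```

Both calls return None here, so the only thing being measured is how
long each method takes to give up.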

(2) I think you mean "^\s{3}" not "^\S{3}"

(3) Now that you've seen how to do timings, over to you :-)

> Also, which is better (if different at all): "\d\d" or "\d{2}" ?
> Also, would matching "." be different (performance-wise) than matching the 
> actual character, e.g. matching ":" ?
> And lastly, at the end of a line, is there any performance difference between 
> "(.+)$" and "(.+)"
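
One way to set up timings for these pairs, as a minimal sketch. The
sample log line is made up (substitute a real line from your own
files), and the choice of match or search for each pattern is my
assumption about how you would use it. The script prints what each
pattern matched next to its time, so you can check whether two
"comparable" patterns really find the same text before comparing
their speed.

```python
import re
import timeit

# Hypothetical log line; substitute a real line from your own files.
line = "Jan 13 01:20:52 myhost mydaemon[123]: something happened"

# (pattern, method name) for each pair of alternatives in the question.
cases = [
    (r"[a-zA-Z]{3}", "match"),
    (r"\S{3}", "match"),
    (r"\d\d", "search"),
    (r"\d{2}", "search"),
    (r"\d\d.\d\d", "search"),
    (r"\d\d:\d\d", "search"),
    (r"(.+)$", "search"),
    (r"(.+)", "search"),
]

number = 20000
timings = {}
for pattern, method in cases:
    func = getattr(re.compile(pattern), method)
    found = func(line).group(0)
    # Best of 3 runs, converted to microseconds per call.
    best = min(timeit.repeat(lambda: func(line), number=number, repeat=3))
    timings[pattern] = (found, best / number * 1e6)
    print("%-14s %-6s matched %r: %.3f usec"
          % (pattern, method, found, timings[pattern][1]))
```

On this sample line the "." and ":" variants do not match the same
text, which is worth settling before worrying about which is faster.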

Cheers,
John
--
http://mail.python.org/mailman/listinfo/python-list
