On Jan 13, 7:24 pm, "Barak, Ron" <ron.ba...@lsi.com> wrote:
> Hi,
>
> I have a question about relative performance of comparable regular
> expressions.
>
> I have large log files that start with three letters month names
> (non-unicode).
>
> Which would give better performance, matching with "^[a-zA-Z]{3}", or with
> "^\S{3}" ?

(1) If you want to match at the start of a line, use re.match()
*without* the pointless "^". Don't use re.search with a pattern
starting with "^" -- it won't be any faster and it could be a lot
worse; re.search doesn't know to stop if the first match fails:

command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.match(text)"
1000000 loops, best of 3: 1.15 usec per loop

command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.search(text)"
100000 loops, best of 3: 4.47 usec per loop

command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*1000" "rx.search(text)"
10000 loops, best of 3: 34.1 usec per loop

(2) I think you mean "^\s{3}" not "^\S{3}"

(3) Now that you've seen how to do timings, over to you :-)

> Also, which is better (if different at all): "\d\d" or "\d{2}" ?
> Also, would matching "." be different (performance-wise) than matching the
> actual character, e.g. matching ":" ?
> And lastly, at the end of a line, is there any performance difference between
> "(.+)$" and "(.+)"

Cheers,
John
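
P.S. If you would rather run the comparisons from a script than from the command prompt, here is a minimal sketch using the timeit module. The sample log line and the names SAMPLE_LINE, CANDIDATES, time_match and compare are made up for illustration; substitute a real line from your own logs. It only reports timings, and the numbers will vary with machine and Python version.

```python
import re
import timeit

# Hypothetical sample line (an assumption); use a real line from your logs.
SAMPLE_LINE = "Jan 13 19:24:01 host app[123]: something happened"

# Pattern pairs from the questions. No leading "^" is needed because
# match() only ever tries the start of the string.
CANDIDATES = [
    r"[a-zA-Z]{3}",
    r"\S{3}",
    r"[a-zA-Z]{3} \d\d",
    r"[a-zA-Z]{3} \d{2}",
]


def time_match(pattern, text, number=100000):
    """Return the average seconds per call of rx.match(text)."""
    rx = re.compile(pattern)
    # The lambda adds the same small call overhead to every pattern,
    # so the figures are only meaningful relative to each other.
    total = timeit.timeit(lambda: rx.match(text), number=number)
    return total / number


def compare(patterns, text, number=100000):
    """Time each pattern against the same text; return {pattern: seconds}."""
    return {p: time_match(p, text, number) for p in patterns}


if __name__ == "__main__":
    for pattern, secs in compare(CANDIDATES, SAMPLE_LINE).items():
        print("%-20s %.3f usec per call" % (pattern, secs * 1e6))
```

Each pattern here is chosen so that it actually matches the sample line; timing a pattern that fails immediately would measure something else.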