John Machin wrote:
> On Jan 13, 7:24 pm, "Barak, Ron" <ron.ba...@lsi.com> wrote:
>> Hi,
>>
>> I have a question about relative performance of comparable regular
>> expressions.
>>
>> I have large log files that start with three-letter month names
>> (non-unicode).
>>
>> Which would give better performance, matching with "^[a-zA-Z]{3}", or with
>> "^\S{3}" ?
>
> (1) If you want to match at the start of a line, use re.match()
> *without* the pointless "^". Don't use re.search with a pattern
> starting with "^" -- it won't be any faster than re.match and it could
> be a lot worse; re.search doesn't know to stop if the first match fails:
>
> command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.match(text)"
> 1000000 loops, best of 3: 1.15 usec per loop
>
> command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*100" "rx.search(text)"
> 100000 loops, best of 3: 4.47 usec per loop
>
> command-prompt>\python26\python -m timeit -s"import re;rx=re.compile('^AB');text='Z'*1000" "rx.search(text)"
> 10000 loops, best of 3: 34.1 usec per loop
>
> (2) I think you mean "^\s{3}" not "^\S{3}"
>
> (3) Now that you've seen how to do timings, over to you :-)
>
>> Also, which is better (if different at all): "\d\d" or "\d{2}" ?
>> Also, would matching "." be different (performance-wise) than matching the
>> actual character, e.g. matching ":" ?
>> And lastly, at the end of a line, is there any performance difference
>> between "(.+)$" and "(.+)"
>
Of course, if the log strings all begin with a string like
"Dec 12 2009 ...." then you don't need regular expressions at all - just
pull the characters out using their positions and slicing. The month
would be string[0:3] and so on.
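Something along these lines, as a rough sketch only. It assumes every line really does start with the fixed layout "Dec 12 2009 ...." (three-letter month, two-character day, four-digit year, single spaces between them); the function name, the MONTHS set, and the sample line are made up for illustration and are not taken from your logs.

```python
# Hypothetical helper: pulls the date fields out by position, no regex.
# Assumes the fixed layout "Mon DD YYYY ..." at the start of each line.
MONTHS = frozenset(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
)

def split_date_prefix(line):
    """Return (month, day, year) as strings, or None if the line
    does not start with a known three-letter month name."""
    month = line[0:3]
    if month not in MONTHS:
        return None
    day = line[4:6]    # characters 4 and 5
    year = line[7:11]  # characters 7 through 10
    return month, day, year

# Made-up sample line, for illustration only.
sample = "Dec 12 2009 10:15:32 something happened"
fields = split_date_prefix(sample)
```

The set lookup also doubles as the "does this line start with a month name" test, so a line that begins with anything else simply yields None. If the day can be a single digit without padding, the fixed offsets no longer hold and line.split(None, 3) would be the safer choice.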

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC      http://www.holdenweb.com/