* Johannes Bauer wrote: > The pre-check version is about 42% faster in my case (0.75 sec vs. 1.3 > sec). Curious. This is Python 3.2.3 on Linux x86_64.
A lot of time is spent with dict lookups (timings at my box, Python 3.2.3) in your inner loop (1500000 times...) #!/usr/bin/python3 import re pattern = re.compile(r'ENQUEUEING: /listen/(.*)') count = 0 for line in open('error.log'): #if 'ENQ' not in line: # continue m = pattern.search(line) if m: count += 1 print(count) runs ~ 1.39 s replacing some dict lookups with index lookups: #!/usr/bin/python3 def main(): import re pattern = re.compile(r'ENQUEUEING: /listen/(.*)') count = 0 for line in open('error.log'): #if 'ENQ' not in line: # continue m = pattern.search(line) if m: count += 1 print(count) main() runs ~ 1.15s and further: #!/usr/bin/python3 def main(): import re pattern = re.compile(r'ENQUEUEING: /listen/(.*)') search = pattern.search count = 0 for line in open('error.log'): #if 'ENQ' not in line: # continue m = search(line) if m: count += 1 print(count) main() runs ~ 1.08 s and for reference: #!/usr/bin/python3 def main(): import re pattern = re.compile(r'ENQUEUEING: /listen/(.*)') search = pattern.search count = 0 for line in open('error.log'): if 'ENQ' not in line: continue m = search(line) if m: count += 1 print(count) main() runs ~ 0.71 s The final difference is probably just the difference between a hardcoded string search and a generic NFA. nd -- http://mail.python.org/mailman/listinfo/python-list