[disclaimer - this is just guessing from general knowledge of regular expressions; i don't know any details of python's regexp engine]
if your regular expression is the bottleneck, rewrite it to avoid lazy matching, backreferences, capturing groups, lookbehinds, and perhaps even counted repeats. with a little thought you can do almost everything using just alternation '(a|b)' and repetition 'a*'. even if the expression is longer, it will probably be faster. character classes - either explicit '[a-z]' or predefined '\w' (even '.') - should be fine, but try to avoid multiple occurrences of '.*' in the same pattern. see the timeit module for measuring the speed of small chunks of code. (rough sketches of both ideas follow below the quoted message.)

andrew

Hyunchul Kim wrote:
> Hi, all
>
> I have a simple script.
> Can you improve the algorithm of the following 10-line script, with a view
> to speed?
> The following script does exactly what I want, but I want to improve its speed.
>
> It parses a file and accumulates lines until a line matches a given regular
> expression.
> When a line matches the regular expression, the function yields the lines
> collected before the matched line.
>
> ****************
> import re
> resultlist = []
> cp_regularexpression = re.compile('^a complex regular expression here$')
> for line in file(inputfile):
>     if cp_regularexpression.match(line):
>         if resultlist != []:
>             yield resultlist
>             resultlist = []
>     resultlist.append(line)
> yield resultlist
> ****************
>
> Thank you in advance,
>
> Hyunchul
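as a concrete illustration of the first point - the pattern, the test line
and the repeat count below are all made up for the example, not taken from
the original post, and the relative timings will vary with the data - here
is a lazy '.+?' pattern next to an equivalent one that spells out the
allowed characters instead, compared with timeit:

****************
import re
import timeit

# two ways of grabbing the text between angle brackets at the start of a
# line; the second avoids the lazy repeat by saying exactly which
# characters may appear inside the brackets
lazy_pat = re.compile(r'^<(.+?)>\s*(\d+)$')     # lazy repeat
plain_pat = re.compile(r'^<([^>]*)>\s*(\d+)$')  # explicit character class

line = '<a fairly long tag name, just to give the engine some work> 12345'

for name, pat in (('lazy', lazy_pat), ('plain', plain_pat)):
    elapsed = timeit.timeit(lambda: pat.match(line), number=100000)
    print('%-6s %.4f seconds' % (name, elapsed))
****************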
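and, just for reference, the quoted snippet only works inside a function
(yield at module level is a syntax error), so here it is wrapped as a
generator - the function name, the placeholder pattern and the commented-out
usage are my own additions; the logic is essentially the original poster's,
with open() instead of file() and a guard against yielding an empty final
group:

****************
import re

def grouped_lines(inputfile, pattern=r'^a complex regular expression here$'):
    # yield lists of lines, starting a new group at each line that
    # matches the pattern; the last group is yielded when the file ends
    cp = re.compile(pattern)
    resultlist = []
    for line in open(inputfile):
        if cp.match(line):
            if resultlist:              # same test as 'resultlist != []'
                yield resultlist
                resultlist = []
        resultlist.append(line)
    if resultlist:                      # don't yield an empty final group
        yield resultlist

# example usage:
# for group in grouped_lines('input.txt'):
#     print(len(group))
****************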