On Apr 6, 11:48 pm, Hyunchul Kim <sun...@sfc.keio.ac.jp> wrote: > Hi, all > > I have a simple script. > Can you improve algorithm of following 10 line script, with a view point > of speed ? > Following script do exactly what I want but I want to improve the speed.
So the first thing to do is to try to work out where it is spending the time. Step 1: read your code: uh-oh: re.compile('^a complex regular expression here$) By the way, you are using match() so the "^" is redundant. Step 2: Run some timing tests: (a) script as written (b) script modified to do nothing else but count the lines in the file. Is the time difference worth any further work? If so: (c) script modified to ignore the regex and pretend the match fails on each line (d) script modified to ignore the regex and pretend the match succeeds on each line (e) script modified to ignore the appending and yielding; just count how many matches So where is most of the time(a) - time(b) difference being taken: matching or appending/yielding? > > This parse a file and accumulate lines till a line match a given regular > expression. > Then, when a line match a given regular expression, this function yield > lines before the matched lines. I think your code documents the requirement slightly better. Before you start worrying about speed, it's a good idea to ensure that the requirements are correct. It would be useful to include explicit statements to this effect: Given N matching lines, there will be (N+1) yields. The last yield will contain the lines (if any) after the last matching line (if any). No other yield will be empty. Consequently: an empty file will yield [], and a file with no matching lines will yield [all the lines in the file]. Presuming of course that is what you want; consider this: first yield: [line1, line2] second and final yield: [line3, line4] The caller has no way of knowing (without using the regex!) if the last line in the file matches the regex or not. Does this matter? > > **************** > import re > resultlist = [] > cp_regularexpression = re.compile('^a complex regular expression here$) > for line in file(inputfile): > if cp_regularexpression.match(line): > if resultlist != []: if resultlist: # slightly faster and much less ugly > yield resultlist > resultlist = [] > resultlist.append(line) > yield resultlist If it turns out that you don't want lines after the last matching line, you need to remove the above yield statement. HTH, John -- http://mail.python.org/mailman/listinfo/python-list