Hi Terry,

-----Original Message-----
From: Terry Reedy [mailto:tjre...@udel.edu]
Sent: Wednesday, January 14, 2009 01:57
To: python-list@python.org
Subject: Re: Could you suggest optimisations ?
Barak, Ron wrote:
> Hi,
>
> In the attached script, the longest time is spent in the following
> functions (verified by psyco log):

I cannot help but wonder why, and if, you really need all the rigmarole
with file pointers, offsets, and tells instead of

    for line in open(...): do your processing.

I'm building a database of the found events in the logs (those records
between the first and last regexes in regex_array). The user should then
be able to navigate among these events (among other functionality). This
is why I need the tells and offsets, so I'd know the place in the logs
where an event starts/ends.

Bye,
Ron.

> def match_generator(self, regex):
>     """
>     Generate the next line of self.input_file that
>     matches regex.
>     """
>     generator_ = self.line_generator()
>     while True:
>         self.file_pointer = self.input_file.tell()
>         if self.file_pointer != 0:
>             self.file_pointer -= 1
>         if (self.file_pointer + 2) >= self.last_line_offset:
>             break
>         line_ = generator_.next()
>         print "%.2f%% \r" % (((self.last_line_offset -
>             self.input_file.tell()) / (self.last_line_offset * 1.0)) * 100.0),
>         if not line_:
>             break
>         else:
>             match_ = regex.match(line_)
>             groups_ = re.findall(regex, line_)
>             if match_:
>                 yield line_.strip("\n"), groups_
>
> def get_matching_records_by_regex_extremes(self, regex_array):
>     """
>     Function will:
>     Find the record matching the first item of regex_array.
>     Will save all records until the last item of regex_array.
>     Will save the last line.
>     Will remember the position of the beginning of the next line in
>     self.input_file.
>     """
>     start_regex = regex_array[0]
>     end_regex = regex_array[len(regex_array) - 1]
>
>     all_recs = []
>     generator_ = self.match_generator
>
>     try:
>         match_start, groups_ = generator_(start_regex).next()
>     except StopIteration:
>         return(None)
>
>     if match_start != None:
>         all_recs.append([match_start, groups_])
>
>     line_ = self.line_generator().next()
>     while line_:
>         match_ = end_regex.match(line_)
>         groups_ = re.findall(end_regex, line_)
>         if match_ != None:
>             all_recs.append([line_, groups_])
>             return(all_recs)
>         else:
>             all_recs.append([line_, []])
>             line_ = self.line_generator().next()
>
> def line_generator(self):
>     """
>     Generate the next line of self.input_file, and update
>     self.file_pointer to the beginning of that line.
>     """
>     while self.input_file.tell() <= self.last_line_offset:
>         self.file_pointer = self.input_file.tell()
>         line_ = self.input_file.readline()
>         if not line_:
>             break
>         yield line_.strip("\n")
>
> I was trying to think of optimisations, so I could cut down on
> processing time, but got no inspiration.
> (I need the "print "%.2f%% \r" ..." line for the user's feedback.)
>
> Could you suggest any optimisations ?
> Thanks,
> Ron.
>
> P.S.: Examples of processing times are:
>
> * 2m42.782s on two files with a combined size of 792544 bytes
>   (no matches found).
> * 28m39.497s on two files with a combined size of 4139320 bytes
>   (783 matches found).
>
> These times are quite unacceptable, as a normal input to the program
> would be ten files with a combined size of ~17MB.
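For what it's worth, most of the cost in the quoted code is likely the
tell() call and the progress print executed on every single line. Since
readline()/iteration already hands you each line, you can keep a running
byte count yourself and only record an offset when a regex actually
matches. A minimal sketch of that idea (find_events, the single
start/end regex pair, and the file layout are illustrative assumptions,
not the real script's API):

```python
def find_events(path, start_regex, end_regex):
    """Scan a log file once, recording the byte offsets where each
    event starts and ends, without calling tell() for every line.

    Returns a list of (start_offset, end_offset) pairs, where
    end_offset is the offset just past the line matching end_regex.
    """
    events = []
    current_start = None  # offset of the pending event's first line
    offset = 0            # running byte offset of the current line
    with open(path, "rb") as f:     # binary mode: offsets == byte counts
        for raw in f:
            line = raw.decode("utf-8", "replace").rstrip("\n")
            if current_start is None:
                if start_regex.match(line):
                    current_start = offset
            elif end_regex.match(line):
                events.append((current_start, offset + len(raw)))
                current_start = None
            offset += len(raw)      # cheap bookkeeping replaces f.tell()
    return events
```

The same idea extends to a whole regex_array: remember the offset where
the first pattern fires and emit an event when the last one does. And
printing the progress line only every few thousand lines, instead of on
every line, removes most of the remaining per-line overhead.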
--
http://mail.python.org/mailman/listinfo/python-list