On Jan 12, 7:30 am, Jeremy <jlcon...@gmail.com> wrote:
> On Jan 11, 1:15 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
> > Jeremy schrieb:
> > > On Jan 11, 12:54 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> > >> On Jan 11, 11:20 am, Jeremy <jlcon...@gmail.com> wrote:
> > >>> I just profiled one of my Python scripts and discovered that >99% of
> > >>> the time was spent in
> > >>> {built-in method sub}
> > >>> What is this function and is there a way to optimize it?
> > >>
> > >> I'm guessing this is re.sub (or, more likely, a method sub of an
> > >> internal object that is called by re.sub).
> > >>
> > >> If all your script does is to make a bunch of regexp substitutions,
> > >> then spending 99% of the time in this function might be reasonable.
> > >> Optimize your regexps to improve performance. (We can help you if you
> > >> care to share any.)
> > >>
> > >> If my guess is wrong, you'll have to be more specific about what your
> > >> script does, and maybe share the profile printout or something.
> > >>
> > >> Carl Banks
> > >
> > > Your guess is correct. I had forgotten that I was using that
> > > function.
> > >
> > > I am using the re.sub command to remove trailing whitespace from lines
> > > in a text file. The commands I use are copied below. If you have any
> > > suggestions on how they could be improved, I would love to know.
> > >
> > > Thanks,
> > > Jeremy
> > >
> > > lines = self._outfile.readlines()
> > > self._outfile.close()
> > >
> > > line = string.join(lines)
> > >
> > > if self.removeWS:
> > >     # Remove trailing white space on each line
> > >     trailingPattern = '(\S*)\ +?\n'
> > >     line = re.sub(trailingPattern, '\\1\n', line)
> >
> > line = line.rstrip()?
> >
> > Diez
>
> Yep. I was trying to reinvent the wheel. I just remove the trailing
> whitespace before joining the lines.
Actually, your original regex doesn't quite remove trailing whitespace. It has three components:

(1) (\S*) zero or more occurrences of not-whitespace
(2) \ +? one or more (non-greedy) occurrences of SPACE
(3) \n a newline

Component (2) should be \s+? so that tabs and other whitespace are caught, not just spaces.

In any case, this is a roundabout way of doing it. Try writing a regex that does it simply: replace trailing whitespace with an empty string.

Another problem with your approach: it doesn't work if the line is not terminated by \n. This is quite possible when the lines are read from a file, because the last line of a file need not end with a newline.

A wise person once said: Re-inventing the wheel is often accompanied by forgetting to re-invent the axle.

-- 
http://mail.python.org/mailman/listinfo/python-list
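A minimal Python sketch of the simpler approaches discussed in this thread, for comparison with the original substitution. The function names are hypothetical, chosen for illustration only. The regex version assumes the whole file is already in one string and uses [ \t]+ rather than \s+: \s also matches "\n", so it can swallow blank lines. With re.MULTILINE, $ matches just before every newline and at the very end of the string, so an unterminated last line is handled as well. The per-line version follows the rstrip() idea from the thread.

```python
import re

# Hypothetical helper names, for illustration only.

# The pattern from the original post: only SPACE counts as trailing
# whitespace, and a trailing "\n" is required for a match.
ORIGINAL_PATTERN = r'(\S*)\ +?\n'

# [ \t]+ matches a run of spaces/tabs; with re.MULTILINE, $ matches just
# before each "\n" and at the very end of the string, so an unterminated
# last line is covered too. \s is avoided because it also matches "\n"
# and could swallow blank lines.
TRAILING_WS = re.compile(r'[ \t]+$', re.MULTILINE)


def strip_original(text):
    """Apply the substitution from the original post, unchanged."""
    return re.sub(ORIGINAL_PATTERN, '\\1\n', text)


def strip_trailing_ws_regex(text):
    """Replace trailing spaces/tabs on every line with an empty string."""
    return TRAILING_WS.sub('', text)


def strip_trailing_ws_lines(lines):
    """rstrip() each line (which also drops its newline), then rejoin."""
    return '\n'.join(line.rstrip() for line in lines)


if __name__ == '__main__':
    sample = 'foo\t\nbar  \nbaz   '
    # The original pattern misses the tab and the unterminated last line.
    print(repr(strip_original(sample)))
    print(repr(strip_trailing_ws_regex(sample)))
    print(repr(strip_trailing_ws_lines(sample.splitlines(True))))
```

The regex version keeps a final newline if the text had one; the per-line version drops it, so append '\n' if that matters. Separately, in Python 2 string.join(lines) with no separator argument joins with a single space, so the quoted code also puts a space at the start of every line after the first; ''.join(lines) avoids that.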