On Jan 11, 2010, at 3:30 PM, Jeremy wrote:

On Jan 11, 1:15 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
Jeremy wrote:

On Jan 11, 12:54 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
On Jan 11, 11:20 am, Jeremy <jlcon...@gmail.com> wrote:

I just profiled one of my Python scripts and discovered that >99% of
the time was spent in
{built-in method sub}
What is this function and is there a way to optimize it?

I'm guessing this is re.sub (or, more likely, a method sub of an
internal object that is called by re.sub).

If all your script does is to make a bunch of regexp substitutions,
then spending 99% of the time in this function might be reasonable.
Optimize your regexps to improve performance. (We can help you if you
care to share any.)

If my guess is wrong, you'll have to be more specific about what your
script does, and maybe share the profile printout or something.

Carl Banks

Your guess is correct.  I had forgotten that I was using that
function.

I am using the re.sub command to remove trailing whitespace from lines in a text file. The commands I use are copied below. If you have any
suggestions on how they could be improved, I would love to know.

Thanks,
Jeremy

lines = self._outfile.readlines()
self._outfile.close()

line = string.join(lines)

if self.removeWS:
    # Remove trailing white space on each line
    trailingPattern = '(\S*)\ +?\n'
    line = re.sub(trailingPattern, '\\1\n', line)

line = line.rstrip()?

Diez

Yep.  I was trying to reinvent the wheel.  I just remove the trailing
whitespace before joining the lines.
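
For anyone finding this thread later, stripping each line before joining might look roughly like the sketch below. The function name and the sample data are made up for illustration; the original code reads its lines from self._outfile. Note that rstrip() with no argument removes all trailing whitespace (tabs as well as spaces), which is slightly broader than the regex above, which only targets spaces.

```python
def strip_trailing_whitespace(lines):
    """Join lines into one string with trailing whitespace removed from each.

    Hypothetical helper, not part of the original script.
    """
    # rstrip() also removes the trailing newline, so the join puts
    # a single "\n" back between consecutive lines.
    return "\n".join(line.rstrip() for line in lines)


# Hypothetical sample input standing in for self._outfile.readlines()
sample = ["alpha   \n", "beta\t\n", "gamma\n"]
result = strip_trailing_whitespace(sample)
```

Because the last line is stripped too, no separate rstrip() on the joined string is needed.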

I second the suggestion to use rstrip(), but for future reference you should also check out the compile() function in the re module. You might want to time the code above against a version using a compiled regex to see how much difference it makes.

Cheers
Philip
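
A rough way to run the timing comparison suggested above is sketched here. The sample text, the function names, and the simplified pattern r" +\n" are assumptions for illustration, not the code from the original script; the simplified pattern is intended to have the same effect as the original one (drop spaces that sit directly before a newline) without the leading (\S*) group. The actual numbers will depend on the input and the machine.

```python
import re
import timeit

# Hypothetical sample input: a few thousand short lines, some with trailing spaces
text = "some words   \nmore words\nlast line    \n" * 1000

# Compile once and reuse; matches one or more spaces directly before a newline
trailing = re.compile(r" +\n")


def with_compiled(s):
    # Replace "spaces + newline" with just the newline
    return trailing.sub("\n", s)


def with_rstrip(s):
    # Strip each line, then join; this also drops the final newline
    return "\n".join(line.rstrip() for line in s.splitlines())


# Time each approach over a handful of runs and print the totals
for func in (with_compiled, with_rstrip):
    elapsed = timeit.timeit(lambda: func(text), number=5)
    print("%s: %.4f s" % (func.__name__, elapsed))
```

The re module keeps a cache of recently used patterns, so compiling up front may make only a modest difference by itself; the shape of the pattern is likely to matter more.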



--
http://mail.python.org/mailman/listinfo/python-list
