On Jan 11, 2010, at 3:30 PM, Jeremy wrote:
On Jan 11, 1:15 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
Jeremy wrote:
On Jan 11, 12:54 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
On Jan 11, 11:20 am, Jeremy <jlcon...@gmail.com> wrote:
I just profiled one of my Python scripts and discovered that >99% of
the time was spent in
{built-in method sub}
What is this function and is there a way to optimize it?
I'm guessing this is re.sub (or, more likely, a method sub of an
internal object that is called by re.sub).
If all your script does is to make a bunch of regexp substitutions,
then spending 99% of the time in this function might be reasonable.
Optimize your regexps to improve performance. (We can help you if you
care to share any.)
If my guess is wrong, you'll have to be more specific about what your
script does, and maybe share the profile printout or something.
Carl Banks
Your guess is correct. I had forgotten that I was using that function.
I am using the re.sub command to remove trailing whitespace from lines
in a text file. The commands I use are copied below. If you have any
suggestions on how they could be improved, I would love to know.
Thanks,
Jeremy
lines = self._outfile.readlines()
self._outfile.close()
line = string.join(lines)
if self.removeWS:
    # Remove trailing white space on each line
    trailingPattern = '(\S*)\ +?\n'
    line = re.sub(trailingPattern, '\\1\n', line)
line = line.rstrip()?
Diez
Yep. I was trying to reinvent the wheel. I just remove the trailing
whitespace before joining the lines.
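For anyone reading this later, here is a minimal sketch of stripping before joining. It assumes Python 3, and the helper name strip_trailing_whitespace and the sample lines are made up for illustration; they are not from the original script. Note that rstrip() with no argument also removes tabs and the newline itself, whereas the regex above only targets spaces, so the newline is put back by the join.

```python
def strip_trailing_whitespace(lines):
    """Join lines with newlines after removing trailing whitespace from each."""
    # str.rstrip() with no argument drops all trailing whitespace,
    # including the newline, so join() supplies the separators again.
    return "\n".join(line.rstrip() for line in lines)


# Hypothetical sample input, standing in for the result of readlines().
lines = ["first line   \n", "second line\t\n", "third line\n"]
text = strip_trailing_whitespace(lines)
print(repr(text))
```

The result has no newline after the last line; append one if the output file needs it.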
I second the suggestion to use rstrip(), but for future reference you
should also check out the compile() function in the re module. You
might want to time the code above against a version using a compiled
regex to see how much difference it makes.
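A rough sketch of that timing comparison, assuming Python 3 and the standard timeit module. The sample text, the function names, and the repeat count are all invented for illustration. The re module caches recently compiled patterns, so the gap between re.sub and a precompiled pattern may turn out small; measure on your own data rather than trusting this sketch.

```python
import re
import timeit

# Hypothetical sample input: 10,000 lines, each ending in trailing spaces.
text = "some words here    \n" * 10_000

# The pattern from the thread, written as a raw string.
pattern_source = r'(\S*)\ +?\n'
compiled = re.compile(pattern_source)


def with_module_function():
    # re.sub looks the pattern up (or compiles it) on every call.
    return re.sub(pattern_source, '\\1\n', text)


def with_compiled_pattern():
    # The pattern object was compiled once, outside the timed call.
    return compiled.sub('\\1\n', text)


def with_rstrip():
    # No regex at all: strip each line, then rejoin.
    return "\n".join(line.rstrip() for line in text.splitlines())


for func in (with_module_function, with_compiled_pattern, with_rstrip):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds:.3f} s")
```

The two regex versions return identical strings; the rstrip version differs only in lacking the final newline.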
Cheers
Philip