[EMAIL PROTECTED] wrote:
> Is there a solution here that I'm missing? What am I doing that is so
> inefficient?
Hi Jeff,

Yes, it seems you have plenty of performance leaks. Please see my notes
below.

> def massreplace():
>     editfile = open("pathname\editfile.txt")
>     filestring = editfile.read()
>     filelist = filestring.splitlines()
>     ## errorcheck = re.compile('(a name=)+(.*)(-)+(.*)(></a>)+')
>     for i in range(len(filelist)):
>         source = open(filelist[i])

Read this post:
http://mail.python.org/pipermail/python-list/2004-August/275319.html

Instead of reading the whole document, storing it in a variable,
splitting it, and then iterating over the list, you can simply iterate
over the open file itself:

def massreplace():
    editfile = open("pathname\editfile.txt")
    for source in editfile:
        ...

>         starttext = source.read()
>         interimtext = replacecycle(starttext)
> (...)

Excuse me, but this is insane. Make just one call (or none at all; I
don't see why you need to split this into two functions) and let the
function manage the replacement "layers". I'm skipping the next part
(I don't want to untangle all your logic right now).

> (...)
>
> def replacecycle(starttext):

Unneeded, IMHO.

>     p1 = re.compile('(href=|HREF=)+(.*)(#)+(.*)( )+(.*)(">)+')
> (...)
>     interimtext = p100.sub(q2, interimtext)

The same euphemism applies here. I might be wrong, but I'm pretty
confident you can do all of this with one simple regex. Anyway,
although compiled regexes are supposed to be cached, you don't need to
define them every time the function gets called: do it once, at module
level, outside the function. At the very least you save one of the most
important performance hits in Python, function calls. Read this:

http://wiki.python.org/moin/PythonSpeed/PerformanceTips

Also, if you are parsing HTML, consider using BeautifulSoup or
ElementTree or something similar (particularly if you don't feel
confident with regexes).

Hope you find this helpful.

Pablo
--
http://mail.python.org/mailman/listinfo/python-list
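[The compile-once-and-iterate advice above can be sketched roughly as
follows. The pattern, the replacement, and the "strip #fragments from
hrefs" goal are hypothetical stand-ins, since the original post's
p1..p100 substitutions were not shown in full.]

```python
import re

# Compiled once at import time, not on every call to massreplace().
# FRAGMENT_RE and its replacement are hypothetical stand-ins for the
# original post's many patterns.
FRAGMENT_RE = re.compile(r'(href="[^"#]*)#[^"]*(")', re.IGNORECASE)

def massreplace(listing_path):
    """For every file path listed in listing_path (one path per line),
    strip the #fragment from each quoted href attribute in that file."""
    with open(listing_path) as editfile:
        for line in editfile:            # iterate the file lazily, line by line
            path = line.strip()
            if not path:
                continue
            with open(path) as source:
                text = source.read()
            with open(path, "w") as dest:
                dest.write(FRAGMENT_RE.sub(r'\1\2', text))
```

[The substitution itself works like this:
FRAGMENT_RE.sub(r'\1\2', '<a href="x.html#top">x</a>') gives
'<a href="x.html">x</a>'.]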
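[For the HTML-parsing route mentioned above, here is a minimal
BeautifulSoup sketch. It assumes, hypothetically, that the point of the
substitutions is to strip #fragments from links; bs4 is a third-party
package (pip install beautifulsoup4).]

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def strip_fragments(html):
    """Drop the #fragment part of every href in an HTML document.
    The transformation is a hypothetical example standing in for the
    original post's substitutions, which were not shown in full."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(href=True):
        tag["href"] = tag["href"].split("#", 1)[0]
    return str(soup)
```

[A parser-based approach like this is more robust than regexes against
attribute ordering, whitespace, and case variations in the markup.]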