On Sep 29, 5:22 pm, [EMAIL PROTECTED] wrote:
> I wrote the following simple program to loop through our help files
> and fix some errors (in case you can't see the subtle RE search that's
> happening, we're replacing spaces in bookmarks with _'s)
>
> The program works great except for one thing. It's significantly
> slower through the later files in the search than through the early
> ones... Before anyone criticizes, I recognize that that middle section
> could be simplified with a for loop... I just haven't cleaned it
> up...
>
> The problem is that the first 300 files take about 10-15 seconds and
> the last 300 take about 2 minutes... If we do more than about 1500
> files in one run, it just hangs up and never finishes...
>
> Is there a solution here that I'm missing? What am I doing that is so
> inefficient?
Ugh, that was entirely too many regexps for my taste :-)

How about something like:

    import sys

    def attr_ndx_iter(txt, attribute):
        "Return all the start and end indices for the values of attribute."
        # lower() keeps the length of ASCII text, so indices found in the
        # lowered copy are valid for the original text too.
        txt = txt.lower()
        attribute = attribute.lower() + '='
        alen = len(attribute)
        chunks = txt.split(attribute)
        if len(chunks) == 1:
            return
        start = len(chunks[0]) + alen      # index of the opening quote
        for chunk in chunks[1:]:
            qchar = chunk[0]               # assumes a quoted value: " or '
            end = start + chunk.index(qchar, 1)
            yield start + 1, end
            start += len(chunk) + alen

    def substr_map(txt, indices, fn):
        "Apply fn to text within indices."
        res = []
        cur = 0
        for i, j in indices:
            res.append(txt[cur:i])
            res.append(fn(txt[i:j]))
            cur = j
        res.append(txt[cur:])
        return ''.join(res)

    def transform(s):
        "The transformation to do on the attribute values."
        return s.replace(' ', '_')

    def zap_spaces(txt, *attributes):
        for attr in attributes:
            txt = substr_map(txt, attr_ndx_iter(txt, attr), transform)
        return txt

    def mass_replace():
        w = sys.stdout.write
        for line in open(r'pathname\editfile.txt'):
            fname = line.strip()           # drop the trailing newline
            if not fname:
                continue
            try:
                # Read the whole file *before* reopening it with 'w';
                # opening with 'w' truncates it immediately.
                with open(fname) as fp:
                    txt = fp.read()
                with open(fname, 'w') as fp:
                    fp.write(zap_spaces(txt, 'href', 'name'))
                w('.')                     # progress-meter :-)
            except Exception as e:
                print('Error processing file:', fname, e)

minimally-tested'ly y'rs
-- bjorn

--
http://mail.python.org/mailman/listinfo/python-list
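To see what the helpers do without touching any files, here is a small self-contained sketch (Python 3) that restates attr_ndx_iter, substr_map and zap_spaces from the post and runs them on an in-memory string. The sample markup is made up for illustration, and, like the post's code, the sketch assumes every attribute value is quoted with " or '.

```python
def attr_ndx_iter(txt, attribute):
    """Yield (start, end) slice indices of each quoted value of `attribute`."""
    low = txt.lower()              # same length as txt for ASCII input
    key = attribute.lower() + '='
    klen = len(key)
    chunks = low.split(key)
    pos = len(chunks[0]) + klen    # index of the first value's opening quote
    for chunk in chunks[1:]:       # empty loop when the attribute is absent
        qchar = chunk[0]           # assumption: value starts with " or '
        close = chunk.index(qchar, 1)
        yield pos + 1, pos + close
        pos += len(chunk) + klen


def substr_map(txt, indices, fn):
    """Rebuild txt, applying fn only to the slices given by indices."""
    parts, cur = [], 0
    for i, j in indices:
        parts.append(txt[cur:i])
        parts.append(fn(txt[i:j]))
        cur = j
    parts.append(txt[cur:])
    return ''.join(parts)


def zap_spaces(txt, *attributes):
    """Replace spaces with underscores inside the given attributes' values."""
    for attr in attributes:
        txt = substr_map(txt, attr_ndx_iter(txt, attr),
                         lambda s: s.replace(' ', '_'))
    return txt


# Hypothetical sample: a bookmark and a link pointing at it.
sample = '<a name="Getting Started"></a><A HREF="#Getting Started">Getting Started</A>'
print(zap_spaces(sample, 'href', 'name'))
```

Only the attribute values change; the link text between the tags keeps its spaces. Each attribute costs a split plus one join over the file's text, with no regexps involved.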