MackS wrote: > This program is meant to process relatively long strings (10-20 MB) by > selectively modifying small chunks one at a time.
... > Can I get over this performance problem without reimplementing the > whole thing using a barebones list object? Is that a problem? Are you using string methods? Some possibilities are: array.array('c') UserString.MutableString cStringIO depending on exactly what you are trying to do, but if it is just straight forward replacement, (i.e. you never actually change the length of the string) I'd guess the list idiom would be simplest. If you have enough memory, just keep two copies of the string, the original and the modified: size = 1024*1024*20 # 20 MB original = "A" * size copy = [None] * size for i in range(size): copy[i] = original[i].lower() copy = ''.join(copy) This takes 530 seconds (8 minutes) on my not-especially-powerful system. Ouch. How does that compare to your code? If you can operate on moderately sized chunks of text at a time rather than one character at a time, you'll see some performance increase. Also the above code not only has a 20MB string, plus a 20MB list, it also carries around a list of 20+ million int objects. If I was paying more attention before I hit the enter key, I would have used xrange instead of range and saved a lot of memory. It also has to assemble the new 20MB string before garbage-collecting the character list. It is possible that the above test code has well in excess of 100MB of data at its peak. With those problems in mind, I try this: from __future__ import division size = 1024*1024*20 # 20 MB original = "A" * size copy = [] blocksize = 1024 for i in xrange(size//blocksize): copy.append( \ original[i*blocksize:(i+1)*blocksize].lower() ) copy = ''.join(copy) This makes a BIG difference: from 8 minutes down to 0.5 seconds. That's a speed-up by a factor of about one thousand. I was so amazed at the speed increase I re-wrote the code with a couple of simple tests, then ran it again. Same result. Unless I've made some stupid mistake, I think this is the way to go. -- Steven. -- http://mail.python.org/mailman/listinfo/python-list