Tony Nelson wrote:

> >Can I get over this performance problem without reimplementing the
> >whole thing using a barebones list object? I thought I was being "smart"
> >by avoiding editing the long list, but then it struck me that I am
> >creating a second object of the same size when I put the modified
> >shorter string in place...
>
> A couple of minutes experimenting with array.array at the python command
> line indicates that it will work fine for you. Quite snappy on a 16 MB
> array, including a slice assignment of 1 KB near the beginning.
> Array.array is probably better than lists for speed, and uses less
> memory. It is the way to go if you are going to be randomly editing all
> over the place but don't need to convert to string often.
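A minimal sketch of the array.array experiment described above, assuming the data can be treated as raw bytes (typecode "B"). The 16 MB size and the 1 KB slice come from the quoted post; the offset of 100 and the 512-byte replacement are made-up values for illustration. A simple slice of an array can be assigned a replacement of a different length, so the array grows or shrinks in place instead of a whole new string being built for each edit.

```python
from array import array

# Hypothetical stand-in for the large input: 16 MB of zero bytes.
data = array("B", bytes(16 * 1024 * 1024))

# Replace a 1 KB slice near the beginning with a shorter 512-byte chunk.
# The right-hand side must be an array with the same typecode.
start = 100
data[start:start + 1024] = array("B", b"x" * 512)

# Convert back to bytes only once, after all the editing is done.
result = data.tobytes()
print(len(result))
```

The expensive step is `tobytes()`, which copies the whole buffer, which is why the quoted advice applies when you "don't need to convert to string often".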
I have no major objections to using array, but I do have a minor one: ordinary lists may very well be more than snappy enough, and they have the advantage of being more familiar than the array module to many Python programmers.

The time it takes to process a 20MB string will depend on the details of the processing, but my back-of-the-envelope test, using one large input string and an intermediate list of strings, was *extremely* fast: less than half a second for a 20MB input. (See my earlier post for details.) Given that sort of speed, switching to the less familiar array module just to shave the time from 0.49s to 0.45s would be premature optimization. In fairness, though, if it cut the time to 0.04s for 20MB, it would be worth the extra work to use the array module.

--
Steven.

--
http://mail.python.org/mailman/listinfo/python-list
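For comparison, here is a rough sketch of the list-of-strings approach: split the large string into pieces, edit the list, and join once at the end. The details of the earlier test aren't given here, so the chunk size, the name `edit_with_list`, and the "shorten chunks that start with 'a'" edit are all hypothetical placeholders for whatever processing is actually needed. Timings will vary by machine, so none are claimed.

```python
import time


def edit_with_list(text, chunk_size=1024):
    """Split text into chunks, edit the list, then join once.

    Hypothetical edit: every chunk starting with "a" is cut to its
    first half, standing in for the real processing.
    """
    pieces = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    for i, piece in enumerate(pieces):
        if piece.startswith("a"):
            pieces[i] = piece[: len(piece) // 2]
    # A single join builds the final string once, rather than creating
    # a new full-size string after every individual edit.
    return "".join(pieces)


if __name__ == "__main__":
    big = ("a" * 1024 + "b" * 1024) * (10 * 1024)  # about 20 MB of text
    t0 = time.perf_counter()
    out = edit_with_list(big)
    elapsed = time.perf_counter() - t0
    print(f"{len(big)} -> {len(out)} chars in {elapsed:.3f}s")
```

Only plain lists and `str.join` are involved, which is the familiarity argument made above.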