Paul Rubin <http://[EMAIL PROTECTED]> wrote:
   ...
> > Absolutely wrong: ''.join takes less for a million items than
> > StringIO takes for 100,000.
>
> That depends on how much ram you have.  You could try a billion items.

Let's see you try it -- I have better things to do than thrash around
checking assertions which I believe are false and which you're too lazy
to check yourself.
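For whoever does feel like measuring rather than asserting, a rough
harness along these lines would do.  This is only a sketch: cStringIO
as discussed in this thread (with io.StringIO as the natural stand-in
on Python 3), and an arbitrary item count -- the actual numbers will of
course vary with machine and Python version:

    import time
    try:
        from cStringIO import StringIO  # Python 2, as used in this thread
    except ImportError:
        from io import StringIO         # Python 3 stand-in

    pieces = ['x' * 10] * 10**6         # a million small strings (arbitrary)

    start = time.time()
    via_join = ''.join(pieces)
    print('join:     %.2f seconds' % (time.time() - start))

    start = time.time()
    out = StringIO()
    for piece in pieces:
        out.write(piece)
    via_write = out.getvalue()
    print('StringIO: %.2f seconds' % (time.time() - start))

    assert via_join == via_write        # same result either way

Rerunning with the item count scaled up or down is exactly the
experiment the claim above is about.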
> > After all, how do you think StringIO is implemented internally? A
> > list of strings and a ''.join at the end are the best way that
> > comes to mind,
>
> I'd have used the array module.

...and would that support plain byte strings and Unicode smoothly and
polymorphically?  You may recall expressing wonder, a few posts ago, at
what optimizations cStringIO might have that stop it from doing just
this...

> > As for sum, you'll recall I was its original proponent, and my
> > first implementation did special-case strings (delegating right to
> > ''.join).
>
> You could imagine a really dumb implementation of ''.join that used
> a quadratic algorithm, and in fact
>
> http://docs.python.org/lib/string-methods.html
>
> doesn't guarantee that join is linear.  Therefore, the whole ''.join
> idiom revolves around the programmer knowing some undocumented
> behavior of the implementation (i.e. that ''.join is optimized).  This

No more than StringIO.write "revolves around" the programmer knowing
exactly the same thing about the optimizations in StringIO: semantics
are guaranteed, performance characteristics are not.

> reliance on undocumented behavior seems totally bogus to me, but if

So I assume you won't be using StringIO.write any more, nor ANY other
way to join sequences of strings?  Because the performance of ALL of
them depends on such "undocumented behavior".  Personally, I don't
consider depending on "undocumented behavior" *for speed* to be bogus
at all, particularly when there are no approaches whose performance
characteristics ARE documented and guaranteed.  Besides C++'s standard
library, very few languages like to pin themselves down by offering
any performance guarantee;-).

> How about making [].join(bunch_of_lists) analogous to ''.join, with a
> documented guarantee that both are linear?

I personally have no objection to adding a join method to lists or
other sequences, but of course the semantics should be similar to:

    # e.g. [0].join([1], [2], [3]) -> [1, 0, 2, 0, 3],
    # just as '0'.join(('1', '2', '3')) -> '10203'
    def join(self, *others):
        if not others:               # nothing to join
            return self.__class__()
        result = list()
        for other in others[:-1]:
            result.extend(other)
            result.extend(self)      # self plays the separator's role
        result.extend(others[-1])    # last piece, no trailing separator
        return self.__class__(result)

As for performance guarantees, I don't think we have them now even for
list.append, list.extend, dict.__getitem__, and other similarly
fundamental methods.  I assume any such guarantees would have to
weasel-word regarding the costs of memory allocation (including,
possibly, garbage collection), since such allocation may of course be
needed and its performance can easily be out of Python's control; and
similarly, the costs of iterating on the items of 'others', of
indexing it, and so on (e.g.: for list.sort, the cost of comparisons;
for dict.__getitem__, the cost of hash on the key; and so on, and so
forth).  I don't think it's worth my time doing the weasel-wording for
this purpose, but if any sea-lawyers otherwise idle want to volunteer
(starting with the existing methods of existing built-in types, I
assume, rather than by adding others), the offer might be welcome on
python-dev (I assume that a large effort will have to be devoted to
examining the actual performance characteristics of at least the
reference implementation, in order to prove that the purported
guarantees are indeed met).


Alex
--
http://mail.python.org/mailman/listinfo/python-list