[EMAIL PROTECTED] (Alex Martelli) writes:

> > That depends on how much ram you have. You could try a billion items.
> Let's see you try it
If you want me to try timing it with a billion items on your computer,
you'd have to set up a suitable account, open a network connection,
etc.; probably not worth the trouble. Based on examining StringIO.py,
on my current computer (512 MB of RAM), with 100 million items, it
looks like using a bunch of writes interspersed with seeks will be
much faster than just using writes. I wouldn't have guessed THAT.
With a billion items it will thrash no matter what.

> > I'd have used the array module.
> ...and would that support plain byte strings and Unicode smoothly and

Actually, I see that getvalue is supposed to raise an error if you mix
Unicode with 8-bit ASCII:

    The StringIO object can accept either Unicode or 8-bit strings,
    but mixing the two may take some care. If both are used, 8-bit
    strings that cannot be interpreted as 7-bit ASCII (that use the
    8th bit) will cause a UnicodeError to be raised when getvalue()
    is called.

This is another surprise; I'd have thought it could just convert to
Unicode as soon as it saw a Unicode string. I think I understand the
idea. The result is that with StringIO (Python 2.4.1):

    s = StringIO()    # ok
    s.write('\xc3')   # ok
    s.write(u'a')     # ok
    s.seek(0, 2)      # raises UnicodeDecodeError

Raising the error at the second s.write doesn't seem like a big
problem. The StringIO doc already doesn't mention that seek can raise
a Unicode exception, so it needs to be fixed either way.

> > the whole ''.join idiom revolves around the programmer knowing
> > some undocumented behavior of the implementation...
>
> No more than StringIO.write "revolves around" the programmer knowing
> exactly the same thing about the optimizations in StringIO: semantics
> are guaranteed, performance characteristics are not.

I think having either one use quadratic time is bogus (something like
n log n might be ok).
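To see why the error can surface at seek/getvalue rather than at write time, here is a minimal model of the deferred-join design (this is a hypothetical sketch for illustration, not the real StringIO source; it uses Python 3 `bytes`/`str` to stand in for 8-bit and Unicode strings):

```python
# Hypothetical model of a lazily-joined buffer: write() just stores
# fragments, so mixing non-ASCII bytes with text raises only when the
# buffer is consolidated -- mirroring how Python 2's StringIO defers
# the UnicodeDecodeError to getvalue() (or a seek that forces a join).
class LazyBuffer:
    def __init__(self):
        self.parts = []          # mixed bytes/str fragments, joined lazily

    def write(self, s):
        self.parts.append(s)     # no decoding here, so no error here

    def getvalue(self):
        # Joining forces the decode; a bad mix fails only now.
        return "".join(p.decode("ascii") if isinstance(p, bytes) else p
                       for p in self.parts)

buf = LazyBuffer()
buf.write(b"\xc3")               # ok: stored untouched
buf.write("a")                   # ok: still no check at write time
try:
    buf.getvalue()               # the deferred error surfaces here
    raised = False
except UnicodeDecodeError:
    raised = True
print("error deferred to getvalue():", raised)
```

Checking at the second write instead would mean decoding (or at least inspecting) every fragment eagerly, which is presumably what the lazy design avoids.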
> > reliance on undocumented behavior seems totally bogus to me, but if
>
> So I assume you won't be using StringIO.write any more, nor ANY other
> way to join sequences of strings? Because the performance of ALL of
> them depends on such "undocumented behavior".

I'll keep using them, but it means that the program's complexity
(i.e., that it's O(n) and not O(n**2)) depends on the interpreter
implementation, which is bogus. Do you really want a language designed
by computer science geniuses to be so underspecified that there's no
way to tell whether a straightforward program's running time is linear
or quadratic?

> Besides C++'s standard library, very few languages like to pin
> themselves down by ensuring any performance guarantee;-).

I seem to remember Scheme guarantees tail recursion optimization.
This is along the same lines. It's one thing for the docs not to want
to promise that .sort() uses at most 3.827*n*(lg(n)+14.7) comparisons
or something like that. That's what I'd consider pinning down a
performance guarantee. Promising that .sort() is O(n log n) just says
that the implementation is reasonable. The C library's "qsort" doc
even specifies the precise algorithm, or used to. Python's heapq
module doc similarly specifies heapq's algorithm.

Even that gets far afield. The real objection here (about ''.join) is
that every Python user is expected to learn a weird, pervasive idiom,
but the reason for the idiom cannot be deduced from the language
reference. That is just bizarre.

> As for performance guarantees, I don't think we have them now even for
> list.append, list.extend, dict.__getitem__, and other similarly
> fundamental methods. I assume any such guarantees would have to
> weasel-word regarding costs of memory allocation...

I think it's enough to state the amortized complexity of these
operations to within a factor of O(log(N)).
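For readers joining the thread, the idiom under discussion can be sketched as follows. Absent any interpreter optimization, the naive loop may copy the accumulated prefix on every step (O(n**2) total work over n pieces), while ''.join allocates once and copies each piece once (O(n) in the total length). Whether the naive loop actually goes quadratic depends on the implementation, which is exactly the point being argued:

```python
# Two ways to build one string from n pieces.
def build_naive(pieces):
    s = ""
    for p in pieces:
        s = s + p            # each + may copy everything built so far
    return s

def build_join(pieces):
    return "".join(pieces)   # single allocation, single pass

pieces = ["ab"] * 500
assert build_naive(pieces) == build_join(pieces) == "ab" * 500
```

Both produce the same string; only the language reference's silence on their costs separates them.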
That should be easy enough to do with the standard implementations
(dicts using hashing, etc.) while still leaving the implementation
pretty flexible. That should allow determining the running speed of a
user's program to within a factor of O(log(N)), a huge improvement
over not being able to prove anything about it.

Even without such guarantees, it's enough to say that these operations
work in the obvious ways ("dicts use hashing..."), and even without
saying that, relying on the behavior isn't so terrible, because the
code that you write is the obvious code for using operations that work
in the obvious ways. That's not the case for ''.join, the use of which
is not obvious at all.

I do see docs for the built-in hash function and the __hash__ method:

    http://docs.python.org/lib/built-in-funcs.html#l2h-34
    http://docs.python.org/ref/customization.html#l2h-195

indicating that dictionary lookup uses hashing.
--
http://mail.python.org/mailman/listinfo/python-list