On 2014-06-04 14:57, Marko Rauhamaa wrote:
> > If you use UTF-8 for everything, then you end up in a world where
> > string-indexing (see ChrisA's other side thread on this topic) is
> > no longer an O(1) operation, but an O(N) operation.
>
> Most string operations are O(N) anyway. Besides, you could try and
> be smart and keep a recent index cached so simple for loops would
> be O(N) instead of O(N**2). So the idea of keeping strings
> internally in UTF-8 might not be all that bad.
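To make the cost concrete, here is a minimal sketch of code-point indexing over a raw UTF-8 buffer. It assumes the string is held as plain bytes with no side index; `utf8_char_at` is a hypothetical helper for illustration, not anything CPython provides. Because each code point occupies one to four bytes, finding the Nth one means walking the buffer from the start.

```python
def utf8_char_at(buf: bytes, index: int) -> str:
    """Return the code point at `index` by scanning the UTF-8 bytes from the start."""
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    seen = 0  # number of code points passed so far
    for pos, byte in enumerate(buf):
        # Continuation bytes have the form 0b10xxxxxx; any other byte starts a code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                end = pos + 1
                # Extend over the continuation bytes that belong to this code point.
                while end < len(buf) and buf[end] & 0xC0 == 0x80:
                    end += 1
                return buf[pos:end].decode("utf-8")
            seen += 1
    raise IndexError("index out of range")


buf = "naïve €uro".encode("utf-8")
print(utf8_char_at(buf, 2))
print(utf8_char_at(buf, 6))
```

Every call restarts the scan at byte 0, so a loop that indexes each position in turn does O(N**2) work overall. A cached "last position" would help a sequential loop, but not access patterns that jump around or rarely touch the same string twice.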
As mentioned elsewhere, I've got a LOT of code that expects string indexing to be O(1), and those strings/offsets are rarely reused. I'm streaming through customer/provider data files, so a cache wouldn't do much good; it would just waste space and the time spent maintaining it. If I had known that string indexing was O(something non-constant), I'd have retooled my algorithms to take that into consideration, but that would be a lot of code I'd need to touch.

-tkc