On 09/02/2012 03:45 PM, Michael Torrie wrote:
> <jmfauth snipped>:
> In the worst case, Python's strings are as slow as Go because Python
> does the exact same thing as Go, but chooses between three encodings
> instead of just one. Best case scenario, Python's strings could be
> much faster than Go's because indexing through 2 of the 3 encodings is
> O(1) because they are constant-width encodings. If as you say, the
> latin-1 subset of UTF-8 is used, then UTF-8 indexing is O(1) too,
> otherwise it's probably O(n).

I'm afraid you have it backwards. The UTF-8 encoding of the
latin-1-compatible characters would be variable length, since code
points 128 through 255 take two bytes each in UTF-8.

But my understanding of the PEP is that the internal one-byte format is
simply the lowest-order byte of each code point, after ensuring that all
code points in the particular string are less than 256. That's going to
coincidentally resemble latin-1's encoding, but since it's an internal
form, the resemblance is irrelevant.

Anyway, indexing into those one-byte values is going to be O(1),
naturally. No encoding involved, and no searching or expanding.

--
DaveA
--
http://mail.python.org/mailman/listinfo/python-list
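
The difference between a one-byte-per-code-point form and UTF-8 can be checked with a rough sketch like the one below. It assumes Python 3, and `one_byte_form` is a hypothetical helper written only for illustration; it is not CPython's internal code, just a model of "take the low-order byte of each code point once all of them are known to be below 256".

```python
def one_byte_form(s):
    """Hypothetical model: the low-order byte of each code point.

    Only valid when every code point in the string is below 256.
    """
    if any(ord(ch) > 0xFF for ch in s):
        raise ValueError("string contains code points >= 256")
    return bytes(ord(ch) for ch in s)


s = "caf\u00e9"  # four code points, the last one is U+00E9
compact = one_byte_form(s)
utf8 = s.encode("utf-8")

# One byte per code point, so character i lives at byte offset i: O(1).
assert len(compact) == len(s)
assert compact[3] == ord(s[3])

# UTF-8 spends two bytes on U+00E9, so byte offset != character index.
assert len(utf8) == len(s) + 1

# The one-byte form coincides with what the latin-1 codec would produce.
assert compact == s.encode("latin-1")
```

With UTF-8, finding character i in general means scanning from the start of the buffer, which is where the O(n) comes from; the fixed-width form needs no such scan.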