On 09/02/2012 12:58 PM, wxjmfa...@gmail.com wrote:
> My rationale: very simple.
>
> 1) I never heard about anything better than sticking with one
> of the Unicode coding schemes. (general theory)
> 2) I am not at all convinced by the "new" Py 3.3 algorithm. I'm not the
> only guy who noticed problems. Arguing "it is fast enough" is not
> a correct answer.
If this is true, why were you holding up Google Go as an example of doing it right? Google Go certainly doesn't line up with your rationale. Go has both strings and runes, but strings are UTF-8-encoded byte strings and runes are 32-bit integers. They are not interchangeable without a costly encoding and decoding step. Even worse, indexing a Go string to get a rune involves decoding that has to start from the beginning of the string each time.

In the worst case, Python's strings are as slow as Go's, because Python does the exact same thing as Go, but chooses between three encodings instead of just one. In the best case, Python's strings could be much faster than Go's, because indexing in two of the three encodings is O(1): they are constant-width encodings. If, as you say, only the ASCII subset of UTF-8 were in use, then UTF-8 indexing would be O(1) too; otherwise it's O(n).
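To make that cost concrete, here is a rough sketch of what Go-style indexing has to do. The helper name is mine and it assumes well-formed UTF-8 input; the point is only that you must scan from the start, because code points occupy 1 to 4 bytes:

    def utf8_index(data, i):
        """Return the i-th code point of UTF-8 bytes -- O(n), not O(1)."""
        pos = 0
        for _ in range(i):
            b = data[pos]
            if b < 0x80:       # 1-byte sequence (ASCII)
                pos += 1
            elif b < 0xE0:     # 2-byte sequence
                pos += 2
            elif b < 0xF0:     # 3-byte sequence
                pos += 3
            else:              # 4-byte sequence
                pos += 4
        end = pos + 1
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end += 1           # consume continuation bytes
        return data[pos:end].decode('utf-8')

    >>> utf8_index('héllo'.encode('utf-8'), 1)
    'é'

Every call pays for a scan over the preceding bytes, so walking a string by index this way is quadratic overall.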
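And as a quick, unscientific check of the best case, you can observe Python 3.3's flexible representation (PEP 393) with sys.getsizeof. Each string is stored at the width of its widest code point (1, 2, or 4 bytes per character), and since the width is fixed per string, s[i] stays O(1):

    import sys

    # Per-character storage depends on the widest code point present.
    for s in ('a' * 1000,            # latin-1 range  -> 1 byte/char
              '\u0394' * 1000,       # BMP (Greek Delta) -> 2 bytes/char
              '\U0001D11E' * 1000):  # astral (G clef)   -> 4 bytes/char
        print(len(s), sys.getsizeof(s))

The exact byte counts vary by build and overhead, but the roughly 1x/2x/4x growth in size is the point.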