Chris Angelico <ros...@gmail.com> writes:
> So, I don't actually have any stats for you, because it's really easy
> to just not index strings at all.
Right, that's why I think the O(n) indexing issue of UTF-8 may be overblown. Haskell 98 was mentioned earlier as a language that did Unicode "correctly", but its strings are linked lists of code points. They are a performance pig, to be sure, but the O(n) indexing is usually not the bottleneck. These days there is a "Text" module that I think is basically UTF-16 arrays. I have been meaning to find out what happens with non-BMP characters.

--
http://mail.python.org/mailman/listinfo/python-list
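To make the O(n) point concrete, here is a minimal Python sketch of code-point indexing into raw UTF-8 bytes. The function name `utf8_codepoint_at` is hypothetical and not part of any library, and this is not how CPython or Haskell's Text module store strings internally. Because a code point takes 1 to 4 bytes in UTF-8, the only way to find the i-th one is to walk from the start and count lead bytes. The last few lines show the UTF-16 side of the question: a non-BMP character such as U+1D11E takes two code units (a surrogate pair), so counting UTF-16 units is not the same as counting code points.

```python
def utf8_codepoint_at(data: bytes, index: int) -> str:
    """Return the index-th code point of the UTF-8 encoded bytes `data`.

    Hypothetical helper for illustration. Each code point is 1-4 bytes,
    so we cannot jump to a byte offset; we scan from the start, which is
    what makes code-point indexing O(n).
    """
    if index < 0:
        raise IndexError("negative indices are not supported in this sketch")
    count = -1    # number of the code point we are currently inside
    start = None  # byte offset where the wanted code point begins
    for offset, byte in enumerate(data):
        # Continuation bytes have the form 0b10xxxxxx; any other byte
        # starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == index:
                start = offset
            elif count == index + 1:
                return data[start:offset].decode("utf-8")
    if start is None:
        raise IndexError("code point index out of range")
    # The wanted code point runs to the end of the buffer.
    return data[start:].decode("utf-8")


s = "a\u00e9\u20ac\U0001D11Eb"  # 'a', e-acute, euro sign, musical G clef (non-BMP), 'b'
encoded = s.encode("utf-8")
print(utf8_codepoint_at(encoded, 4))       # b
print(len(s))                              # 5 code points
print(len(s.encode("utf-16-le")) // 2)     # 6 UTF-16 code units: U+1D11E is a surrogate pair
```

The linear scan is only paid when you actually index by position; iterating over a string from front to back visits each byte once either way, which fits the observation that indexing is rarely the real bottleneck.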