On Apr 13, 1:32 am, Antoon Pardon <[EMAIL PROTECTED]> wrote: > Suppose someone writes a function that acts on a sequence. > The algorithm used depending on the following invariant. > > i = s.index(e) => s[i] = e > > Then this algorithm is no longer guaranteed to work with strings.
It never worked correctly on unicode strings anyway (which becomes the canonical string in python 3.0). The base unit exposed by the implementation is rarely what you want to operate upon. The terminology is pretty confusing, but let's see if I can lay out the relationships here: byte ≤ code unit ≤ code point ≤ scalar value ≤ grapheme cluster ~ character ≤ syllable ≤ word ≤ sentence ≤ paragraph "12" in "123" allows you to handle bytes through scalar values the same way, glossing over the implementation details (such as UTF-32 on linux and UTF-16 on windows). -- Adam Olsen, aka Rhamphoryncus -- http://mail.python.org/mailman/listinfo/python-list