On Thu, Mar 28, 2013 at 7:34 AM, jmfauth <wxjmfa...@gmail.com> wrote:
> The flexible string representation takes the problem from the
> other side, it attempts to work with the characters by using
> their representations and it (can only) fails...

This is false. As I've pointed out to you before, the FSR does not
divide characters up by representation. It divides them up by
codepoint -- more specifically, by the *bit-width* of the codepoint.
We call the internal format of the string "ASCII" or "Latin-1" or
"UCS-2" for conciseness and as a point of reference, but fundamentally
all of the FSR formats are simply byte arrays of *codepoints* -- you
know, those things you keep harping on.

The major optimization performed by the FSR is to consistently
truncate the leading zero bytes from each codepoint when it is
possible to do so safely, that is, when every codepoint in the string
fits in the narrower width. But regardless of the extent to which
this truncation is applied, the string is *always* internally just an
array of codepoints, and the same algorithms apply for all
representations.
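
To make the point concrete, here is a rough sketch in pure Python of the idea described above. It is an illustration only, not CPython's actual implementation: the helper names (fsr_width, encode_fixed_width, codepoint_at) are made up for this post, and the little-endian byte order is an arbitrary assumption. It picks the narrowest fixed width that holds every codepoint in the string, stores the codepoints at that width, and then indexes with the same arithmetic whatever the width is.

```python
def fsr_width(s):
    """Bytes per codepoint needed for s: 1, 2, or 4 (hypothetical helper).

    CPython additionally treats pure-ASCII strings as a special case of
    the 1-byte form; this sketch does not model that distinction.
    """
    largest = max(map(ord, s), default=0)
    if largest < 0x100:
        return 1
    if largest < 0x10000:
        return 2
    return 4


def encode_fixed_width(s):
    """Store every codepoint of s at the same width, leading zeros dropped.

    Byte order is little-endian here purely as an assumption of the sketch.
    """
    width = fsr_width(s)
    data = b"".join(ord(ch).to_bytes(width, "little") for ch in s)
    return width, data


def codepoint_at(width, data, i):
    """Index into the array of codepoints; the same code serves every width."""
    return int.from_bytes(data[i * width:(i + 1) * width], "little")


if __name__ == "__main__":
    for text in ("abc", "a\u20ac", "a\U0001F600"):
        width, data = encode_fixed_width(text)
        decoded = [codepoint_at(width, data, i) for i in range(len(text))]
        print(width, len(data), decoded == [ord(ch) for ch in text])
```

Note that codepoint_at never looks at what the characters "are" or which named encoding the buffer happens to resemble. It only needs the width, and the values it returns are always the codepoints themselves.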