A thought I had: if variable-width encodings are so difficult because it's hard to index into them by character, why don't we break them up ourselves?

    +PV-------+   +strchunk---------------------+-+   +strchunk---------------------+-+
    |string   |-->|the quick brown fox jumped ov|>+-->|er the lazy dog              |/|
    |...      |   +-----------------------------+-+   +-----------------------------+-+

Now, if we want to substr($str, 40, 1), we can skip the first chunk. (The chunk size of 32 is a number I picked out of the air; other numbers may be better.) This avoids the possible huge overheads of other linked-list approaches while also avoiding some of the linear scanning that would otherwise be required to index into the string.

As for lvalue substr(), we could fudge that number a bit and allow strchunks to hold a little more or less than 32 characters, as long as each chunk knows its own size. Then, when you scan, you just add up the number of characters in each chunk until you overshoot. That makes scanning a bit slower, but not much. (We'd probably also want the string to rebalance itself periodically, but that's a different story.)

An alternate approach would be to remember how far into the string (in bytes) you have to index to reach certain character positions. (For the purpose of this part of the document, a 'byte' is a codepoint and a 'character' is an abstract character.) For example:

    +PV-------+
    |string   |-->"the quick brown fox jumped over the lazy dog"
    |length 44|
    |bytes  44|
    |half   22|
    |quar   11|
    |threeq 33|
    |...      |

Although in this example the string is normal ASCII, consider what we would have if we replaced the 'o' in 'brown' and the 'a' in 'lazy' with two-byte characters (represented by a doubled letter):

    +PV-------+
    |string   |-->"the quick broown fox jumped over the laazy dog"
    |length 44|
    |bytes  46|
    |half   23|
    |quar   11|
    |threeq 34|
    |...      |

Now, on a call like substr($str, 36, 1), we can skip all the way to byte 34, which we know represents character number 33, and count from there.
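To make the two ideas concrete, here is a rough sketch in Python. It is only an illustration under several assumptions, not a proposal for the actual internals: the names ChunkedString, CheckpointedString and advance are invented for this example, UTF-8 stands in for the variable-width encoding (so 'byte' here means a UTF-8 byte and 'character' a code point), and lvalue substr() and rebalancing are left out entirely.

```python
from bisect import bisect_right


def advance(data, pos, nchars):
    """Step forward nchars UTF-8 characters from byte offset pos (linear scan)."""
    for _ in range(nchars):
        if pos >= len(data):
            break
        pos += 1
        # Continuation bytes look like 0b10xxxxxx; skip them.
        while pos < len(data) and (data[pos] & 0xC0) == 0x80:
            pos += 1
    return pos


class ChunkedString:
    """First idea (hypothetical): chunks that each know their character count."""

    def __init__(self, text, chunk_chars=32):
        pieces = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
        self.chunks = [(p.encode("utf-8"), len(p)) for p in pieces]

    def substr(self, start, length):
        out = bytearray()
        for data, nchars in self.chunks:
            if length <= 0:
                break
            if start >= nchars:
                start -= nchars  # whole chunk precedes the target: skip it unscanned
                continue
            taken = min(length, nchars - start)
            begin = advance(data, 0, start)  # scan only inside this chunk
            end = advance(data, begin, taken)
            out += data[begin:end]
            length -= taken
            start = 0  # any remainder begins at the start of the next chunk
        return out.decode("utf-8")


class CheckpointedString:
    """Second idea (hypothetical): remember byte offsets of a few character positions."""

    def __init__(self, text):
        self.data = text.encode("utf-8")
        self.length = len(text)
        # Character indices of the start, quarter, half and three-quarter marks.
        self.marks = [0] + [self.length * k // 4 for k in (1, 2, 3)]
        # Byte offset at which each of those characters begins.
        self.offsets = [advance(self.data, 0, m) for m in self.marks]

    def substr(self, start, length):
        i = bisect_right(self.marks, start) - 1  # last checkpoint at or before start
        begin = advance(self.data, self.offsets[i], start - self.marks[i])
        end = advance(self.data, begin, length)
        return self.data[begin:end].decode("utf-8")


if __name__ == "__main__":
    text = "the quick br\u00f6wn fox jumped over the l\u00e4zy dog"
    ps = CheckpointedString(text)
    print(ps.offsets)  # [0, 11, 23, 34], the quar/half/threeq values from the second diagram
    print(ps.substr(36, 1))
    print(ChunkedString(text).substr(41, 3))
```

In the chunked version a lookup never scans more than one chunk before reaching its starting point. In the checkpointed version it never scans more than a quarter of the string. More checkpoints would shorten the scan at the cost of more bookkeeping whenever the string changes.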
--Brent Dax
[EMAIL PROTECTED]

"...and if the answers are inadequate, the pumpqueen will be overthrown in a bloody coup by programmers flinging dead Java programs over the walls with a trebuchet."