Tom Christiansen <tchr...@perl.com> added the comment:

Antoine Pitrou <rep...@bugs.python.org> wrote on Sat, 13 Aug 2011 21:09:52 -0000:
> And/or a lookup table giving the byte offset of, say, every 16th
> character. It gives you a O(1) lookup with a relatively reasonable
> constant cost (you have to scan for less than 16 characters after the
> lookup).

> On small strings (< 256 UTF-8 bytes) the space overhead for the lookup
> table would be 1/16. It could also be constructed lazily whenever more
> than 2 positions are cached.

You really should talk to the Perl 6 people to see whether their current
strategy for caching offset maps for grapheme positions might be of use to
you. Larry explained it to me once but I no longer recall any details.

I notice though that they don't seem to think it worth doing for UTF-8 or
UTF-16, just for their synthetic "NFG" (Grapheme Normalization Form) strings,
where it would be needed even if they used UTF-32 underneath.

--tom

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________
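
For readers following the thread, here is a minimal sketch of the lookup table Antoine describes in the quoted text: UTF-8 bytes plus the byte offset of every 16th character, so that indexing costs one table lookup and a scan of fewer than 16 characters. The class name `IndexedUTF8`, its `STRIDE` constant, and the `char_at` method are hypothetical names made up for illustration; this is not how CPython or Perl 6 stores strings, and it builds the table eagerly rather than lazily as the quote suggests.

```python
class IndexedUTF8:
    """Hypothetical wrapper: UTF-8 bytes plus a sparse character-offset table."""

    STRIDE = 16  # record the byte offset of every 16th character

    def __init__(self, text):
        self.data = text.encode("utf-8")
        self.offsets = []  # offsets[k] = byte offset of character k * STRIDE
        count = 0
        for pos, byte in enumerate(self.data):
            # Continuation bytes look like 0b10xxxxxx; anything else starts a character.
            if byte & 0xC0 != 0x80:
                if count % self.STRIDE == 0:
                    self.offsets.append(pos)
                count += 1
        self.length = count  # number of code points

    def char_at(self, index):
        """Return the character at `index`, scanning at most STRIDE - 1 characters."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Table lookup: jump to the nearest recorded character at or before `index`.
        pos = self.offsets[index // self.STRIDE]
        # Short forward scan over the remaining characters.
        for _ in range(index % self.STRIDE):
            pos += 1
            while self.data[pos] & 0xC0 == 0x80:
                pos += 1
        # Find the end of the target character and decode just that slice.
        end = pos + 1
        while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
            end += 1
        return self.data[pos:end].decode("utf-8")
```

The table holds one entry per 16 characters. Note that this sketch indexes code points only; the grapheme-position caching mentioned for Perl 6 "NFG" strings is a separate problem that it does not attempt to address.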