On 21 October 2014 23:21:37 GMT+01:00, Andrea Faulds <a...@ajf.me> wrote:

>Make array-like indexing with [] be by
>code points as you may be able to do that in constant time

If the internal representation is UTF-8, both code point and grapheme access 
require traversal unless you keep some additional index structure. Both reduce 
to plain byte access if you have detected and stored that the string is 
entirely ASCII (leaving aside CR LF, which counts as a single grapheme 
cluster), but otherwise you will nearly always have multiple byte widths 
within one string.
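
To illustrate the traversal, here is a rough sketch in Python rather than the 
engine's C. The function name codepoint_at and the is_ascii flag are made up 
for this example and are not an existing or proposed PHP API; it also assumes 
the input is well-formed UTF-8.

```python
def codepoint_at(data: bytes, index: int, is_ascii: bool = False) -> str:
    """Return the code point at position `index` in UTF-8 encoded `data`.

    `is_ascii` is a hypothetical flag, assumed to have been computed and
    stored when the string was created.
    """
    if index < 0:
        raise IndexError(index)
    if is_ascii:
        # Every code point is one byte, so the index is the byte offset: O(1).
        return chr(data[index])
    seen = -1
    for offset, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts
        # a new code point.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == index:
                # The lead byte tells us how many bytes the sequence has.
                if byte < 0x80:
                    width = 1
                elif byte >> 5 == 0b110:
                    width = 2
                elif byte >> 4 == 0b1110:
                    width = 3
                else:
                    width = 4
                return data[offset:offset + width].decode("utf-8")
    raise IndexError(index)
```

Without the stored flag, every lookup walks the string from the start, so 
indexing is linear in the offset rather than constant time.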

If the internal representation is UTF-16, code point access becomes a 
constant-time offset calculation for any string containing only BMP characters 
(no surrogate pairs). The Perl 6 concept of "NFG" attempts to extend that 
advantage to grapheme access, and to code points outside the BMP.
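
The UTF-16 case can be sketched the same way, again in Python for brevity. 
The helper names and the bmp_only flag are assumptions for illustration only, 
the input is assumed to be well-formed UTF-16, and this does not attempt to 
show NFG itself.

```python
import struct


def to_utf16_units(text: str) -> list:
    """Encode `text` as a list of 16-bit UTF-16 code units."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def is_bmp_only(units: list) -> bool:
    """True if no code unit is a surrogate, i.e. one unit per code point."""
    return not any(0xD800 <= u <= 0xDFFF for u in units)


def utf16_codepoint_at(units: list, index: int, bmp_only: bool) -> str:
    """Return the code point at `index`; `bmp_only` is a stored flag."""
    if index < 0:
        raise IndexError(index)
    if bmp_only:
        # One code unit per code point: direct O(1) lookup.
        return chr(units[index])
    pos = 0
    for _ in range(index):
        if pos >= len(units):
            raise IndexError(index)
        # A high surrogate means this code point occupies two units.
        pos += 2 if 0xD800 <= units[pos] <= 0xDBFF else 1
    if pos >= len(units):
        raise IndexError(index)
    high = units[pos]
    if 0xD800 <= high <= 0xDBFF:
        low = units[pos + 1]
        return chr(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00))
    return chr(high)
```

As soon as a single surrogate pair is present, the fast path is lost and the 
lookup falls back to a scan, much as in the UTF-8 case.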


-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
