Hi,

Working with UTF-8-encoded strings does not imply working with mb_string
functions or with code-point counts. Personally, I work with the standard string
functions, plus the Grapheme functions
(https://www.php.net/manual/en/ref.intl.grapheme.php) when I need to split my
string into “characters” (by which I mean grapheme clusters, not code points,
which is why the mb_string functions are of no use to me). In particular,
PREG_OFFSET_CAPTURE always does what I need, even with the /u flag: it returns
byte offsets, which are exactly what the standard string functions expect.
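For example, here is a minimal sketch, assuming the intl and mbstring extensions are loaded (the sample string is only an illustration). With a decomposed “é” (e + U+0301), bytes, code points and grapheme clusters all give different lengths, and the byte offset from preg_match() can be passed straight to substr():

```php
<?php
// Assumes the intl (grapheme_*) and mbstring (mb_*) extensions are loaded.
$s = "cafe\u{0301} au lait"; // "é" written as "e" + U+0301 COMBINING ACUTE ACCENT

// Even with the /u flag, PREG_OFFSET_CAPTURE reports a byte offset.
preg_match('/au/u', $s, $m, PREG_OFFSET_CAPTURE);
$offset = $m[0][1]; // "cafe" (4 bytes) + U+0301 (2 bytes) + " " (1 byte) = 7

// A byte offset is what the standard string functions expect.
echo substr($s, $offset), "\n";        // au lait

// Three different notions of "length" for the same string.
echo strlen($s), "\n";                 // 14 bytes
echo mb_strlen($s, 'UTF-8'), "\n";     // 13 code points
echo grapheme_strlen($s), "\n";        // 12 grapheme clusters
```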

If you do want to implement this feature, I suggest adding a new flag,
PREG_UTF8_CODEPOINT_OFFSET_CAPTURE.
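Until such a flag exists (it is only a proposal and is not part of PHP), the code-point offset can be derived from the byte offset. A rough sketch, where utf8_codepoint_offset() is a hypothetical helper and mbstring is assumed to be available:

```php
<?php
// PREG_UTF8_CODEPOINT_OFFSET_CAPTURE is only proposed; it does not exist in PHP.
// utf8_codepoint_offset() is a hypothetical helper that emulates it by hand.
function utf8_codepoint_offset(string $s, int $byteOffset): int
{
    // Count the code points in the prefix that precedes the match.
    return mb_strlen(substr($s, 0, $byteOffset), 'UTF-8');
}

$s = "na\u{00EF}ve caf\u{00E9}"; // "naïve café", precomposed characters
preg_match('/caf/u', $s, $m, PREG_OFFSET_CAPTURE);
echo $m[0][1], "\n";                            // 7 (bytes: "ï" takes 2)
echo utf8_codepoint_offset($s, $m[0][1]), "\n"; // 6 (code points)
```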

—Claude