Hi,

Working with UTF-8-encoded strings does not imply working with mbstring functions or with code-point counts. Personally, I work with the standard string functions, plus the [Grapheme functions](https://www.php.net/manual/en/ref.intl.grapheme.php) when I need to split a string into “characters” (which for me means “grapheme clusters”, not “code points”, so the mbstring functions are useless to me). In particular, PREG_OFFSET_CAPTURE always does what I need, even with the /u flag, because the byte offsets it returns can be passed straight to the standard string functions.
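To illustrate what I mean, here is a minimal sketch (not a proposal for the implementation). It shows that the offset reported by PREG_OFFSET_CAPTURE under /u is a byte offset that works directly with substr(), and how a code-point offset could be derived in userland if someone really needs one. The helper name `codePointOffset()` and the sample strings are my own, purely for illustration. The grapheme count falls back to the PCRE `\X` escape when the intl extension is not loaded.

```php
<?php
// Sample subject: "ï" and "é" each take 2 bytes in UTF-8.
$subject = "naïve café";

preg_match('/café/u', $subject, $m, PREG_OFFSET_CAPTURE);
[$match, $byteOffset] = $m[0];

// The offset is counted in bytes, so it pairs with substr()/strlen().
$slice = substr($subject, $byteOffset, strlen($match));

// Hypothetical helper: convert a byte offset into a code-point offset
// by counting the code points in the prefix that precedes it.
function codePointOffset(string $subject, int $byteOffset): int
{
    return preg_match_all('/./su', substr($subject, 0, $byteOffset));
}

$cpOffset = codePointOffset($subject, $byteOffset);

// Code points and grapheme clusters differ as soon as combining marks
// appear: "e" followed by U+0301 is two code points but one cluster.
$decomposed = "cafe\u{0301}";
$codePoints = preg_match_all('/./su', $decomposed);
$graphemes  = function_exists('grapheme_strlen')
    ? grapheme_strlen($decomposed)
    : preg_match_all('/\X/u', $decomposed);

printf("byte offset: %d, code-point offset: %d\n", $byteOffset, $cpOffset);
printf("code points: %d, grapheme clusters: %d\n", $codePoints, $graphemes);
```

With these inputs the byte offset is 7 while the code-point offset is 6, and neither number says anything about grapheme clusters, which is why the byte offset is the one I actually use.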
If this is a feature that you want to implement, I suggest adding a new flag, PREG_UTF8_CODEPOINT_OFFSET_CAPTURE, rather than changing what PREG_OFFSET_CAPTURE returns.

—Claude