Re: [PHP-DEV] Suggestion: Make all PCRE functions return character offsets, rather than byte offsets if the modifier `u` (PCRE_UTF8) is given

Claude Pache Fri, 02 Oct 2020 13:12:49 -0700

Hi,

Working with UTF-8-encoded strings does not imply working with mbstring
functions or with code-point counts. Personally, I work with the standard string
functions, plus the Grapheme functions
(https://www.php.net/manual/en/ref.intl.grapheme.php) when I need to split my
string into “characters” (which for me means “grapheme clusters”, not “code
points”, so the mbstring functions are of no use to me). In particular,
PREG_OFFSET_CAPTURE always does what I need, even when using the /u flag.
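To illustrate, here is a minimal sketch using only core PCRE (the string is an arbitrary example). With /u, the offset reported by PREG_OFFSET_CAPTURE is a byte offset that can be passed straight to substr(). A code-point or grapheme count can be derived from it when needed. Counting `\X` matches is used here as a stand-in for intl's grapheme_strlen(), so the example does not depend on ext/intl.

```php
<?php
// "café" written with a decomposed é: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
$s = "cafe\u{301} au lait";

// Even with /u, PREG_OFFSET_CAPTURE reports a byte offset.
preg_match('/au/u', $s, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1];               // 7 = "cafe" (4 bytes) + U+0301 (2 bytes) + " " (1 byte)

// The byte offset works directly with the standard, byte-oriented functions.
$rest   = substr($s, $byteOffset);    // "au lait"
$prefix = substr($s, 0, $byteOffset); // everything before the match

// Code points before the match: count "." matches in UTF-8 mode.
$codePointOffset = preg_match_all('/./su', $prefix);  // 6

// Grapheme clusters before the match: count \X (extended grapheme cluster) matches.
$graphemeOffset = preg_match_all('/\X/u', $prefix);   // 5

printf("bytes=%d code points=%d graphemes=%d\n", $byteOffset, $codePointOffset, $graphemeOffset);
```

The three counts differ (7, 6 and 5) for the same position, which is why a single "character offset" is ambiguous: the right unit depends on what the caller means by a character.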


If this is a feature that you want to implement, I suggest adding a flag
PREG_UTF8_CODEPOINT_OFFSET_CAPTURE.

—Claude
