Hi, this is a follow-up to a bug I opened (https://bugs.php.net/bug.php?id=80166); cmb suggested continuing the discussion here.

Advantages:

1: Easier string manipulation: If somebody calls (as in my case) `preg_match_all()` with PREG_OFFSET_CAPTURE, what will they probably use the returned offsets for? My answer: for *splitting the string* in one way or another. With byte offsets, I can't do such basic things as adding `+1` to get to the next character, or extracting exactly 3 characters.

2: Better performance: This may sound odd, since cmb said the exact opposite ;-) (sequential access vs. random access). However, if I need character offsets (see 1), what can I do? I'm forced to put some workaround on top, e.g. https://www.php.net/manual/en/function.preg-match-all.php#71572, which is certainly way slower than any native implementation.

3: Consistency with users' expectations: The current behavior causes confusion and is perceived as counter-intuitive, see https://www.php.net/manual/en/function.preg-match-all.php#61426 and https://stackoverflow.com/questions/1725227/preg-match-and-utf-8-in-php

So I'm suggesting:

* Either do the BC break, and just return character offsets if the modifier `u` is given.
* Or create *new* functions for it: `mb_preg_match_all()` etc.

--
Cheers,
Thomas

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
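
P.S. To illustrate points 1 and 2, here is a minimal sketch of the kind of userland workaround I mean. It assumes the mbstring extension is loaded and that the subject is valid UTF-8; `preg_match_all_char_offsets()` is a hypothetical helper name made up for this example, not an existing PHP function, and it is not the code from the manual comment linked above.

```php
<?php
// Hypothetical helper (not part of PHP): run preg_match_all() with
// PREG_OFFSET_CAPTURE and add a character offset next to each byte offset.
// Assumes ext/mbstring and a valid UTF-8 subject.
function preg_match_all_char_offsets(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

    $result = [];
    foreach ($matches[0] as [$text, $byteOffset]) {
        // Count the UTF-8 characters that precede the match. This rescans
        // the prefix for every match, which is the overhead a native
        // implementation could avoid.
        $charOffset = mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
        $result[] = [$text, $byteOffset, $charOffset];
    }
    return $result;
}

// "\u{e4}" (a-umlaut) and "\u{f6}" (o-umlaut) take two bytes each in UTF-8.
$hits = preg_match_all_char_offsets('/b/u', "\u{e4}\u{f6}b");
// $hits[0] is ['b', 4, 2]: byte offset 4, character offset 2.
```

With the byte offset alone, `mb_substr($subject, $offset, 1)` would point past the end of this three-character string, which is exactly the confusion described in point 3.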