Hi,

this is a follow-up to a bug report I opened; cmb suggested continuing
the discussion here: https://bugs.php.net/bug.php?id=80166

Advantages of returning character offsets instead of byte offsets:

1: Easier string manipulation:
If somebody calls (as in my case) `preg_match_all()` with
`PREG_OFFSET_CAPTURE`, what will they probably use the returned
offsets for?
My answer: for *splitting the string* in one way or another. With byte
offsets, I can't do such basic things as adding `+1` to get to the next
character, or extracting exactly 3 characters.
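To illustrate the point, here is a minimal sketch. It assumes PHP 7+ with
the mbstring extension and a UTF-8 subject; the sample string and variable
names are made up for this example.

```php
<?php
// "äöü wxyz": each of the three umlauts takes 2 bytes in UTF-8.
$subject = "\u{e4}\u{f6}\u{fc} wxyz";

preg_match_all('/w/u', $subject, $matches, PREG_OFFSET_CAPTURE);
$byteOffset = $matches[0][0][1];   // 7 (bytes), although "w" is the 5th character

// One possible conversion: count the characters in the byte prefix.
$charOffset = mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');   // 4

// Byte arithmetic breaks on multi-byte characters:
// byte 1 is the second byte of "ä", so this slice is not valid UTF-8.
$broken = substr($subject, 1, 3);
$valid  = mb_check_encoding($broken, 'UTF-8');   // false

// With a character offset, "+1" and "exactly 3 characters" just work.
$nextThree = mb_substr($subject, $charOffset + 1, 3, 'UTF-8');   // "xyz"
```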

2: Better performance:
This may sound odd, since cmb said the exact opposite ;-) (sequential
access vs. random access). However, if I need character offsets (see 1),
what can I do? I'm forced to add some workaround on top, such as
https://www.php.net/manual/en/function.preg-match-all.php#71572 - which
is certainly way slower than any native implementation.
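A userland workaround of this kind could look roughly like the sketch
below. This is not the code from the linked note; the function name
`preg_match_all_char_offsets()` is hypothetical, and it assumes the
mbstring extension and a UTF-8 subject.

```php
<?php
// Hypothetical wrapper: like preg_match_all() with PREG_OFFSET_CAPTURE,
// but rewrites every byte offset into a character offset.
function preg_match_all_char_offsets(string $pattern, string $subject, &$matches)
{
    $count = preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
    if ($count === false) {
        return false;
    }

    foreach ($matches as &$group) {
        foreach ($group as &$match) {
            // Skip entries without a usable offset (e.g. unmatched groups).
            if (is_array($match) && $match[1] >= 0) {
                // Re-scans the prefix of the subject for every single match.
                $match[1] = mb_strlen(substr($subject, 0, $match[1]), 'UTF-8');
            }
        }
        unset($match);
    }
    unset($group);

    return $count;
}

$n = preg_match_all_char_offsets('/[xz]/u', "\u{e4}\u{f6}\u{fc} xyz", $m);
// $m[0][0] is ['x', 4] and $m[0][1] is ['z', 6]
```

Each conversion walks the subject again from the start, so the cost grows
with both the subject length and the number of matches. That is the
overhead I would hope a native implementation could avoid.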

3: Consistency with users' expectations:
The current behavior is causing confusion and is perceived as
counter-intuitive, see
https://www.php.net/manual/en/function.preg-match-all.php#61426 and
https://stackoverflow.com/questions/1725227/preg-match-and-utf-8-in-php

So I'm suggesting:

* Either do the BC break, and return character offsets instead of byte
offsets if the modifier `u` is given.
* Or create *new* functions for it: `mb_preg_match_all()` etc.

--

Cheers,
Thomas

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
