Re: [PHP-DEV] Proposal for a new basic function: str_contains

Nikita Popov Mon, 02 Mar 2020 12:01:08 -0800

On Mon, Mar 2, 2020 at 8:10 PM Andrea Faulds <a...@ajf.me> wrote:

> Hi,
>
> Philipp Tanlak wrote:
> > I'd like to elaborate on Nikita's response: I don't think an
> > mb_str_contains is necessary, because the proposed function does not
> > behave differently if the input strings are multibyte strings.
>
> This is not true for all character encodings. For UTF-8 it is correct,
> but consider for example the Japanese encoding Shift_JIS, where the
> second byte of a multi-byte character can be a valid first byte of a
> single-byte character. str_contains() would have incorrect behaviour for
> this case.
>


That's of course true, but I consider it ultimately unimportant. Accepting
non-UTF-8 encodings for anything other than mb_convert_encoding() is just
another failure of the mbstring API. The mb_strpos() function literally
works by converting the given string to UTF-8 and then calling the normal
strpos() on it, after sprinkling in some nice O(n) character-offset to
byte-offset (and reverse) conversions. You are generally much better off
canonicalizing everything to UTF-8 for internal processing and using the
normal str* functions. But well, that's a different discussion...
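The Shift_JIS pitfall and the "canonicalize to UTF-8 first" approach can be illustrated with a short sketch. It is written in Python rather than PHP only because Python ships a shift_jis codec. The helper names naive_contains, contains_via_utf8 and char_offset_of are hypothetical, the inputs are assumed to be valid in the given encoding, and char_offset_of only mimics the mb_strpos() strategy as described in this message, not the actual mbstring source.

```python
def naive_contains(haystack: bytes, needle: bytes) -> bool:
    # Plain byte-level substring search, analogous to str_contains().
    return needle in haystack


def contains_via_utf8(haystack: bytes, needle: bytes, encoding: str) -> bool:
    # Canonicalize both operands to UTF-8, then do the same byte search.
    # UTF-8 is self-synchronizing: a lead byte can never look like a
    # continuation byte, so a match can only start on a character boundary.
    h = haystack.decode(encoding).encode("utf-8")
    n = needle.decode(encoding).encode("utf-8")
    return n in h


def char_offset_of(haystack: bytes, needle: bytes, encoding: str) -> int:
    # Hypothetical analogue of the strategy described above: convert to
    # UTF-8, find the byte offset, then map it back to a character offset
    # with an O(n) scan. Returns -1 if the needle is not found.
    h = haystack.decode(encoding).encode("utf-8")
    n = needle.decode(encoding).encode("utf-8")
    byte_pos = h.find(n)
    if byte_pos < 0:
        return -1
    # Each character contributes exactly one non-continuation byte
    # (continuation bytes have the bit pattern 10xxxxxx).
    return sum(1 for b in h[:byte_pos] if (b & 0xC0) != 0x80)
```

For example, the katakana "ソ" encodes in Shift_JIS as the bytes 0x83 0x5C, and 0x5C on its own is the ASCII backslash, so naive_contains() reports a backslash that is not there, while contains_via_utf8() does not.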

Regards,
Nikita