On Mon, Mar 2, 2020 at 8:10 PM Andrea Faulds <a...@ajf.me> wrote:

> Hi,
>
> Philipp Tanlak wrote:
> > I like to elaborate on Nikitas response: I don't think a mb_str_contains is
> > necessary, because the proposed function does not behave differently, if
> > the input strings are multibyte strings.
>
> This is not true for all character encodings. For UTF-8 it is correct,
> but consider for example the Japanese encoding Shift_JIS, where the
> second byte of a multi-byte character can be a valid first byte of a
> single-byte character. str_contains() would have incorrect behaviour for
> this case.
>
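
To make the Shift_JIS case above concrete, here is a minimal sketch. It is written in Python rather than PHP purely for illustration, with Python bytes standing in for PHP's byte strings. The helper name bytes_contain is hypothetical and is not the proposed str_contains() implementation. The example relies on the character 表 (U+8868) being encoded in Shift_JIS as the two bytes 0x95 0x5C, so its trail byte collides with the single byte 0x5C.

```python
# Illustration only: Python bytes stand in for PHP's byte-oriented strings.
# bytes_contain() is a hypothetical helper, not PHP's str_contains().

def bytes_contain(haystack: bytes, needle: bytes) -> bool:
    """Byte-wise substring test with no knowledge of character boundaries."""
    return needle in haystack

text = "表"            # U+8868; Shift_JIS encodes it as 0x95 0x5C
needle = b"\x5c"       # the single byte 0x5C (backslash in ASCII)

sjis_bytes = text.encode("shift_jis")
utf8_bytes = text.encode("utf-8")

# Shift_JIS: the needle matches the trail byte of a two-byte character,
# so the byte-wise search reports a match that is not really there.
sjis_hit = bytes_contain(sjis_bytes, needle)

# UTF-8: every byte of a multi-byte sequence is >= 0x80, so an ASCII
# needle can never match inside one.
utf8_hit = bytes_contain(utf8_bytes, needle)

print(sjis_hit, utf8_hit)
```

Under these assumptions the Shift_JIS search yields a false positive while the UTF-8 search does not, which is the difference between the two encodings described above.
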
That's of course true, but I consider it ultimately unimportant. Accepting
non-UTF-8 encodings for anything other than mb_convert_encoding() is just
another failure of the mbstring API. The mb_strpos() function literally works
by converting the given string to UTF-8 and then calling the normal strpos()
on it, after sprinkling in some nice O(n) character offset to byte offset
(and reverse) conversions.

You are generally much better off canonicalizing everything to UTF-8 for
internal processing and using the normal str* functions. But well, that's a
different discussion...

Regards,
Nikita