On Sun, Jun 23, 2019 at 5:30 PM Ben Ramsey <b...@benramsey.com> wrote:
> > On Jun 23, 2019, at 05:35, Rowan Collins <rowan.coll...@gmail.com>
> > wrote:
> >
> > On 22 June 2019 20:56:24 BST, Ben Ramsey <b...@benramsey.com> wrote:
> >> Perhaps it would only be an issue with the case-insensitive versions,
> >> as Nikita points out? If so, can someone provide some example strings
> >> where an mb_starts_with_ci() would return true, while
> >> str_starts_with_ci() would return false?
> >
> > That's easy: any character that has a lower- and uppercase form, and is
> > not represented as one byte in the target encoding. For that matter, any
> > such character in the non-ASCII section of a single-byte encoding, since
> > a non-mbstring case insensitive flag would presumably leave everything
> > other than ASCII letters untouched.
> >
> > So, any non-Latin script, like Greek or Cyrillic; any accented
> > characters, unless you're lucky and they're represented by ASCII-letter
> > plus combining modifier; the Turkish "i", which if I remember rightly
> > has three forms not two; and so on.
>
> According to Google, “İyi akşamlar” is the Turkish phrase for “Good
> evening” (Turkish speakers, please correct me if this is wrong). However,
> using the existing mb_* functions, I can’t get mb_stripos() to return 0
> when trying to see if the string “İYI AKŞAMLAR” begins with “i̇yi.”
>
> I’m just using UTF-8, so maybe there’s an encoding issue here?
>
> $string = 'İyi akşamlar';
> $upper = mb_strtoupper($string);
> $lowerChars = mb_strtolower(mb_substr($string, 0, 3));
>
> var_dump($string, $upper, $lowerChars);
> var_dump(mb_stripos($upper, $lowerChars));
>

The reason why this doesn't work is that mb_stripos() internally performs a
simple case fold, while a full case fold would be needed in this case
(Turkish i is hard). Supporting full case folding there is a bit tricky
because the folded string can differ in length from the original, so
character offsets would have to be remapped.

Nikita
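
For illustration, here is a minimal sketch of the difference between the
two kinds of fold, assuming PHP 7.3 or later with ext/mbstring, where
mb_convert_case() accepts the MB_CASE_FOLD and MB_CASE_FOLD_SIMPLE modes.
It does not reproduce what mb_stripos() does internally; it only folds both
strings by hand and compares them. The variable names are made up for the
example.

```php
<?php
// Sketch only: requires PHP 7.3+ with ext/mbstring.
$haystack = 'İYI AKŞAMLAR';
// Same needle as in the quoted script: the lowercased first three characters.
$needle = mb_strtolower(mb_substr('İyi akşamlar', 0, 3, 'UTF-8'), 'UTF-8');

// A simple fold maps each code point to exactly one code point,
// so the character count cannot change.
$simple = mb_convert_case($haystack, MB_CASE_FOLD_SIMPLE, 'UTF-8');

// A full fold may map one code point to several. Unicode's CaseFolding.txt
// folds U+0130 (İ) to U+0069 U+0307, i.e. "i" plus a combining dot above.
$full = mb_convert_case($haystack, MB_CASE_FOLD, 'UTF-8');

// Compare the character counts before and after each kind of fold.
var_dump(
    mb_strlen($haystack, 'UTF-8'),
    mb_strlen($simple, 'UTF-8'),
    mb_strlen($full, 'UTF-8')
);

// Fold the needle the same way, then do an ordinary case-sensitive search.
$foldedNeedle = mb_convert_case($needle, MB_CASE_FOLD, 'UTF-8');
var_dump(mb_strpos($full, $foldedNeedle, 0, 'UTF-8'));
```

Any offset found in the fully folded string refers to the folded text, not
to the original. Once a character has expanded into two, every later
position is shifted, which is the remapping problem mentioned above.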