On Sun, Jun 23, 2019 at 5:30 PM Ben Ramsey <b...@benramsey.com> wrote:

> > On Jun 23, 2019, at 05:35, Rowan Collins <rowan.coll...@gmail.com> wrote:
> >
> > On 22 June 2019 20:56:24 BST, Ben Ramsey <b...@benramsey.com> wrote:
> >> Perhaps it would only be an issue with the case-insensitive versions,
> >> as Nikita points out? If so, can someone provide some example strings
> >> where an mb_starts_with_ci() would return true, while
> >> str_starts_with_ci() would return false?
> >
> >
> > That's easy: any character that has a lower- and uppercase form, and is
> > not represented as one byte in the target encoding. For that matter, any
> > such character in the non-ASCII section of a single-byte encoding, since a
> > non-mbstring case insensitive flag would presumably leave everything other
> > than ASCII letters untouched.
> >
> > So, any non-Latin script, like Greek or Cyrillic; any accented
> > characters, unless you're lucky and they're represented by ASCII-letter
> > plus combining modifier; the Turkish "i", which if I remember rightly has
> > three forms not two; and so on.
>
>
> According to Google, “İyi akşamlar” is the Turkish phrase for “Good
> evening” (Turkish speakers, please correct me if this is wrong). However,
> using the existing mb_* functions, I can’t get mb_stripos() to return 0
> when trying to see if the string “İYI AKŞAMLAR” begins with “i̇yi.”
>
> I’m just using UTF-8, so maybe there’s an encoding issue here?
>
> $string = 'İyi akşamlar';
> $upper = mb_strtoupper($string);
> $lowerChars = mb_strtolower(mb_substr($string, 0, 3));
>
> var_dump($string, $upper, $lowerChars);
> var_dump(mb_stripos($upper, $lowerChars));
>

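This doesn't work because mb_stripos internally performs a simple case
fold, while a full case fold would be needed here (Turkish i is hard).
mb_strtolower() maps "İ" (U+0130) to "i" followed by a combining dot above
(U+0307), but a simple case fold maps each character to at most one
character and so leaves "İ" unchanged. The folded haystack therefore never
starts with the needle. Supporting full case folding in mb_stripos is a bit
tricky because folding can change the length of the string, so character
offsets would have to be remapped.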
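
To make the difference concrete, here is a minimal sketch, assuming PHP >= 7.3
with ext/mbstring, where mb_convert_case() accepts the MB_CASE_FOLD and
MB_CASE_FOLD_SIMPLE modes. The helper starts_with_full_fold() is a
hypothetical name used only for illustration; it is not an existing or
proposed mbstring function, and it sidesteps the offset problem by returning
a boolean rather than a position.

```php
<?php
// Sketch only: assumes PHP >= 7.3 with ext/mbstring and UTF-8 input.
$haystack = 'İYI AKŞAMLAR';
$needle   = mb_strtolower('İyi', 'UTF-8'); // "i" + U+0307 + "yi"

// Simple fold: each code point maps to at most one code point,
// so "İ" (U+0130) is left as it is.
$simple = mb_convert_case($haystack, MB_CASE_FOLD_SIMPLE, 'UTF-8');

// Full fold: "İ" becomes "i" + U+0307, so the result is one code point
// longer than the input. This is why offsets would need remapping.
$full = mb_convert_case($haystack, MB_CASE_FOLD, 'UTF-8');

var_dump(mb_strlen($simple, 'UTF-8'), mb_strlen($full, 'UTF-8'));

// Hypothetical helper (not part of mbstring): full-fold both strings,
// then compare the leading bytes of the folded haystack with the needle.
function starts_with_full_fold(string $haystack, string $needle): bool
{
    $h = mb_convert_case($haystack, MB_CASE_FOLD, 'UTF-8');
    $n = mb_convert_case($needle, MB_CASE_FOLD, 'UTF-8');
    return strncmp($h, $n, strlen($n)) === 0;
}

var_dump(mb_stripos($haystack, $needle, 0, 'UTF-8'));
var_dump(starts_with_full_fold($haystack, $needle));
```

Because the full fold is applied to both sides before comparing, no position
in the original string has to be reported, which is the part that makes a
full-fold mb_stripos harder than a starts-with check.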

Nikita
