On Tue, 3 Mar 2020 at 11:04, Rowan Tommins <rowan.coll...@gmail.com> wrote:

> On Tue, 3 Mar 2020 at 08:46, Andreas Heigl <andr...@heigl.org> wrote:
>
> >
> > While it is mainly aimed at being a mere convenience-function that could
> > also be easily implemented in userland it misses one main thing IMO when
> > handling unicode-strings: Normalization.
> >
> >
>
> While I would love to see more functionality for handling Unicode which
> didn't treat it as just another character set, I don't think sprinkling it
> into the main string functions of the language would be the right approach.
> Even if we changed all the existing functions to be "Unicode-aware", as was
> planned for PHP 6, the resulting API would not handle all cases correctly.
>
> In this case, a Unicode-based string API ought to provide at least two
> variants of "contains", as options or separate functions:
>
> - a version which matches on code point, for answering queries like "does
> this string contain right-to-left override characters?"
> - at least one form of normalization, but probably several
>
> If there was serious work on a new string API in progress, a freeze on
> additions to the current API would make sense; but right now, the
> byte-based string API is what we have, and I think this function is a
> sensible addition to it.
>


FYI, I wrote a String handling lib, shipped as Symfony String:
- doc: https://symfony.com/doc/current/components/string.html
- src: https://github.com/symfony/string

TL;DR, it provides three classes of value objects, dealing with bytes, code
points and grapheme clusters (~= normalized Unicode) respectively.
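
To make the three units concrete, here is a rough sketch in Python (not
PHP, and not the Symfony String API). It measures the same text in bytes
and in code points; since Python's standard library has no grapheme
segmentation, the NFC-normalized length stands in as an approximation of
the grapheme count, which is an assumption that only holds for simple
cases like this one.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character
# stored in decomposed form.
s = "e\u0301"

# Unit 1: bytes (UTF-8 encoding assumed).
byte_len = len(s.encode("utf-8"))

# Unit 2: code points (Python's str is a sequence of code points).
code_point_len = len(s)

# Unit 3 (approximation): length after NFC normalization, which composes
# the pair into the single code point U+00E9.
nfc_len = len(unicodedata.normalize("NFC", s))

print(byte_len, code_point_len, nfc_len)  # 3 2 1
```

The same text therefore has three different "lengths" depending on the
unit chosen.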

It makes no sense to have `str_contains()` or any global function able to
deal with Unicode normalization *unless* the PHP string values embed their
unit system (one of: bytes, codepoints or graphemes).

With this rationale, I agree with Rowan: PHP's native string functions deal
with bytes. So should str_contains(). Other unit systems can be implemented
in userland (until PHP implements something similar to Symfony String in
core - but that's another topic.)
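
As an illustration of that split, here is a minimal Python sketch (again
not PHP; the helper names `contains_bytes` and `contains_normalized` are
hypothetical). The first helper mirrors a byte-based check in the spirit
of str_contains(); the second is one possible userland variant that
normalizes both operands before comparing.

```python
import unicodedata


def contains_bytes(haystack: bytes, needle: bytes) -> bool:
    """Byte-based check: true only if the exact byte sequence occurs."""
    return needle in haystack


def contains_normalized(haystack: str, needle: str, form: str = "NFC") -> bool:
    """Hypothetical userland variant: normalize both sides, then compare."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)


composed = "caf\u00e9"      # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
needle = "\u00e9"

# Same visible text, different bytes: the byte-based check only matches
# the composed spelling.
print(contains_bytes(composed.encode("utf-8"), needle.encode("utf-8")))    # True
print(contains_bytes(decomposed.encode("utf-8"), needle.encode("utf-8")))  # False

# After normalization, both spellings match.
print(contains_normalized(decomposed, needle))  # True
```

Which normalization form to use (NFC, NFD, NFKC, NFKD) is a choice the
caller has to make, which is why it fits better in a dedicated API than
in a global byte-based function.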

Nicolas
