On Tue, 3 Mar 2020 at 08:46, Andreas Heigl <andr...@heigl.org> wrote:

>
> While it is mainly aimed at being a mere convenience function that could
> also be easily implemented in userland, it misses one main thing IMO when
> handling Unicode strings: normalization.
>
>

While I would love to see more functionality for handling Unicode which
didn't treat it as just another character set, I don't think sprinkling it
into the main string functions of the language would be the right approach.
Even if we changed all the existing functions to be "Unicode-aware", as was
planned for PHP 6, the resulting API would not handle all cases correctly.

In this case, a Unicode-based string API ought to provide at least two
variants of "contains", as options or separate functions:

- a version which matches on code points, for answering queries like "does
this string contain right-to-left override characters?"
- at least one version which normalizes both strings before comparing, but
probably several (one per normalization form)
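To make the distinction concrete, here is a minimal sketch of the two variants. It is written in Python purely for illustration and is not a proposed PHP API. In PHP, the intl extension's Normalizer::normalize() would play the role of unicodedata.normalize. The function names contains_codepoints and contains_normalized are hypothetical. Normalizing before searching is itself only an approximation: for example, under NFD a needle can match the base letter of a combining sequence. That is part of why no single "contains" fits every case.

```python
import unicodedata


def contains_codepoints(haystack: str, needle: str) -> bool:
    # Hypothetical helper: plain code-point comparison, no normalization.
    return needle in haystack


def contains_normalized(haystack: str, needle: str, form: str = "NFC") -> bool:
    # Hypothetical helper: bring both strings to the same normalization
    # form ("NFC", "NFD", "NFKC" or "NFKD") before searching.
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)


# "café" written with a precomposed é (U+00E9) vs. e + combining acute (U+0301)
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

print(contains_codepoints(precomposed, decomposed))  # False: different code points
print(contains_normalized(precomposed, decomposed))  # True: equal after NFC

# Code-point matching answers "is there a right-to-left override (U+202E)?"
print(contains_codepoints("abc\u202edef", "\u202e"))  # True
```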

If there were serious work on a new string API in progress, a freeze on
additions to the current API would make sense; but right now, the
byte-based string API is what we have, and I think this function is a
sensible addition to it.

Regards,
-- 
Rowan Tommins
[IMSoP]
