Re: [PHP-DEV] Multibyte strings

Sara Golemon Fri, 11 Feb 2022 09:51:49 -0800

On Fri, Feb 11, 2022 at 12:26 AM Michał <aaat...@o2.pl> wrote:

> It's a known fact that nowadays most websites use at least UTF-8
> encoding. Unfortunately PHP itself has stopped a bit in the previous
> century. Is there any reason why the mbstring extension cannot be
> introduced to core in the next major version (maybe preceded with a
> deprecation message like it was with the mysql extension in v5)? All
> functions from the standard library would become aliases for multibyte
> equivalents.
>
>
Only that it would break a great number of assumptions if strlen("é") after
decades of returning 2 suddenly returned 1.  That's a trite example, but
it's the sort of deep rabbit hole that emerges when you start to really
examine the problem in depth.
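
To make that concrete, here is a minimal sketch of the byte/character gap,
assuming UTF-8 input and that the mbstring extension is loaded. The variable
names are just for illustration.

```php
<?php
// "é" spelled as a code point escape (PHP 7+), so the example does not
// depend on how this file itself is encoded. In UTF-8 it is bytes C3 A9.
$s = "\u{00E9}";

$byteCount = strlen($s);             // counts octets
$charCount = mb_strlen($s, 'UTF-8'); // counts code points (needs mbstring)

// Byte-oriented slicing cuts the character in half and leaves a lone
// lead byte, which is not valid UTF-8 on its own.
$firstByte = substr($s, 0, 1);
$firstByteIsValid = mb_check_encoding($firstByte, 'UTF-8');

echo $byteCount, "\n"; // 2
echo $charCount, "\n"; // 1
var_dump($firstByteIsValid); // bool(false)
```

Any existing code that relies on the first number (buffer sizes, offsets,
length-prefixed formats) would change behaviour if strlen() were aliased to
the multibyte version.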


Perhaps you're unfamiliar with the work that went into PHP 6. It turns out
that building Unicode into the heart of PHP isn't a new idea that you've
just had. It's something we invested a great deal of effort into, and
the discovery we made along the way is that it's a great deal of
complication and computational overhead for dubious benefit.  Turns out
that yes, developers do use UTF-8 almost exclusively, and they know exactly
when to use multibyte-aware functions and when octet-focused functions
make more sense.  The landscape is covered in abstractions to make this
simple and automatic, and suddenly changing the foundation would do more
harm than good both in terms of developer productivity and performance.
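
As a rough illustration of that split (a sketch only, assuming UTF-8 and the
mbstring extension; $body and the other names are made up for the example),
the same string legitimately needs both kinds of function:

```php
<?php
// "naïve café" written with code point escapes (PHP 7+) so the bytes are
// unambiguous regardless of this file's encoding.
$body = "na\u{00EF}ve caf\u{00E9}";

// Octet-focused: a Content-Length header is measured in bytes on the wire.
$contentLength = strlen($body);

// Multibyte-aware: counting or truncating what a person actually sees.
$charCount = mb_strlen($body, 'UTF-8');
$preview   = mb_substr($body, 0, 5, 'UTF-8');

echo "Content-Length: ", $contentLength, "\n"; // Content-Length: 12
echo $charCount, "\n";                         // 10
echo $preview, "\n";                           // naïve
```

Aliasing strlen() to mb_strlen() would quietly make the first of those
wrong, which is the kind of breakage described above.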

-Sara
