On 20/02/2022 21:24, Craig Francis wrote:

> Only query I have is about the availability of different functions... not sure why, but the documentation says these are provided by the "xml" extension, even though it looks like they are in `./standard/string.c` (your pull request seems to correct this)... so I assume projects have used these functions on the basis that they are always available... I suppose you could argue that "iconv" is enabled by default, so that's hopefully reliable (even though it can be disabled with `--without-iconv`)... whereas "mbstring" and "intl" are non-default extensions.


Yes, since 7.2, utf8_encode and utf8_decode have always been available; before that, they were in ext/xml (which in practice meant *nearly* always available). The fact that none of the alternatives are guaranteed to be available is unfortunate, but by their nature they involve large amounts of code, so moving or replicating them in core is not really an option.

I don't have hard facts to back it up, but my impression is that ext/mbstring is quite commonly installed, and required by apps and libraries. Unlike the other two, it has no system dependencies, because the implementation is entirely in PHP's source tree.

I'm not sure how often iconv is enabled (default according to php.net doesn't necessarily mean default according to Ubuntu / CentOS / cheap shared hosting), and its functionality isn't very portable between systems - for instance, 3v4l.org rejects 'ISO-8859-1' as an encoding [https://3v4l.org/biGa8], but my local system accepts it, although both report ICONV_IMPL as "glibc".

ext/intl is by far the most powerful of the three extensions, albeit extremely poorly documented; but it may not be installed as often, because that power comes from a large external library (ICU).
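
To illustrate, here is a rough sketch of how code might pick whichever of the three extensions happens to be loaded. The function name `latin1_to_utf8_any` is made up for this example, and the order of preference (mbstring, then intl, then iconv) is only an assumption based on the portability concerns above:

```php
<?php
// Hypothetical helper (not part of PHP): convert ISO-8859-1 to UTF-8
// using whichever extension is available, or return null if none is.
function latin1_to_utf8_any(string $s): ?string
{
    // ext/mbstring: bundled implementation, no system library involved.
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
    }

    // ext/intl: backed by ICU; transcode() returns false on failure.
    if (class_exists('UConverter')) {
        $out = UConverter::transcode($s, 'UTF-8', 'ISO-8859-1');
        if ($out !== false) {
            return $out;
        }
    }

    // ext/iconv: accepted encoding names vary by system, so a failure
    // here is treated as "not available" rather than as an error.
    if (function_exists('iconv')) {
        $out = @iconv('ISO-8859-1', 'UTF-8', $s);
        if ($out !== false) {
            return $out;
        }
    }

    return null; // caller needs a pure PHP fallback
}
```

A caller would treat a null result as the signal to fall back to a pure PHP implementation.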

The bright side is that if you really do only need one encoding pair, implementing it in pure PHP is pretty trivial, and there are multiple polyfills already out there. That leaves a minority of a minority of a minority, who a) actually need Latin1 <-> UTF-8, and no other encodings; b) can't rely on any of the three listed extensions; AND c) care enough about performance that a pure PHP implementation is problematic.
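
For a sense of scale, a minimal pure PHP sketch of the Latin1 <-> UTF-8 pair might look like the following. The function names are hypothetical, and the decoder's handling of malformed input (one '?' per bad or out-of-range sequence) is a simplification that is not guaranteed to match utf8_decode byte for byte:

```php
<?php
// Hypothetical polyfill: ISO-8859-1 bytes map directly to U+0000..U+00FF.
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($s[$i]);
        if ($byte < 0x80) {
            $out .= $s[$i]; // ASCII is identical in both encodings
        } else {
            // U+0080..U+00FF becomes a two-byte UTF-8 sequence
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}

// Hypothetical polyfill: anything outside U+0000..U+00FF becomes '?'.
function utf8_to_latin1(string $s): string
{
    $out = '';
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $byte = ord($s[$i]);
        if ($byte < 0x80) {
            $out .= $s[$i];
            $i++;
            continue;
        }
        // Lead bytes 0xC2 and 0xC3 cover exactly U+0080..U+00FF.
        if (($byte === 0xC2 || $byte === 0xC3)
            && $i + 1 < $len
            && (ord($s[$i + 1]) & 0xC0) === 0x80
        ) {
            $out .= chr((($byte & 0x03) << 6) | (ord($s[$i + 1]) & 0x3F));
            $i += 2;
            continue;
        }
        // Unrepresentable or malformed: emit '?' and skip the lead byte
        // plus any continuation bytes that follow it.
        $out .= '?';
        $i++;
        while ($i < $len && (ord($s[$i]) & 0xC0) === 0x80) {
            $i++;
        }
    }
    return $out;
}
```

A byte-at-a-time loop like this is the kind of implementation where point (c) might start to matter for large inputs.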

Regards,

--
Rowan Tommins
[IMSoP]
