Thank you for opening this conversation. These functions have stung me in the past, and I would be so happy to see them gone :)

Personally, I would very much like to go with Plan A.

 - XML parsers, which often deal with non-UTF-8 character encodings, frequently use these functions. However, any parser worth its salt is better off using mbstring or iconv, because these functions lack the Windows-1252 support that is assumed elsewhere for ISO-8859-1. If we had a `utf8_encode` that supported Windows-1252, as is often expected, I think Plan B would be the smoother upgrade.
 - Among the top 1000 Packagist downloads, stripe-php, phpcpd, pdepend, carbon, monolog, php-cs-fixer, htmlpurifier, and aws-php-sdk use `utf8_encode`. Some of these libraries depend on `ext-mbstring` or the Symfony mbstring polyfill, so we are left with even fewer libraries that cannot assume `iconv()` or `mb_convert_encoding()` availability.

On Sun, Mar 21, 2021 at 7:48 PM Rowan Tommins <rowan.coll...@gmail.com> wrote:
>
> Hi all,
>
> The functions utf8_encode and utf8_decode are historical oddities, which
> almost certainly would not be accepted if proposed today:
>
> * Their names do not describe their functionality, which is to convert
> to/from one specific single-byte encoding. This leads to a common
> confusion that they can be used to "fix" UTF-8 encoding problems, which
> they generally make worse.
> * That single-byte encoding is ISO 8859-1, not its common cousins
> Windows-1252 or ISO 8859-15. This means, for instance, that they do not
> handle the Euro sign: utf8_decode('€') returns '?' (i.e. unmappable)
> not "\x80" (Windows-1252) or "\xA4" (8859-15)
>
> On the other hand, they are commonly used, both correctly and
> incorrectly, so removing them is not easy.
>
> A previous proposal to remove them [1] resulted in Andrea making two
> significant improvements: moving them from ext/xml to ext/standard [2]
> and rewriting the documentation to explain them properly [3]. My genuine
> thanks for that.
>
> However, it hasn't stopped people misunderstanding them, and quite
> reasonably: you shouldn't need to look up every function you use in the
> manual, to make sure it actually does what its name suggests.
>
>
> I can see three ways forward:
>
> A) Raise a deprecation notice in 8.1, and remove in 9.0. Do not provide
> a specific replacement, but recommend people look at iconv() or
> mb_convert_encoding(). There is precedent for this, such as
> convert_cyr_string(), but it may frustrate those who are using the
> functions correctly.
>
> B) Introduce new names, such as utf8_to_iso_8859_1 and
> iso_8859_1_to_utf8; immediately make those the primary names in the
> manual, with utf8_encode / utf8_decode as aliases. Raise deprecation
> notices for the old names, either immediately or in some future release.
> This gives a smoother upgrade path, but commits us to having these
> functions as outliers in our standard library.
>
> C) Leave them alone forever. Treat it as the user's fault if they mess
> things up by misunderstanding them.
>
>
> I am happy to put together an RFC for either A or B, if it has a chance
> of reaching consensus. I would really like to avoid option C.
>
>
> [1] https://externals.io/message/95166
> [2] https://github.com/php/php-src/pull/2160
> [3] https://github.com/php/doc-en/commit/838941f6cce51f3beda16012eb497b26295a8238
>
> Regards,
>
> --
> Rowan Tommins
> [IMSoP]
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: https://www.php.net/unsub.php
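
For anyone who wants to reproduce the Euro sign case, here is a minimal sketch of how the three target encodings differ when converting from UTF-8 with `mb_convert_encoding()`. It assumes `ext-mbstring` is loaded, and `convert_hex()` is only a hypothetical helper name for this example, not an existing function:

```php
<?php
// Sketch only: assumes ext-mbstring is loaded.
// convert_hex() is a hypothetical helper for this example.

// Convert a UTF-8 string to $target and return the resulting bytes as hex.
function convert_hex(string $utf8, string $target): string
{
    return bin2hex(mb_convert_encoding($utf8, $target, 'UTF-8'));
}

$euro = "\u{20AC}"; // Euro sign, UTF-8 bytes E2 82 AC

echo convert_hex($euro, 'ISO-8859-1'), "\n";   // 3f ('?': no Euro sign in ISO 8859-1)
echo convert_hex($euro, 'Windows-1252'), "\n"; // 80
echo convert_hex($euro, 'ISO-8859-15'), "\n";  // a4
```

The first line is what callers of `utf8_decode()` effectively get for the Euro sign; the other two are what they usually expect.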