On 22/03/2021 01:15, Sara Golemon wrote:
> My preference is for a deprecation notice (but not necessarily removal ever -- We can argue that part a little).
I'm strongly against any concept of "indefinite deprecation". I consider any deprecation notice a commitment to remove the feature in the future, even if a specific timeline for that removal is not given.
If we want to have a separate status of "will be kept indefinitely, but you shouldn't use it", then we need a separate E_DISCOURAGED, or some boilerplate in the manual which doesn't use the word "deprecated".
> As for details, I don't love iso_8859_1_to_utf8(), but we can use the common alias for iso-8859-1 known as latin1 and call the new functions: utf8_from_latin1() and utf8_to_latin1() with the caveat that the latter will throw a ValueError for codepoints which are out of range (one of the more problematic issues with utf8_decode()). That makes this not just a simple rename for clarity, but what I'd consider a bug-fix for an unfortunately unfixable function.
While I can see the temptation here, I'm not sure who the target audience for the new function would be:
* People who just want to replace calls to utf8_decode won't want to go through every call and make it exception safe.
* People who want to write a polyfill couldn't use it, because they wouldn't be able to recover the remainder of the string after an error is thrown.
* People who want transcoding without any optional extensions will be disappointed to find only this one encoding supported.
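To make the polyfill point concrete, here is a rough sketch. The function strict_utf8_to_latin1() is a hypothetical userland stand-in I've made up for the proposed utf8_to_latin1(); its name and exact behaviour are assumptions, not anything that exists in core:

```php
<?php
// Hypothetical stand-in for the proposed utf8_to_latin1(); an assumption for
// illustration only. Throws as soon as it meets anything outside U+0000..U+00FF.
function strict_utf8_to_latin1(string $utf8): string
{
    $out = '';
    $len = strlen($utf8);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($utf8[$i]);
        if ($byte < 0x80) {
            // ASCII maps to itself
            $out .= $utf8[$i];
            continue;
        }
        // U+0080..U+00FF are encoded as 0xC2 or 0xC3 plus one continuation byte
        if (($byte === 0xC2 || $byte === 0xC3) && $i + 1 < $len) {
            $next = ord($utf8[$i + 1]);
            if (($next & 0xC0) === 0x80) {
                $out .= chr((($byte & 0x03) << 6) | ($next & 0x3F));
                $i++;
                continue;
            }
        }
        throw new ValueError("Byte offset $i is not representable in Latin1");
    }
    return $out;
}

// A Euro sign (U+20AC) in the middle of otherwise convertible text
$input = "caf\u{E9} \u{20AC}5 na\u{EF}ve";

try {
    $latin1 = strict_utf8_to_latin1($input);
} catch (ValueError $e) {
    // The partial result built inside the function is gone: neither the
    // text before the Euro sign nor the text after it is available here.
    $latin1 = null;
}
```

A caller trying to reproduce the existing "substitute and carry on" behaviour has nothing to work with in the catch block, so they would end up re-implementing the loop themselves anyway.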
You'd effectively be adding a completely new core function just for those people who work with Latin1 text, and are confident that it's not Windows-1252 in disguise.
It's tempting to make any C1 control characters an error as well - although technically valid in Latin1, these are very rarely used, and it's much more likely that any bytes in that range are intended as characters in Windows-1252. But that would feel very odd without having a corresponding utf8_from_windows1252 function to use instead, at which point we're into designing a whole new conversion library. And of course, once you've got that UTF-8 string, you can't do much with it, because PHP's native string functions are all byte-based, so you've basically got to re-invent large chunks of ext/mbstring...
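As a small illustration of both problems, here is a sketch using a hypothetical helper, latin1_to_utf8_sketch(), which I've written to mirror a straight byte-for-code-point Latin1 to UTF-8 mapping (the name is an assumption, and it is not meant as a proposed implementation):

```php
<?php
// Hypothetical helper: treats every input byte as the Latin1 code point
// of the same number and encodes it as UTF-8.
function latin1_to_utf8_sketch(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($latin1[$i]);
        $out .= $byte < 0x80
            ? $latin1[$i]
            : chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
    }
    return $out;
}

// Byte 0x80 is the Euro sign in Windows-1252, but a C1 control in Latin1
$bytes = "\x80" . "5";

$asLatin1 = latin1_to_utf8_sketch($bytes); // encodes U+0080, then "5"
$intended = "\u{20AC}5";                   // what a Windows-1252 author meant

$matches = ($asLatin1 === $intended);      // false: the Euro sign is lost

// Native string functions count bytes, not characters:
// "café" is 4 characters but 5 bytes in UTF-8
$byteLength = strlen("caf\u{E9}");
```

So even a "correct" Latin1 conversion quietly produces the wrong text for Windows-1252 input, and the resulting UTF-8 string still needs something like ext/mbstring before you can safely measure or slice it.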
Regards,

-- 
Rowan Tommins
[IMSoP]