On Mon, Mar 22, 2021 at 5:24 AM Rowan Tommins <rowan.coll...@gmail.com> wrote:

> I'm strongly against any concept of "indefinite deprecation". I consider
> any deprecation notice a commitment to remove the feature in the future,
> even if a specific timeline for that removal is not given.

I don't feel strongly about indefinite deprecation. If you wanna nuke it in 9.0, have fun. I'm just saying I don't necessarily see the need to do so. The problem being addressed here is that *some* users of this function are probably misusing it, so it's worth putting guardrails on. I'm hesitant to punish the ones who know exactly what they're doing as a result of that well-meaning intention.

> * People who just want to replace calls to utf8_decode won't want to go
> through every call and make it exception safe.

Then they shouldn't use these replacements; it's not for them. It's for people using iso-8859-1.

> * People who want to write a polyfill couldn't use it, because they
> wouldn't be able to recover the remainder of the string after an error
> is thrown.

If you're writing a polyfill, then write a polyfill. The polyfill for the old functions is trivial; I could have written it a dozen times in the course of writing this email reply. So this replacement is also not for them.

> * People who want transcoding without any optional extensions will be
> disappointed to find only this one encoding supported.

This function isn't for them. It's for people using iso-8859-1. There's a theme in here. :)

> You'd effectively be adding a completely new core function just for
> those people who work with Latin1 text, and are confident that it's not
> Windows-1252 in disguise.

Yes. I'm specifically addressing the people who have been using utf8_en/decode() correctly all this time. They shouldn't be punished for the stupidity of others.

> It's tempting to make any C1 control characters an error as well -
> although technically valid in Latin1, these are very rarely used, and
> it's much more likely that any bytes in that range are intended as
> characters in Windows-1252. But that would feel very odd without having
> a corresponding utf8_from_windows1252 function to use instead, at which
> point we're into designing a whole new conversion library. And of
> course, once you've got that UTF-8 string, you can't do much with it,
> because PHP's native string functions are all byte-based, so you've
> basically got to re-invent large chunks of ext/mbstring...

I disagree that you'd need to add utf8_from/to_windows1252 "for completeness". The goal isn't to provide all possible conversion utilities. The goal is only to not punish users by taking away a valid API that they were using correctly (for those users who were using it correctly).

-Sara
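
For illustration only, here is a minimal sketch of the conversions discussed above, written in Python rather than PHP and not reflecting any proposed API. The function names are hypothetical, and the handling of malformed UTF-8 only approximates what utf8_decode() does. It shows why bytes in the 0x80-0x9F range are the ambiguous ones: Latin-1 treats them as C1 control characters, while Windows-1252 assigns printable characters to most of them.

```python
from typing import List


def latin1_to_utf8(data: bytes) -> bytes:
    """Rough analogue of utf8_encode(): every byte is read as ISO-8859-1."""
    return data.decode("latin-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Rough analogue of utf8_decode(): anything Latin-1 cannot hold becomes '?'."""
    text = data.decode("utf-8", errors="replace")
    return text.encode("latin-1", errors="replace")


def c1_offsets(data: bytes) -> List[int]:
    """Offsets of bytes in 0x80-0x9F, which hint at Windows-1252 in disguise."""
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


sample = b"caf\xe9 \x80 5"

# Read as Latin-1, byte 0x80 becomes the control character U+0080.
as_latin1 = latin1_to_utf8(sample)

# Read as Windows-1252, the same byte becomes the euro sign U+20AC.
as_cp1252 = sample.decode("cp1252").encode("utf-8")

print(as_latin1)
print(as_cp1252)
print(c1_offsets(sample))
```

A caller who knows the data really is Latin-1 needs only the first function; the offset check is the kind of guardrail that would flag the suspicious inputs without changing the result for everyone else.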