Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?

Björn Larsson Mon, 22 Mar 2021 06:52:16 -0700

On 2021-03-22 at 14:10, Sara Golemon wrote:

On Mon, Mar 22, 2021 at 5:24 AM Rowan Tommins <rowan.coll...@gmail.com>
wrote:

I'm strongly against any concept of "indefinite deprecation". I consider
any deprecation notice a commitment to remove the feature in the future,
even if a specific timeline for that removal is not given.


I don't feel strongly about indefinite deprecation.  If you wanna nuke it
in 9.0, have fun.  I'm just saying I don't necessarily see the need to do
so.  The problem being addressed here is that *some* users of this function
are probably misusing it, so it's worth putting guardrails on.  I'm
hesitant to punish the ones who know exactly what they're doing as a result
of that well-meaning intention.

* People who just want to replace calls to utf8_decode won't want to go
through every call and make it exception safe.


Then they shouldn't use these replacements; they're not for them. They're
for people using iso-8859-1.
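
To make the "exception safe" concern concrete, here is a minimal sketch of
what a throwing conversion could look like at a call site. The name
utf8_to_latin1_strict is hypothetical and is not taken from any proposal;
the choice of ValueError (PHP 8.0+) and the rule that anything outside
U+0000..U+00FF is an error are assumptions made for illustration only.

```php
<?php
// Hypothetical strict conversion: UTF-8 in, ISO-8859-1 out.
// Throws instead of substituting '?' (an assumption, not a real PHP API).
function utf8_to_latin1_strict(string $utf8): string
{
    $out = '';
    $len = strlen($utf8);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($utf8[$i]);
        if ($b < 0x80) {
            // ASCII is identical in both encodings.
            $out .= $utf8[$i];
            continue;
        }
        $next = $i + 1 < $len ? ord($utf8[$i + 1]) : 0;
        // Only lead bytes C2 and C3 plus one continuation byte map to
        // U+0080..U+00FF, the upper half of ISO-8859-1.
        if (($b === 0xC2 || $b === 0xC3) && ($next & 0xC0) === 0x80) {
            $out .= chr((($b & 0x03) << 6) | ($next & 0x3F));
            $i++;
            continue;
        }
        throw new ValueError("Byte offset $i is not representable in ISO-8859-1");
    }
    return $out;
}

// Every call site that may see arbitrary input now needs a try/catch.
try {
    $latin1 = utf8_to_latin1_strict("caf\xC3\xA9"); // "café" in UTF-8
} catch (ValueError $e) {
    $latin1 = '';
}
```

Callers who already know their input fits in iso-8859-1 would never reach
the catch branch, which is the audience described above.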

* People who want to write a polyfill couldn't use it, because they
wouldn't be able to recover the remainder of the string after an error
is thrown.


If you're writing a polyfill, then write a polyfill.   The polyfill for the
old functions is trivial, I could have written it a dozen times in the
course of writing this email reply.
So this replacement is also not for them.
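
For illustration, a userland polyfill along these lines might look like
the following minimal sketch. The function names latin1_to_utf8 and
utf8_to_latin1_lossy are hypothetical. The handling of malformed or
unrepresentable input (one '?' per offending sequence) is a simplifying
assumption and is not claimed to match the built-in functions byte for
byte.

```php
<?php
// Hypothetical stand-in for utf8_encode(): ISO-8859-1 bytes to UTF-8.
function latin1_to_utf8(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($latin1[$i]);
        if ($b < 0x80) {
            $out .= $latin1[$i];
        } else {
            // U+0080..U+00FF become two bytes: 110000xx 10xxxxxx.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Hypothetical stand-in for utf8_decode(): UTF-8 to ISO-8859-1,
// substituting '?' for anything that does not fit (assumed behaviour).
function utf8_to_latin1_lossy(string $utf8): string
{
    $out = '';
    $len = strlen($utf8);
    $i = 0;
    while ($i < $len) {
        $b = ord($utf8[$i]);
        if ($b < 0x80) {
            $out .= $utf8[$i];
            $i++;
            continue;
        }
        if (($b === 0xC2 || $b === 0xC3)
            && $i + 1 < $len
            && (ord($utf8[$i + 1]) & 0xC0) === 0x80) {
            $out .= chr((($b & 0x03) << 6) | (ord($utf8[$i + 1]) & 0x3F));
            $i += 2;
            continue;
        }
        // Not representable or malformed: emit one '?' and skip the lead
        // byte together with any continuation bytes that follow it.
        $out .= '?';
        $i++;
        while ($i < $len && (ord($utf8[$i]) & 0xC0) === 0x80) {
            $i++;
        }
    }
    return $out;
}
```

Because the lossy version never throws, it always processes the whole
string, which is the property the polyfill objection above is about.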

* People who want transcoding without any optional extensions will be
disappointed to find only this one encoding supported.

This function isn't for them. It's for people using iso-8859-1.

There's a theme in here. :)

You'd effectively be adding a completely new core function just for
those people who work with Latin1 text, and are confident that it's not
Windows-1252 in disguise.


Yes.  I'm specifically addressing the people who have been using
utf8_en/decode() correctly all this time.  They shouldn't be punished for
the stupidity of others.

It's tempting to make any C1 control characters an error as well -
although technically valid in Latin1, these are very rarely used, and
it's much more likely that any bytes in that range are intended as
characters in Windows-1252. But that would feel very odd without having
a corresponding utf8_from_windows1252 function to use instead, at which
point we're into designing a whole new conversion library. And of
course, once you've got that UTF-8 string, you can't do much with it,
because PHP's native string functions are all byte-based, so you've
basically got to re-invent large chunks of ext/mbstring...
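
As a rough sketch of the two points in that paragraph: the C1 range is
bytes 0x80 to 0x9F, which Windows-1252 mostly assigns to printable
characters such as the euro sign (0x80) and curly quotes (0x93, 0x94), so
their presence hints that "Latin1" input may really be Windows-1252. The
helper name has_c1_bytes is hypothetical, and treating these bytes as
suspicious is a heuristic rather than a rule.

```php
<?php
// Hypothetical helper: true if the byte string contains any byte in
// 0x80..0x9F (C1 controls in ISO-8859-1, mostly printable in Windows-1252).
function has_c1_bytes(string $bytes): bool
{
    // No /u flag, so the pattern is matched byte by byte.
    return preg_match('/[\x80-\x9F]/', $bytes) === 1;
}

var_dump(has_c1_bytes("caf\xE9"));        // bool(false)
var_dump(has_c1_bytes("\x93quoted\x94")); // bool(true)

// Native string functions count bytes, not characters:
// "é" encoded as UTF-8 is the two bytes C3 A9.
var_dump(strlen("\xC3\xA9")); // int(2)
```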


I disagree that you'd need to add utf8_from/to_windows1252 "for
completeness".  The goal isn't to provide all possible conversion
utilities.  The goal is only to not punish users by taking away a valid API
that they were using correctly (for those users who were using it
correctly).

-Sara


I think I'm one such user :-) So keeping them and improving them a little
would be fine with me!

r//Björn L

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
