Hi Yasuo,

Yasuo Ohgaki wrote:
utf8_decode() and utf8_encode() are not needed and cause more problems
than they solve.

https://wiki.php.net/rfc/remove_utf_8_decode_encode

Proposal
   - Document their deprecation now
   - Remove them in 7.2

I think only a few users are using them, and they shouldn't have problems
using mbstring/iconv/intl functions instead.

Any comments?

I don't agree with this. utf8_decode() and utf8_encode() are functions which you probably ought not to use in modern code, and their names are arguably unhelpful (decode to what? encode from what?). But the job they do, converting between UTF-8 and ISO-8859-1, is sometimes needed if you're dealing with that specific legacy encoding, and I believe they work correctly. Plus, a lot of existing code uses them. For these reasons, deprecating them seems needless.

I would propose something else: remove them from the XML extension, and move them somewhere more fitting, like ext/intl, ext/mbstring or maybe ext/standard. These are generic functions which work on any text, not just XML, and neither they nor the XML functions need the other: if you're parsing XML, you don't necessarily need to convert text to/from UTF-8, and if you're converting text to/from UTF-8, you don't necessarily need to deal with XML. Plus, given the names alone, you'd have no idea they're part of the XML extension.

Also, to avoid confusion, maybe they could be renamed to iso88591_to_utf8() and utf8_to_iso88591(), with the old names kept as aliases. I got this idea from this comment: http://php.net/manual/en/function.utf8-encode.php#104906
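
To make the renaming idea concrete, here is a rough userland sketch of what the two names would do. The function names are only the ones suggested above and do not exist in PHP; I'm assuming ext/mbstring is available and using mb_convert_encoding() to stand in for the conversion, rather than claiming this is how a core implementation would look.

```php
<?php
// Hypothetical userland versions of the proposed names; these are NOT
// part of PHP. Assumes ext/mbstring is loaded.

function iso88591_to_utf8(string $s): string
{
    // Same job as utf8_encode(): treat input as ISO-8859-1, emit UTF-8.
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

function utf8_to_iso88591(string $s): string
{
    // Same job as utf8_decode(): treat input as UTF-8, emit ISO-8859-1.
    return mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
}

$latin1 = "caf\xE9";                 // "café" encoded as ISO-8859-1
$utf8   = iso88591_to_utf8($latin1);

var_dump($utf8 === "caf\xC3\xA9");              // bool(true)
var_dump(utf8_to_iso88591($utf8) === $latin1);  // bool(true)
```

With names like these, the source and target encodings are visible at the call site, which is the whole point of the rename.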

Another thing to consider is that the manual perhaps ought to warn the user that ISO-8859-1 is not Windows-1252. A lot of text on the Internet marked as the former is actually the latter (thanks to the widespread use of Windows), and browsers assume this. Windows-1252 contains some extra printable characters in the range 0x80-0x9F, where ISO-8859-1 has control characters: the Euro sign, curly quotes, the trademark sign, and the en and em dashes, among others. So, interpreting Windows-1252 text as ISO-8859-1 will garble such characters.
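
A small illustration of the difference, again assuming ext/mbstring is available. The byte 0x80 is the Euro sign in Windows-1252 but a control character in ISO-8859-1, so the two interpretations yield different UTF-8 output:

```php
<?php
// Assumes ext/mbstring is loaded.
$byte = "\x80";   // Euro sign in Windows-1252; a control character in ISO-8859-1

$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n";   // c280   (U+0080, a control character)
echo bin2hex($asCp1252), "\n";   // e282ac (U+20AC, the Euro sign)
```

Text that is really Windows-1252 but gets converted as ISO-8859-1 ends up with invisible control characters where the Euro sign or quotes should be.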

Thanks.

--
Andrea Faulds
https://ajf.me/

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
