On 2013-06-28 4:10, Kris Craig wrote:
On Thu, Jun 27, 2013 at 6:43 PM, Yasuo Ohgaki <yohg...@ohgaki.net> wrote:
2013/6/27 Kris Craig <kris.cr...@gmail.com>
Yeah I tried html_entity_decode already, but it just returned NULL. On
the same input string, htmlspecialchars_decode returned the input string
but with *some* special characters decoded; 10 and 13 ("\r\n", I think)
were left in their encoded state. I'm not sure why there wouldn't be an
option to decode all html special characters.
You are missing the design purpose of htmlspecialchars_decode() and
html_entity_decode(). Truth is, they are not as useful as they might
seem. Their purpose is not to decode all the entities, like a browser
would do. We do not implement anything approaching the sort of parsing a
browser would do; for instance, HTML5 says you should accept certain
entities not terminated with ; and parse the stream in a certain way,
and we don't do that at all. The purpose of those two functions is just
to provide something approaching an inverse function for
htmlspecialchars() and htmlentities(). html_entity_decode() has somewhat
deviated from this (for instance, it decodes all numeric entities), but
I think this is nevertheless the proper way to think about those two
functions.
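
To make the difference concrete, here is a minimal sketch. The input
string is a made-up example (not the original poster's data), and the
flags and the UTF-8 charset are my choice for illustration:

```php
<?php
// Hypothetical input: output such as htmlentities() might produce, plus
// numeric references for CR and LF (the "10 and 13" mentioned above).
$encoded = '&lt;b&gt;line1&#13;&#10;line2 &amp; caf&eacute;&lt;/b&gt;';

// Inverse of htmlspecialchars() only: &amp; &lt; &gt; &quot; and, with
// ENT_QUOTES, the single-quote reference. Everything else is left as is.
$special = htmlspecialchars_decode($encoded, ENT_QUOTES);

// Inverse of htmlentities(); it also decodes numeric references.
$all = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Unlike a browser, an entity without the terminating ";" is not decoded.
$unterminated = html_entity_decode('&copy 2013', ENT_QUOTES | ENT_HTML401, 'UTF-8');

var_dump($special, $all, $unterminated);
```

Here $special should still contain &#13;&#10; and &eacute;, while $all
should contain a real "\r\n" and a UTF-8 "é". The unterminated &copy
should come back unchanged.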
Not only for HTML entities; we really need to add several
decoders/encoders to core.
For instance, Javascript \uXXXX, HTML &#XX/&#XXXX, etc.
I hope someone is working on it :)
Would you be interested in co-authoring an RFC with me for this?
See http://php.net/manual/en/transliterator.transliterate.php
For HTML entities, out of the box, only a transliterator for numeric
entities is provided (Hex-Any/XML10), but you can easily build your own
ruleset for the named entities. The performance will be below that of a
dedicated algorithm, though. And it only supports UTF-8.
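
A rough sketch of what I mean, assuming the intl extension is loaded.
The function name decode_entities_sketch() and the three-entry ruleset
are hypothetical, for illustration only; a real ruleset would have to
list every named entity you care about:

```php
<?php
// Hypothetical helper: decode decimal numeric references with the stock
// Hex-Any/XML10 transliterator, then a few named entities with custom rules.
function decode_entities_sketch(string $utf8): ?string
{
    // Literals containing "&" or ";" are quoted, since both characters
    // are special in the ICU rule syntax.
    $rules = <<<'RULES'
'&eacute;' > é ;
'&copy;' > © ;
'&amp;' > '&' ;
RULES;

    $numeric = Transliterator::create('Hex-Any/XML10');
    $named   = Transliterator::createFromRules($rules);
    if ($numeric === null || $named === null) {
        return null; // creation failed, e.g. a syntax error in the rules
    }

    $step = $numeric->transliterate($utf8);
    if ($step === false) {
        return null; // input was probably not valid UTF-8
    }

    $result = $named->transliterate($step);
    return $result === false ? null : $result;
}

if (extension_loaded('intl')) {
    var_dump(decode_entities_sketch('caf&#233; &copy; 2013 &amp; more'));
}
```

Note that this runs two passes over the string, so something like
&#38;eacute; would end up decoded twice. Treat it as a starting point
rather than a drop-in decoder.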
--
Gustavo Lopes