ID:               25707
 Updated by:       [EMAIL PROTECTED]
 Reported By:      Bjorn dot Victor at it dot uu dot se
-Status:           Bogus
+Status:           Verified
 Bug Type:         Feature/Change Request
 Operating System: Solaris 8
 PHP Version:      4.3.3
 New Comment:

html_entity_decode(htmlentities("&lt;")) returns "<", but IMHO it
should return the original "&lt;". 

The unhtmlentities() function given on
http://www.php.net/html_entity_decode works like it should (in my
eyes).


Previous Comments:
------------------------------------------------------------------------

[2003-10-01 03:31:42] Bjorn dot Victor at it dot uu dot se

Sorry, this is not an RTFM error, and has nothing to do with the
optional parameters of the function. I have changed the summary to
refer to "lt", to avoid confusion with ENT_QUOTES etc - believe me, I
tried this before looking at the source and figuring out what the error
really was.

The current code works like this: it iterates over the 6
"basic_entities" and replaces each entity in the string with its
character.  "&amp;" is the first item in basic_entities, which is the
right order for htmlentities (the reverse operation) but not for
decoding.

Given a string "&amp;lt;", it will first become "&lt;", and then
(because "&lt;" is handled after "&amp;"), "<".

Consider doing "&amp;" last, e.g. by traversing basic_entities
backwards:
"&amp;lt;" then becomes "&lt;", which is the expected result.

------------------------------------------------------------------------

[2003-09-30 15:00:59] [EMAIL PROTECTED]

RTFM: http://www.php.net/html_entity_decode
(the 2nd optional parameter..)


------------------------------------------------------------------------

[2003-09-30 14:52:20] Bjorn dot Victor at it dot uu dot se

Description:
------------
Symptom:
html_entity_decode("&amp;quot;") returns '"', while the expected value
would be "&quot;".  Corresponding (wrong) behaviour for &amp; followed
by "lt;", "gt;" etc.

Another example is html_entity_decode(htmlentities("&lt;")) which
returns "<" rather than "&lt;" as expected.

As a result, html_entity_decode cannot be used as the inverse of
htmlentities.
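
The inverse property can be checked with a small round-trip test.
This is only a sketch: roundTrips() is a hypothetical helper name,
the sample strings are illustrative, and the syntax assumes a modern
PHP build rather than 4.3.3. On a build showing the behaviour
described above, the samples that already contain an entity would be
reported as a mismatch.

```php
<?php
// Returns true when decoding the encoded string gives back the original.
function roundTrips(string $original): bool
{
    return html_entity_decode(htmlentities($original)) === $original;
}

// Samples that already contain entity-like text are the interesting cases.
foreach (['&lt;', '&quot;', 'a < b & c'] as $sample) {
    echo $sample, ' : ', roundTrips($sample) ? 'round-trips' : 'MISMATCH', "\n";
}
```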

Diagnosis:
The function (php_unescape_html_entities in ext/standard/html.c)
replaces each entity in basic_entities with its corresponding
character, but it starts by replacing "&amp;" with "&".  The resulting
string is "&quot;", which is then replaced by '"'.

Solution:
php_unescape_html_entities in ext/standard/html.c traverses the
basic_entities from the wrong end; it must replace "&amp;" *last*, not
*first*.
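
For comparison, a different technique from the one proposed here is
to decode in a single pass, so that replaced text is never scanned
again and the table order stops mattering. The sketch below uses
strtr() with an array argument for that; the table is again a
hypothetical stand-in for basic_entities, decodeSinglePass() is an
illustrative name, and the syntax assumes modern PHP.

```php
<?php
// Hypothetical stand-in for basic_entities; order is irrelevant here.
$entityTable = [
    '&amp;'  => '&',
    '&quot;' => '"',
    '&lt;'   => '<',
    '&gt;'   => '>',
];

// strtr() with an array walks the input once and does not re-examine
// text it has already replaced, so "&amp;quot;" stops at "&quot;".
function decodeSinglePass(string $text, array $table): string
{
    return strtr($text, $table);
}

$decoded = decodeSinglePass('&amp;quot;&amp;lt;&amp;gt;', $entityTable);

echo $decoded, "\n";
```

For the reproduce string below, this yields the expected result
rather than the actual one.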

Reproduce code:
---------------
print html_entity_decode("&amp;quot;&amp;lt;&amp;gt;");

Expected result:
----------------
&quot;&lt;&gt;

Actual result:
--------------
"<>


------------------------------------------------------------------------

