ID:               25707
 Updated by:       [EMAIL PROTECTED]
 Reported By:      Bjorn dot Victor at it dot uu dot se
-Status:           Assigned
+Status:           Closed
 Bug Type:         Strings related
 Operating System: Solaris 8
 PHP Version:      4.3.3
 Assigned To:      moriyoshi
 New Comment:

This bug has been fixed in CVS.

In case this was a PHP problem, snapshots of the sources are packaged
every three hours; this change will be in the next snapshot. You can
grab the snapshot at http://snaps.php.net/.
 
In case this was a documentation problem, the fix will show up soon at
http://www.php.net/manual/.

In case this was a PHP.net website problem, the change will show
up on the PHP.net site and on the mirror sites shortly.
 
Thank you for the report, and for helping us make PHP better.

The fix will be in 4.3.4-rc2.



Previous Comments:
------------------------------------------------------------------------

[2003-10-01 17:31:22] [EMAIL PROTECTED]

html_entity_decode(htmlentities("&lt;")) returns "<", but IMHO it
should return the original "&lt;". 

The unhtmlentities() function given on
http://www.php.net/html_entity_decode works like it should (in my
eyes).

------------------------------------------------------------------------

[2003-10-01 03:31:42] Bjorn dot Victor at it dot uu dot se

Sorry, this is not an RTFM error, and has nothing to do with the
optional parameters of the function. I have changed the summary to
refer to "lt", to avoid confusion with ENT_QUOTES etc - believe me, I
tried this before looking at the source and figuring out what the error
really was.

The current code works like this: it iterates over the 6 "basic_entities",
replacing each entity in the string with its character.  "&amp;" is the
first item in basic_entities, which is good when you're doing
htmlentities (the reverse operation).

Given the string "&amp;lt;", the first pass turns it into "&lt;", and a
later pass (because "&lt;" is handled after "&amp;") turns that into "<".

Consider doing "&amp;" last, e.g. by traversing basic_entities
backwards: "&amp;lt;" then becomes "&lt;", which is the expected result.

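The ordering problem described in this comment can be modelled in a few
lines of Python. This is only a sketch of the sequential-replacement idea,
not the C code in ext/standard/html.c: BASIC_ENTITIES and decode_in_order
are hypothetical names, and the table lists only the entities mentioned in
this report, with "&amp;" first.

```python
# Hypothetical stand-in for basic_entities: (entity, character) pairs,
# with "&amp;" first, as the report describes.
BASIC_ENTITIES = [
    ("&amp;", "&"),
    ("&quot;", '"'),
    ("&lt;", "<"),
    ("&gt;", ">"),
]


def decode_in_order(text, table):
    """Replace every occurrence of each entity, one table entry at a time."""
    for entity, char in table:
        text = text.replace(entity, char)
    return text


# Forward traversal: "&amp;" is handled first, so its output is decoded again.
forward = decode_in_order("&amp;lt;", BASIC_ENTITIES)

# Backward traversal: "&amp;" is handled last, so nothing is decoded twice.
backward = decode_in_order("&amp;lt;", list(reversed(BASIC_ENTITIES)))

print(forward)   # <
print(backward)  # &lt;
```

Only the position of "&amp;" relative to the other entries matters here;
the order of the remaining entries does not change either result.
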
------------------------------------------------------------------------

[2003-09-30 15:00:59] [EMAIL PROTECTED]

RTFM: http://www.php.net/html_entity_decode
(the 2nd optional parameter..)


------------------------------------------------------------------------

[2003-09-30 14:52:20] Bjorn dot Victor at it dot uu dot se

Description:
------------
Symptom:
html_entity_decode("&amp;quot;") returns '"', while the expected value
would be "&quot;".  The same wrong behaviour occurs for "&amp;" followed
by "lt;", "gt;", etc.

Another example is html_entity_decode(htmlentities("&lt;")) which
returns "<" rather than "&lt;" as expected.

As a result, html_entity_decode cannot be used as the inverse of
htmlentities.

Diagnosis:
The function (php_unescape_html_entities in ext/standard/html.c)
replaces each entity in basic_entities with its corresponding
character, but it starts by replacing "&amp;" with "&".  For the input
"&amp;quot;" the intermediate string is "&quot;", which a later pass
then replaces with '"'.

Solution:
php_unescape_html_entities in ext/standard/html.c traverses the
basic_entities from the wrong end; it must replace "&amp;" *last*, not
*first*.
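
Reversing the traversal is one way to fix this. An alternative that does
not depend on table order is to decode in a single left-to-right pass, so
that text produced by a replacement is never scanned again. The Python
sketch below illustrates that alternative and checks the round trip this
report asks for. It is not the change committed to CVS (the report does not
say how the fix was implemented), and ENTITIES, decode_once and encode are
hypothetical names.

```python
import re

# Hypothetical entity table; only entities named in this report are listed.
ENTITIES = {"&amp;": "&", "&quot;": '"', "&lt;": "<", "&gt;": ">"}

# Match any known entity; the alternation is built from the table keys.
_ENTITY_RE = re.compile("|".join(re.escape(e) for e in ENTITIES))


def decode_once(text):
    """Decode each entity exactly once, scanning left to right."""
    return _ENTITY_RE.sub(lambda m: ENTITIES[m.group(0)], text)


def encode(text):
    """Escape the four special characters, one character at a time."""
    reverse = {char: entity for entity, char in ENTITIES.items()}
    return "".join(reverse.get(ch, ch) for ch in text)


sample = "&lt;"
print(encode(sample))               # &amp;lt;
print(decode_once(encode(sample)))  # &lt;
```

Because re.sub continues after each match rather than rescanning the
replacement, the "&" produced from "&amp;" cannot combine with the
following "lt;" to form a second entity.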

Reproduce code:
---------------
print html_entity_decode("&amp;quot;&amp;lt;&amp;gt;");

Expected result:
----------------
&quot;&lt;&gt;

Actual result:
--------------
"<>


------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=25707&edit=1
