 ID:               25707
 Updated by:       [EMAIL PROTECTED]
 Reported By:      Bjorn dot Victor at it dot uu dot se
-Status:           Bogus
+Status:           Verified
 Bug Type:         Feature/Change Request
 Operating System: Solaris 8
 PHP Version:      4.3.3
 New Comment:

html_entity_decode(htmlentities("&lt;")) returns "<", but IMHO it should
return the original "&lt;". The unhtmlentities() function given on
http://www.php.net/html_entity_decode works like it should (in my eyes).


Previous Comments:
------------------------------------------------------------------------

[2003-10-01 03:31:42] Bjorn dot Victor at it dot uu dot se

Sorry, this is not an RTFM error, and has nothing to do with the
optional parameters of the function. I have changed the summary to
refer to "lt", to avoid confusion with ENT_QUOTES etc - believe me, I
tried this before looking at the source and figuring out what the error
really was.

The current code works like this: iterate over the 6 "basic_entities",
replacing each entity with its character in the string. "&amp;" is the
first item in basic_entities, which is good when you're doing
htmlentities (the reverse operation).

Given a string "&amp;lt;", it will first become "&lt;", and then
(because "&lt;" is handled after "&amp;"), "<".

Consider doing "&amp;" last, e.g. by traversing basic_entities
backwards: "&amp;lt;" becomes "&lt;", which is the expected result.

------------------------------------------------------------------------

[2003-09-30 15:00:59] [EMAIL PROTECTED]

RTFM: http://www.php.net/html_entity_decode
(the 2nd optional parameter..)

------------------------------------------------------------------------

[2003-09-30 14:52:20] Bjorn dot Victor at it dot uu dot se

Description:
------------
Symptom: html_entity_decode("&amp;quot;") returns '"', while the
expected value would be "&quot;". Corresponding (wrong) behaviour for
"&amp;" followed by "lt;", "gt;" etc.

Another example is html_entity_decode(htmlentities("&lt;")), which
returns "<" rather than "&lt;" as expected. As a result,
html_entity_decode cannot be used as the inverse of htmlentities.

Diagnosis: The function (php_unescape_html_entities in
ext/standard/html.c) replaces each entity in basic_entities with its
corresponding character, but starts by replacing "&amp;" with "&", the
resulting string being "&quot;", which is then replaced by '"'.

Solution: php_unescape_html_entities in ext/standard/html.c traverses
the basic_entities from the wrong end; it must replace "&amp;" *last*,
not *first*.

Reproduce code:
---------------
print html_entity_decode("&amp;quot;&amp;lt;&amp;gt;");

Expected result:
----------------
&quot;&lt;&gt;

Actual result:
--------------
"<>


------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=25707&edit=1
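
The ordering problem described in the diagnosis can be reproduced outside
PHP. The following is a minimal Python sketch of a decoder that runs one
replace pass per table entry, as the report describes. It is a simulation,
not the code from ext/standard/html.c: the table BASIC_ENTITIES and the
functions decode_multipass, decode_amp_first and decode_amp_last are
hypothetical names, and the table contents and order are assumptions
chosen only so that "&amp;" comes first.

```python
# Illustrative entity table. The entity names are standard HTML, but the
# contents and order are assumptions, not a copy of basic_entities in html.c.
BASIC_ENTITIES = [
    ("&amp;", "&"),
    ("&quot;", '"'),
    ("&#039;", "'"),
    ("&lt;", "<"),
    ("&gt;", ">"),
]


def decode_multipass(text, table):
    """Decode by running one full replace pass per table entry, in order."""
    for entity, char in table:
        text = text.replace(entity, char)
    return text


def decode_amp_first(text):
    """Order described in the report: "&amp;" is replaced before the rest."""
    return decode_multipass(text, BASIC_ENTITIES)


def decode_amp_last(text):
    """Proposed order: walk the table backwards so "&amp;" is replaced last."""
    return decode_multipass(text, list(reversed(BASIC_ENTITIES)))


sample = "&amp;quot;&amp;lt;&amp;gt;"
print(decode_amp_first(sample))  # over-decoded: "<>
print(decode_amp_last(sample))   # &quot;&lt;&gt;
```

With "&amp;" handled first, the "&" it produces is rescanned by the later
passes and decoded a second time. Handling it last avoids this, because
none of the other replacement characters contain "&".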