ID: 25707
Updated by: [EMAIL PROTECTED]
Reported By: Bjorn dot Victor at it dot uu dot se
-Status: Assigned
+Status: Closed
Bug Type: Strings related
Operating System: Solaris 8
PHP Version: 4.3.3
Assigned To: moriyoshi
New Comment:
This bug has been fixed in CVS.

In case this was a PHP problem, snapshots of the sources are packaged every three hours; this change will be in the next snapshot. You can grab the snapshot at http://snaps.php.net/.

In case this was a documentation problem, the fix will show up soon at http://www.php.net/manual/.

In case this was a PHP.net website problem, the change will show up on the PHP.net site and on the mirror sites shortly.

Thank you for the report, and for helping us make PHP better.

The fix will be in 4.3.4-rc2.

Previous Comments:
------------------------------------------------------------------------

[2003-10-01 17:31:22] [EMAIL PROTECTED]

html_entity_decode(htmlentities("&lt;")) returns "<", but IMHO it should return the original "&lt;".

The unhtmlentities() function given on http://www.php.net/html_entity_decode works like it should (in my eyes).

------------------------------------------------------------------------

[2003-10-01 03:31:42] Bjorn dot Victor at it dot uu dot se

Sorry, this is not an RTFM error, and has nothing to do with the optional parameters of the function. I have changed the summary to refer to "lt", to avoid confusion with ENT_QUOTES etc. Believe me, I tried this before looking at the source and figuring out what the error really was.

The current code works like this: iterate over the 6 "basic_entities", replacing each entity with its character in the string. "&amp;" is the first item in basic_entities, which is good when you're doing htmlentities (the reverse operation). Given the string "&amp;lt;", it will first become "&lt;", and then (because "&lt;" is handled after "&amp;") "<".

Consider doing "&amp;" last, e.g. by traversing basic_entities backwards: "&amp;lt;" then becomes "&lt;", which is what is expected.

------------------------------------------------------------------------

[2003-09-30 15:00:59] [EMAIL PROTECTED]

RTFM: http://www.php.net/html_entity_decode (the 2nd optional parameter...)
------------------------------------------------------------------------

[2003-09-30 14:52:20] Bjorn dot Victor at it dot uu dot se

Description:
------------
Symptom: html_entity_decode("&amp;quot;") returns '"', while the expected value would be "&quot;". The corresponding (wrong) behaviour occurs for "&amp;" followed by "lt;", "gt;" etc. Another example is html_entity_decode(htmlentities("&lt;")), which returns "<" rather than "&lt;" as expected. As a result, html_entity_decode cannot be used as the inverse of htmlentities.

Diagnosis: The function (php_unescape_html_entities in ext/standard/html.c) replaces each entity in basic_entities with its corresponding character, but starts by replacing "&amp;" with "&": "&amp;quot;" becomes "&quot;", which is then replaced by '"'.

Solution: php_unescape_html_entities in ext/standard/html.c traverses basic_entities from the wrong end; it must replace "&amp;" *last*, not *first*.

Reproduce code:
---------------
print html_entity_decode("&amp;quot;&amp;lt;&amp;gt;");

Expected result:
----------------
&quot;&lt;&gt;

Actual result:
--------------
"<>

------------------------------------------------------------------------