Edit report at http://bugs.php.net/bug.php?id=53021&edit=1
 ID:                 53021
 Updated by:         cataphr...@php.net
 Reported by:        thyamat at msn dot com
 Summary:            html_entity_decode not working with CP-1251 (5.2 only) and
                     ISO-8859-1
-Status:             Verified
+Status:             Closed
 Type:               Bug
 Package:            Strings related
 Operating System:   CentOS 5.5
 PHP Version:        5.2.14
 Assigned To:        cataphract
 Block user comment: N

 New Comment:

Fixed in trunk and PHP 5.3. Unfortunately, the current policy is to apply only security fixes to PHP 5.2, so it won't be fixed there.

The &é is a separate issue. It would certainly be preferable to decode that (and indeed we accept e.g. &<), but it's not a bug, as it's invalid anyway. I'll do a little refactoring in php_unescape_html_entities and improve that as well in the process, but I'll apply it only to trunk.

Previous Comments:
------------------------------------------------------------------------
[2010-10-08 18:20:00] cataphr...@php.net

Automatic comment from SVN on behalf of cataphract
Revision: http://svn.php.net/viewvc/?view=revision&revision=304208
Log: - Fixed bug #53021 (In html_entity_decode, failure to convert numeric entities with ENT_NOQUOTES and ISO-8859-1).

------------------------------------------------------------------------
[2010-10-08 11:27:23] thyamat at msn dot com

Actually, the first bug is not only about é but about all entities from € to ÿ (except entities shared by both encodings).

About &é not being decoded: you don't see it as a bug, right?

------------------------------------------------------------------------
[2010-10-08 11:01:52] cataphr...@php.net

There are two bugs here:

* PHP 5.2.14 decodes é when the encoding is Windows-1251, which it shouldn't, as the character is not representable in that encoding.
* Both PHP 5.2.14 and PHP 5.3.3 fail to decode é when the encoding is ISO-8859-1.

Windows-1252 works fine in both versions.
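The first bug is easier to see at the byte level. In ISO-8859-1 and Windows-1252, é (U+00E9) is the single byte 0xE9, whereas Windows-1251 has no é at all and uses 0xE9 for the Cyrillic letter й (U+0439). Emitting 0xE9 for é under Windows-1251 therefore yields the й seen in the 5.2.14 output. A minimal sketch, assuming the iconv extension is loaded; `byteE9AsUtf8` is a hypothetical helper, not part of PHP:

```php
<?php
// Sketch: what does the single byte 0xE9 mean in each charset discussed above?
// Assumes the iconv extension is available; byteE9AsUtf8 is a hypothetical helper.
function byteE9AsUtf8(string $charset): string
{
    // Interpret the raw byte 0xE9 as text in $charset and re-encode it as UTF-8.
    $utf8 = iconv($charset, 'UTF-8', "\xE9");
    if ($utf8 === false) {
        throw new RuntimeException("iconv could not convert from $charset");
    }
    return $utf8;
}

foreach (['ISO-8859-1', 'CP1252', 'CP1251'] as $charset) {
    $char = byteE9AsUtf8($charset);
    // Print the character and its UTF-8 bytes so the result is unambiguous.
    printf("%-10s 0xE9 -> %s (UTF-8 %s)\n", $charset, $char, bin2hex($char));
}
```

Run as-is, it should report é (UTF-8 c3a9) for ISO-8859-1 and CP1252, and й (UTF-8 d0b9) for CP1251.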
------------------------------------------------------------------------
[2010-10-08 09:45:21] thyamat at msn dot com

Description:
------------
Hi,

There seem to be many bugs in html_entity_decode. Using cp1252 encoding, it decodes HTML numeric entities as if the encoding were cp1251 (note that it works correctly on 5.3.3). Using iso-8859-1 encoding, it does not seem to decode any numeric entity at all (same situation in 5.3.3). Note also that &é is decoded neither on 5.2.14 nor on 5.3.3.

Test script:
---------------
html_entity_decode('é&é é é&é é& &é', ENT_NOQUOTES, 'cp1252');
html_entity_decode('é&é é é&é é& &é', ENT_NOQUOTES, 'cp1251');
html_entity_decode('é&é é é&é é& &é', ENT_NOQUOTES, 'iso-8859-1');

Expected result:
----------------
é&é é é&é é& &é
é&é é é&é é& &é
é&é é é&é é& &é

Actual result:
--------------
Results in 5.2.14:
й&é й й&й й& &é
é&é é é&é é& &é
é&é é é&é é& &é

Results in 5.3.3:
é&é é é&é é& &é
é&é é é&é é& &é
é&é é é&é é& &é

------------------------------------------------------------------------
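Because the results above depend on how the decoded bytes are displayed, the test can be rerun printing hex instead. A minimal sketch, assuming a current PHP build that includes the fix; the three entity strings are stand-ins chosen for illustration (each spells U+00E9), since the report's own test string did not survive intact:

```php
<?php
// Sketch: rerun the report's checks, printing decoded bytes as hex so the
// output does not depend on how a terminal or browser displays them.
// Assumption: these entities are illustrative stand-ins for the mangled
// originals; each one encodes U+00E9 (é).
$entities = ['&eacute;', '&#233;', '&#xE9;'];

foreach (['cp1252', 'cp1251', 'ISO-8859-1'] as $charset) {
    foreach ($entities as $entity) {
        $decoded = html_entity_decode($entity, ENT_NOQUOTES, $charset);
        // An entity left undecoded shows up as the hex of its ASCII text.
        printf("%-10s %-8s -> %s\n", $charset, $entity, bin2hex($decoded));
    }
}
```

With the fix, the cp1252 and ISO-8859-1 lines should show e9, the byte for é in both charsets, while the cp1251 lines should not decode to that byte, since é cannot be represented in Windows-1251.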