Edit report at http://bugs.php.net/bug.php?id=53021&edit=1

 ID:                 53021
 Updated by:         cataphr...@php.net
 Reported by:        thyamat at msn dot com
 Summary:            html_entity_decode not working with CP-1251 (5.2
                     only) and ISO-8859-1
-Status:             Verified
+Status:             Closed
 Type:               Bug
 Package:            Strings related
 Operating System:   CentOS 5.5
 PHP Version:        5.2.14
 Assigned To:        cataphract
 Block user comment: N

 New Comment:

Fixed in trunk and PHP 5.3. Unfortunately, the current policy is to only
apply security fixes to PHP 5.2, so it won't be fixed there.

The &é case is a separate issue. It would certainly be preferable to
decode that (and indeed we accept e.g. &<), but it's not a bug, as the
input is invalid anyway. I'll do a little refactoring in
php_unescape_html_entities and improve that as well in the process, but
I'll apply it only to trunk.
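
For anyone wanting to check how a given build treats this case, a
minimal probe follows. It assumes the input in question is a bare
ampersand immediately followed by the decimal reference for é; the
archived comment shows it only in rendered form, so the literal
'&&#233;' below is a guess, not a quote from the report.

```php
<?php
// Assumed input: a bare "&" followed by the decimal reference for U+00E9.
$input = '&&#233;';

$out = html_entity_decode($input, ENT_NOQUOTES, 'ISO-8859-1');

// Print raw bytes as hex so the result does not depend on the terminal
// encoding. "26e9" would mean the reference was decoded after the bare
// ampersand; the hex of the unchanged input would mean it was left alone.
echo bin2hex($out), "\n";
echo bin2hex($input), " (unchanged input, for comparison)\n";
```

Which of the two results appears depends on the PHP version, so no
expected output is given here.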


Previous Comments:
------------------------------------------------------------------------
[2010-10-08 18:20:00] cataphr...@php.net

Automatic comment from SVN on behalf of cataphract
Revision: http://svn.php.net/viewvc/?view=revision&revision=304208
Log: - Fixed bug #53021 (In html_entity_decode, failure to convert
numeric entities with ENT_NOQUOTES and ISO-8859-1).

------------------------------------------------------------------------
[2010-10-08 11:27:23] thyamat at msn dot com

Actually, the first bug is not only about é but about all entities from
€ to ÿ (except entities shared by both encodings).

About &é not being decoded: you don't see it as a bug, right?

------------------------------------------------------------------------
[2010-10-08 11:01:52] cataphr...@php.net

There are two bugs here:

* There's a bug in PHP 5.2.14 in that it shouldn't decode é when
the encoding is Windows-1251, as the character is not representable in
that encoding.

* There's a bug in both PHP 5.2.14 and PHP 5.3.3 in that é is not
decoded when the encoding is ISO-8859-1.

Windows-1252 works fine in both versions.
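
The behaviour described above can be checked with a short script. This
is only a sketch: it assumes the entity under discussion is the decimal
reference &#233; (U+00E9), which the archived text shows only in
rendered form, and decode_hex() is a hypothetical helper, not part of
PHP. It needs PHP 7 or later for the type declarations.

```php
<?php
// Assumption: the entity in the report is the decimal reference for
// U+00E9 (é).
$entity = '&#233;';

// Hypothetical helper: decode under the given charset and return the
// resulting bytes as hex, so the output is independent of the terminal
// encoding.
function decode_hex(string $s, string $charset): string
{
    return bin2hex(html_entity_decode($s, ENT_NOQUOTES, $charset));
}

// One line per charset mentioned in the report.
foreach (['ISO-8859-1', 'cp1252', 'cp1251'] as $charset) {
    printf("%-10s %s\n", $charset, decode_hex($entity, $charset));
}
```

On a build that behaves as described in this comment, ISO-8859-1 and
cp1252 should both yield the single byte e9, while cp1251 should leave
the entity undecoded because é has no code point in that encoding.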

------------------------------------------------------------------------
[2010-10-08 09:45:21] thyamat at msn dot com

Description:
------------
Hi,

There seem to be many bugs in html_entity_decode.

Using cp1252 encoding, it decodes HTML numeric entities as if the
encoding were cp1251 (please note that it works correctly on 5.3.3).

Using iso-8859-1 encoding, it does not seem to decode any numeric entity
at all (same situation in 5.3.3).

Please also note that &é is never decoded, on either 5.2.14 or 5.3.3.

Test script:
---------------
html_entity_decode('é&é é é&é é& &é', ENT_NOQUOTES, 'cp1252');

html_entity_decode('é&é é é&é é& &é', ENT_NOQUOTES, 'cp1251');

html_entity_decode('é&é é é&é é& &é', ENT_NOQUOTES, 'iso-8859-1');

Expected result:
----------------
Expected results:

é&é é é&é é& &é
é&é é é&é é& &é
é&é é é&é é& &é

Actual result:
--------------
Results in 5.2.14:

й&é й й&й й& &é
é&é é é&é é& &é
é&é é é&é é& &é

Results in 5.3.3:

é&é é é&é é& &é
é&é é é&é é& &é
é&é é é&é é& &é


------------------------------------------------------------------------


