From:             masakielastic at gmail dot com
Operating system: Mac OSX
PHP version:      5.5.0RC3
Package:          mbstring related
Bug Type:         Bug
Bug description:  mb_convert_encoding breaks well-formed character

Description:
------------
When mb_convert_encoding is used to convert a string from UTF-8 to UTF-8 in
order to replace ill-formed byte sequences with the substitute character
(U+FFFD), it replaces the character following an ill-formed byte sequence
with the substitute character. It also deletes a trailing ill-formed byte
sequence instead of replacing it with the substitute character.

A comprehensive test case for 2- to 4-byte characters is available at
https://gist.github.com/masakielastic/5793665 .

Test script:
---------------
// U+24B62: "\xF0\xA4\xAD\xA2"
// ill-formed: "\xF0\xA4\xAD"
// U+FFFD: "\xEF\xBF\xBD"

$str = "\xF0\xA4\xAD"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2";
$expected = "\xEF\xBF\xBD"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2";

$str2 = "\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD";
$expected2 = "\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"."\xEF\xBF\xBD";

mb_substitute_character(0xFFFD);
var_dump(
    $expected === htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, 'UTF-8')),
    $expected2 === htmlspecialchars_decode(htmlspecialchars($str2, ENT_SUBSTITUTE, 'UTF-8')),
    $expected === mb_convert_encoding($str, 'UTF-8', 'UTF-8'),
    $expected2 === mb_convert_encoding($str2, 'UTF-8', 'UTF-8')
);

Expected result:
----------------
bool(true)
bool(true)
bool(true)
bool(true)

Actual result:
--------------
bool(true)
bool(true)
bool(false)
bool(false)
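
Possible workaround (sketch):
-----------------------------
The htmlspecialchars round trip used as the reference in the test script
could be wrapped in a small helper when well-formed output is needed. This
is only a sketch under assumptions: scrub_utf8 is a hypothetical name, not
an existing PHP function, and the report only shows the round trip behaving
as expected for the two inputs in the test script.

```php
<?php
// Hypothetical helper, not part of PHP or mbstring.
// Replaces each ill-formed UTF-8 byte sequence with U+FFFD by escaping the
// string with ENT_SUBSTITUTE and then undoing the escaping.
function scrub_utf8($str)
{
    // ENT_NOQUOTES keeps quotes untouched in both directions, so the round
    // trip only affects &, <, > and ill-formed byte sequences.
    $escaped = htmlspecialchars($str, ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8');

    return htmlspecialchars_decode($escaped, ENT_NOQUOTES);
}

// Same inputs as the test script: an ill-formed sequence before and after
// two well-formed U+24B62 characters.
$wellFormed = "\xF0\xA4\xAD\xA2";
$illFormed  = "\xF0\xA4\xAD";

var_dump(
    bin2hex(scrub_utf8($illFormed . $wellFormed . $wellFormed)),
    bin2hex(scrub_utf8($wellFormed . $wellFormed . $illFormed))
);
```

bin2hex is used here only so that the resulting bytes can be compared
against "\xEF\xBF\xBD" by eye.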

-- 
Edit bug report at https://bugs.php.net/bug.php?id=65045&edit=1
