From: masakielastic at gmail dot com
Operating system: Mac OSX
PHP version: 5.5.0RC3
Package: mbstring related
Bug Type: Bug
Bug description: mb_convert_encoding breaks well-formed character

Description:
------------
When converting a string from UTF-8 to UTF-8 with mb_convert_encoding in order to replace ill-formed byte sequences with the substitute character (U+FFFD), mb_convert_encoding replaces the character following an ill-formed byte sequence with the substitute character. mb_convert_encoding also deletes a trailing ill-formed byte sequence and doesn't replace it with the substitute character.

A comprehensive test case for 2-4 byte characters is here:
https://gist.github.com/masakielastic/5793665

Test script:
---------------
// U+24B62: "\xF0\xA4\xAD\xA2"
// ill-formed: "\xF0\xA4\xAD"
// U+FFFD: "\xEF\xBF\xBD"

$str = "\xF0\xA4\xAD"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2";
$expected = "\xEF\xBF\xBD"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2";

$str2 = "\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD";
$expected2 = "\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"."\xEF\xBF\xBD";

mb_substitute_character(0xFFFD);

var_dump(
    $expected === htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, 'UTF-8')),
    $expected2 === htmlspecialchars_decode(htmlspecialchars($str2, ENT_SUBSTITUTE, 'UTF-8')),
    $expected === mb_convert_encoding($str, 'UTF-8', 'UTF-8'),
    $expected2 === mb_convert_encoding($str2, 'UTF-8', 'UTF-8')
);

Expected result:
----------------
bool(true)
bool(true)
bool(true)
bool(true)

Actual result:
--------------
bool(true)
bool(true)
bool(false)
bool(false)

--
Edit bug report at https://bugs.php.net/bug.php?id=65045&edit=1
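
The test script only reports whether each result matches, not which bytes mb_convert_encoding actually returned. The following is a minimal diagnostic sketch for inspecting that, assuming the mbstring extension is loaded. The helper hex_bytes() and the case labels are hypothetical names introduced here for illustration and are not part of the original report.

```php
<?php
// Hypothetical helper (not from the report): render a byte string as
// space-separated uppercase hex pairs.
function hex_bytes($bytes)
{
    return implode(' ', str_split(strtoupper(bin2hex($bytes)), 2));
}

$wellFormed = "\xF0\xA4\xAD\xA2"; // U+24B62
$truncated  = "\xF0\xA4\xAD";     // ill-formed: final byte missing

// The two inputs from the test script, labeled for readability.
$cases = [
    'ill-formed first' => $truncated . $wellFormed . $wellFormed,
    'ill-formed last'  => $wellFormed . $wellFormed . $truncated,
];

mb_substitute_character(0xFFFD);

foreach ($cases as $label => $input) {
    // Round trip that the report observed to give the expected result.
    $viaHtml = htmlspecialchars_decode(htmlspecialchars($input, ENT_SUBSTITUTE, 'UTF-8'));
    // Conversion under test; its output depends on the PHP version.
    $viaMb = mb_convert_encoding($input, 'UTF-8', 'UTF-8');

    printf(
        "%s\n  input:               %s\n  htmlspecialchars:    %s\n  mb_convert_encoding: %s\n",
        $label,
        hex_bytes($input),
        hex_bytes($viaHtml),
        hex_bytes($viaMb)
    );
}
```

Comparing the last two lines of each case shows where the outputs diverge: a substitute character appears as EF BF BD, and a well-formed U+24B62 appears as F0 A4 AD A2.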