ID:               31805
Updated by:       [EMAIL PROTECTED]
Reported By:      gullevek at gullevek dot org
-Status:          Open
+Status:          Feedback
Bug Type:         mbstring related
Operating System: gnu/linux
PHP Version:      4.3.10
New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php4-STABLE-latest.tar.gz

For Windows:

  http://snaps.php.net/win32/php4-win32-STABLE-latest.zip


Previous Comments:
------------------------------------------------------------------------

[2005-02-03 02:25:05] gullevek at gullevek dot org

One more comment: the problem actually occurred because mb_detect_encoding detects UTF-8 even if the string is ISO-2022-JP.

------------------------------------------------------------------------

[2005-02-03 02:20:43] gullevek at gullevek dot org

Okay, perhaps it is not 100% a bug. The problem is that if you have ISO-2022-JP encoded data and no default charset set, PHP doesn't read it correctly (because ISO-2022-JP is encoded very differently). See the example below. Enter two characters, one single-byte (e.g. a) and one double-byte (e.g. あ). In the output you will then see that the length reported without ISO-2022-JP set is wrong. But I don't know why 4.3.10 behaves differently from 4.3.9 ...

<?php
import_request_variables("p");
if ($send)
{
    echo "S: $string<br>";
    echo "D: ".mb_detect_encoding($string,"iso-2022-jp")."<br>";
    echo strlen($string)." -- without iso: ".mb_strlen($string)." -- with iso".mb_strlen($string,"iso-2022-jp")."<br>";
}
?>
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-2022-JP">
</head>
<body>
<form method="post" name="foo" enctype="multipart/form-data">
<input type="text" name="string" size="50" value="<? echo $string; ?>"><br>
<input type="submit" name="send" value="Send">
</form></body></html>

------------------------------------------------------------------------

[2005-02-02 08:41:18] [EMAIL PROTECTED]

Please post the script somewhere online and provide a link (otherwise your Kanji might screw up in our form).

------------------------------------------------------------------------

[2005-02-02 08:40:52] [EMAIL PROTECTED]

Thank you for this bug report.
To properly diagnose the problem, we need a short but complete example script to be able to reproduce this bug ourselves.

A proper reproducing script starts with <?php and ends with ?>, is max. 10-20 lines long and does not require any external resources such as databases, etc.

If possible, make the script source available online and provide a URL to it here. Try to avoid embedding huge scripts into the report.

------------------------------------------------------------------------

[2005-02-02 07:21:44] gullevek at gullevek dot org

Description:
------------
If you want to get the length of a string containing Japanese kanji, then with 4.3.10 the first kanji counts as 8 characters instead of 2. Any other double-byte character afterwards is counted as 2 bytes. The problem is that mb_strlen should return only 1 and not 2. If I count with strlen, there should be 2.

I get the wrong return value in every case: with no default charset set, with a default charset set, with the charset passed to mb_strlen, and with the charset obtained via mb_detect_encoding. It always returns the wrong length.

This did not happen in versions before 4.3.10.

------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=31805&edit=1
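
The byte counts in the description above (8 for the first character, 2 for each one after it) are consistent with ISO-2022-JP's escape sequences being counted as data. The following is a minimal sketch for checking that, not a confirmed diagnosis of this bug. It assumes ext/mbstring is loaded and that the file is saved as UTF-8; jis_lengths() is a hypothetical helper name and is not part of PHP or of the original report.

```php
<?php
// Sketch only: assumes ext/mbstring is loaded and this file is saved as UTF-8.
// jis_lengths() is a hypothetical helper, not part of PHP or of the report.
function jis_lengths(string $utf8): array
{
    // Convert to ISO-2022-JP, a 7-bit encoding that switches character
    // sets with 3-byte escape sequences (ESC $ B ... ESC ( B).
    $jis = mb_convert_encoding($utf8, "ISO-2022-JP", "UTF-8");

    return [
        strlen($jis),                   // raw byte count
        mb_strlen($jis, "UTF-8"),       // counted under the wrong encoding
        mb_strlen($jis, "ISO-2022-JP"), // counted under the right encoding
    ];
}

// Print "bytes / length as UTF-8 / length as ISO-2022-JP" for a few inputs.
foreach (["あ", "あい", "aあ"] as $s) {
    echo $s, ": ", implode(" / ", jis_lengths($s)), "\n";
}
```

On a current PHP this should print 8 / 8 / 1 for "あ", 10 / 10 / 2 for "あい" and 9 / 9 / 2 for "aあ". Every byte of an ISO-2022-JP string is 7-bit, so the data also passes as valid ASCII or UTF-8, which may be why mb_detect_encoding reports UTF-8 unless ISO-2022-JP is given explicitly.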