Removing multibyte encoding support from PHP 5.3 will cause a severe incompatibility problem with older PHP 5.x releases.
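To make the underlying byte-level issue concrete, here is a minimal sketch. It uses Python and its standard codecs purely for illustration (it is not part of the patch), and the helper name has_backslash_trail_byte and the sample characters are my own choices. It checks whether a multibyte character ends in the byte 0x5c, which a byte-oriented scanner would read as a backslash.

```python
# Illustration only: find multibyte characters whose trailing byte is 0x5c,
# the ASCII backslash. The helper name and samples are hypothetical.
BACKSLASH = 0x5C


def has_backslash_trail_byte(text: str, encoding: str) -> bool:
    """Return True if any multibyte character in `text`, when encoded
    with `encoding`, ends in the byte 0x5c."""
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1 and encoded[-1] == BACKSLASH:
            return True
    return False


sjis_sample = "表"  # U+8868, encoded as 0x95 0x5c in Shift_JIS
big5_sample = "功"  # U+529F, encoded as 0xa5 0x5c in BIG5

print(has_backslash_trail_byte(sjis_sample, "shift_jis"))  # True
print(has_backslash_trail_byte(big5_sample, "big5"))       # True
print(has_backslash_trail_byte(sjis_sample, "utf-8"))      # False
```

UTF-8 never has this problem, because every byte of a multibyte UTF-8 sequence is 0x80 or above.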
As Stefan noted, the Shift_JIS character encoding, which is widely used in Japan, is not a flex-safe encoding because it includes 0x5c (backslash) as the second byte of some multibyte characters. The BIG5 character encoding used for Chinese is also not flex safe.

Today, I committed a patch for zend multibyte support into PHP_5_3.
It is still in experimental status because I am not an expert in re2c/flex.

A couple of test scripts already exist in Zend/tests/multibyte/*.phpt,
but, of course, we need more test scripts for zend multibyte.
(we need to have a TestFest in Japan :) )

The script encoding can be specified in a couple of different ways:

(1) mbstring.script_encoding in php.ini
(2) declare(encoding="Shift_JIS") in each PHP script
    -> multibyte_encoding_001.phpt
(3) BOM in a Unicode script
    -> multibyte_encoding_00[23].phpt
(4) auto detection based on mbstring.language, mbstring.detect_order

Test scripts already exist for (2) and (3), but there are none for (1) and (4).

I have already confirmed that my patch for PHP 5.3 works for (1) and (2) with the Shift_JIS encoding, but I have not yet confirmed it for the Unicode BOM or for other encodings.

We need more test scripts to maintain reliability and to minimize security risks.

Rui

On Tue, 24 Jun 2008 16:21:33 +0200
Stefan Esser <[EMAIL PROTECTED]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> >> This is used when reading scripts that are in encodings like Shift-JIS
> >> which is very common in Japan. In any case, I have tried to get
> >> involvement from some people I know over there without much success.
> >
> > I've asked around a bit as well with our customers/partners, and all
> > they seem to answer is "we simply use UTF-8".
>
> It is very unlikely that anyone on internals uses Shift-JIS (EUC-xx).
> Mainly because (nearly) noone here is Japanese (Korean, Chinese).
>
> However google for phpinfo() and you will see that zend_multibyte is
> compiled into several PHP servers. You can also google for Shift-JIS and
> co...
>
> The problem here is that newer Asian systems will use UTF-8 (except
> those nations using characters not possible in utf-8) and therefore the
> customers of the PHP developers (on this list) will not need that
> support. However there are many legacy systems out there who depend on
> this feature. They most probably don't know about this discussion or
> internals at all, so they cannot speak up.
>
> If PHP 5.3 drops this feature it might close some multibyte security
> problems. However this also means that all those
> Japanese/Chinese/Korean/Taiwanese/... multibyte scripts will not run
> anymore. This forces systems to stay on PHP 5.2 which will most probably
> don't get security updates once PHP 5.3 is out of the door.
>
> Stefan Esser
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.8 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iEYEARECAAYFAkhhAu0ACgkQSuF5XhWr2njCswCcDCyWnFi4jInpX+BPhmSp6ec7
> pAEAoKfDzhhpFKifgwlsn99WMwkve5bp
> =2qIJ
> -----END PGP SIGNATURE-----
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php

--
Rui Hirokawa <[EMAIL PROTECTED]>