Removing multibyte encoding support from PHP 5.3 will cause
a severe incompatibility problem with older PHP 5.x releases.

As Stefan noted, the Shift_JIS character encoding, which is widely used in
Japan, is not a flex-safe encoding because it includes 0x5c (backslash) as
the second byte of some multibyte characters. The BIG5 character encoding
used for Chinese is also not flex safe.
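
To make the 0x5c issue concrete, here is a small Python sketch (Python only
because its bundled codecs make the bytes easy to inspect; it is not taken from
the patch). The helper name has_backslash_trail_byte is made up for this
illustration, and the sample characters are just well-known cases, not an
exhaustive list.

```python
# Illustration only: find characters whose second byte is 0x5c (ASCII
# backslash) in a legacy multibyte encoding. The helper name is hypothetical.
BACKSLASH = 0x5C


def has_backslash_trail_byte(char: str, encoding: str) -> bool:
    """Return True if `char` encodes to two bytes and the second is 0x5c."""
    encoded = char.encode(encoding)
    return len(encoded) == 2 and encoded[1] == BACKSLASH


# U+8868 in Shift_JIS and U+529F in Big5 both end in 0x5c; UTF-8 never
# reuses ASCII byte values inside a multibyte sequence.
samples = [("\u8868", "shift_jis"), ("\u529f", "big5"), ("\u8868", "utf-8")]
for char, encoding in samples:
    encoded = char.encode(encoding)
    print(encoding, encoded.hex(), has_backslash_trail_byte(char, encoding))

# Why a byte-oriented scanner gets confused: the Shift_JIS bytes of a quoted
# one-character string end with backslash + quote, which looks like an
# escaped quote rather than the end of the literal.
literal = '"\u8868"'.encode("shift_jis")
looks_like_escaped_quote = literal.endswith(b'\\"')
print(literal, looks_like_escaped_quote)
```

A scanner that works on raw bytes without knowing the script encoding would
treat the trailing backslash as an escape character, which is the kind of
problem the multibyte support has to avoid.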

Today, I committed a patch for zend multibyte support into PHP_5_3.
It is still in experimental status because I am not an expert in re2c/flex.

A couple of test scripts already exist in
Zend/tests/multibyte/*.phpt, but, of course, we need more test scripts
for zend multibyte.
(We need to have a TestFesta in Japan :) )

The script encoding can be specified in several different ways.

   (1) mbstring.script_encoding in php.ini
   (2) declare(encoding="Shift_JIS") in each PHP script
      -> multibyte_encoding_001.phpt
   (3) BOM in a Unicode script
      -> multibyte_encoding_00[23].phpt
   (4) auto detection based on mbstring.language, mbstring.detect_order
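
As a rough illustration of what (3) means, BOM-based detection boils down to
comparing the first bytes of the script against the known byte order marks.
The sketch below is in Python and is not the Zend implementation; the function
name detect_bom and the returned encoding labels are assumptions made for this
example.

```python
import codecs

# Illustration only, not the Zend engine's code. UTF-32 is listed before
# UTF-16 because the UTF-32LE BOM (FF FE 00 00) starts with the UTF-16LE
# BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(script_bytes: bytes):
    """Return an encoding label if the data starts with a BOM, else None."""
    for bom, name in _BOMS:
        if script_bytes.startswith(bom):
            return name
    return None


print(detect_bom(codecs.BOM_UTF8 + b"<?php echo 1;"))
print(detect_bom(b"<?php echo 1;"))
```

A script without a BOM yields None here, which is where the other mechanisms
in the list, (1), (2) and (4), would have to take over.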

Test scripts already exist for (2) and (3), but there are none for
(1) and (4).

I have already confirmed that my patch for PHP 5.3 works for (1) and (2)
with the Shift_JIS encoding, but I have not yet confirmed it for Unicode BOM
or for other encodings.

We need more test scripts to maintain reliability and
to minimize security risks.

Rui

On Tue, 24 Jun 2008 16:21:33 +0200
Stefan Esser <[EMAIL PROTECTED]> wrote:

> >> This is used when reading scripts that are in encodings like Shift-JIS 
> >> which is very common in Japan. In any case, I have tried to get 
> >> involvement from some people I know over there without much success.
> > 
> > I've asked around a bit as well with our customers/partners, and all 
> > they seem to answer is "we simply use UTF-8".
> 
> It is very unlikely that anyone on internals uses Shift-JIS (EUC-xx).
> Mainly because (nearly) no one here is Japanese (Korean, Chinese).
> 
> However google for phpinfo() and you will see that zend_multibyte is
> compiled into several PHP servers. You can also google for Shift-JIS and
>   co...
> 
> The problem here is that newer Asian systems will use UTF-8 (except
> those nations using characters not possible in utf-8) and therefore the
> customers of the PHP developers (on this list) will not need that
> support. However there are many legacy systems out there who depend on
> this feature. They most probably don't know about this discussion or
> internals at all, so they cannot speak up.
> 
> If PHP 5.3 drops this feature it might close some multibyte security
> problems. However this also means that all those
> Japanese/Chinese/Korean/Taiwanese/... multibyte scripts will not run
> anymore. This forces systems to stay on PHP 5.2 which will most probably
> not get security updates once PHP 5.3 is out of the door.
> 
> Stefan Esser
> 
> -- 
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php

-- 
Rui Hirokawa <[EMAIL PROTECTED]>

