Hi Gustavo,

Thanks for the reply.
As long as bison does not understand multibyte characters, the parser
will not work well with them. Your reply is exactly what I expected.
Thank you for the clarification.

--
Yasuo Ohgaki
yohg...@ohgaki.net

On Thu, Nov 3, 2011 at 8:07 PM, Gustavo Lopes <glo...@nebm.ist.utl.pt> wrote:
> On Thu, 03 Nov 2011 10:31:47 -0000, Yasuo Ohgaki <yohg...@ohgaki.net>
> wrote:
>
>> One last quick question.
>> Zend/tests/multibyte/multibyte_encoding_001.phpt sets
>> mbstring.internal_encoding=SJIS.
>>
>> Is PHP 5.4+ supposed to work with SJIS (or another similar encoding)
>> as internal_encoding?
>>
>
> No. What matters is that the parser generated by bison is able to recognize
> the tokens. On an ASCII (as opposed to EBCDIC) machine, this means the
> encoding must be ASCII compatible.
>
> This is the table for SJIS:
> http://icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=ALL
>
> At first glance it appears to be ASCII compatible – \x20-\x7E represent
> U+0020-U+007E, but if you take a closer look you'll see that these bytes can
> also appear as part of longer (multibyte) sequences.
>
> For instance, in this script:
>
> <?php
> function a漾() {}
>
> the character 漾 is represented as \xE0\x40, where \x40 represents @ in
> ASCII, so this would give an error, in the same way that this script would:
>
> <?php
> function aà@() {}
>
> In fact, if I save the first script as UTF-8 and then run PHP:
>
> $ ./php -d zend.multibyte=1 -d zend.script_encoding=UTF-8 -d mbstring.internal_encoding=SJIS sjis.php
> php: Zend/zend_language_scanner.l:126: encoding_filter_script_to_internal:
> Assertion `internal_encoding &&
> zend_multibyte_check_lexer_compatibility(internal_encoding)' failed.
> Aborted
>
> it fails with an assertion error.
>
> --
> Gustavo Lopes

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
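
The byte overlap described in the quoted reply can be checked outside PHP. What follows is a minimal Python sketch and not part of the original thread: the helper name ascii_range_bytes is made up for illustration, and it assumes that Python's shift_jis codec agrees with the ibm-943 table linked in the reply for this character.

```python
def ascii_range_bytes(text, encoding):
    """For each character that encodes to more than one byte, report the
    bytes of its encoding that fall in the printable ASCII range 0x20-0x7E.

    Returns a list of (character, raw_bytes, ascii_lookalikes) tuples.
    (Hypothetical helper, written only for this illustration.)
    """
    hits = []
    for ch in text:
        raw = ch.encode(encoding)
        if len(raw) > 1:
            # Bytes that a byte-oriented lexer would read as ASCII characters.
            lookalikes = "".join(chr(b) for b in raw if 0x20 <= b <= 0x7E)
            if lookalikes:
                hits.append((ch, raw, lookalikes))
    return hits


# The function name from the example script, in both encodings.
sjis_hits = ascii_range_bytes("a漾", "shift_jis")
utf8_hits = ascii_range_bytes("a漾", "utf-8")

print(sjis_hits)  # the second byte of 漾 in Shift_JIS is 0x40, i.e. '@'
print(utf8_hits)  # UTF-8 multibyte sequences never contain bytes below 0x80
```

In Shift_JIS the identifier contains a byte that a byte-oriented lexer reads as @, while in UTF-8 every byte of a multibyte character is 0x80 or above, so no such collision can occur.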