On Thu, 03 Nov 2011 10:31:47 -0000, Yasuo Ohgaki <yohg...@ohgaki.net> wrote:

One last quick question.
Zend/tests/multibyte/multibyte_encoding_001.phpt sets
mbstring.internal_encoding=SJIS.

Is PHP 5.4+ supposed to work with SJIS (or another similar encoding)
as internal_encoding?


No. What matters is that the parser generated by bison is able to recognize the tokens. On an ASCII (as opposed to EBCDIC) machine, this means the encoding must be ASCII compatible.

This is the table for SJIS:
 http://icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=ALL

It would appear to be ASCII compatible, since \x20-\x7E represent U+0020-U+007E, but if you take a closer look you'll see that these bytes can also appear as part of larger sequences.
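
A quick way to see this overlap is to try every ASCII-range byte as the second byte after a given lead byte. The sketch below uses Python only because it ships a Shift_JIS codec; the function name `ascii_trail_bytes` is made up for illustration, and Python's `shift_jis` codec is assumed to be close enough to the ibm-943 table linked above for this purpose (the two differ in some mappings).

```python
def ascii_trail_bytes(lead, codec="shift_jis"):
    """Return the bytes in \\x20-\\x7E that the codec accepts as the
    second byte of a two-byte sequence starting with `lead`."""
    found = []
    for trail in range(0x20, 0x7F):
        try:
            text = bytes([lead, trail]).decode(codec)
        except UnicodeDecodeError:
            # Not a valid double-byte sequence in this codec.
            continue
        # A single decoded character means the ASCII-range byte was
        # consumed as the second half of a double-byte character.
        if len(text) == 1:
            found.append(trail)
    return found


trails = ascii_trail_bytes(0xE0)
print([hex(b) for b in trails])
```

Any byte that shows up in the result is one that a byte-oriented lexer could mistake for the corresponding ASCII character.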

For instance, in this script:

<?php
function a漾() {}

the character 漾 is encoded as \xE0\x40, where \x40 represents @ in ASCII, so this would give an error, in the same way that this script:

<?php
function aà@() {}

would give an error. In fact, if I save the first script as UTF-8 and then run PHP:

$ ./php -d zend.multibyte=1 -d zend.script_encoding=UTF-8 -d mbstring.internal_encoding=SJIS sjis.php
php: Zend/zend_language_scanner.l:126: encoding_filter_script_to_internal: Assertion `internal_encoding && zend_multibyte_check_lexer_compatibility(internal_encoding)' failed.
Aborted

it gives an assertion error.
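
The byte-level point about 漾 can be checked outside PHP. This is a minimal sketch in Python, not a description of what `zend_multibyte_check_lexer_compatibility` actually does; the helper name `ascii_bytes_inside` is hypothetical, and Python's `shift_jis` codec stands in for the SJIS table referenced earlier.

```python
def ascii_bytes_inside(char, codec):
    """Return the bytes below 0x80 that occur inside the multi-byte
    encoding of a single character (empty list if there are none)."""
    encoded = char.encode(codec)
    if len(encoded) == 1:
        # A single-byte character is plain ASCII, not an overlap.
        return []
    return [b for b in encoded if b < 0x80]


sjis_overlap = ascii_bytes_inside("漾", "shift_jis")
utf8_overlap = ascii_bytes_inside("漾", "utf-8")

print([hex(b) for b in sjis_overlap])
print([hex(b) for b in utf8_overlap])

# Read byte by byte as Latin-1, the SJIS bytes look like the second script.
print("漾".encode("shift_jis").decode("latin-1"))
```

In UTF-8 every byte of a multi-byte character is \x80 or above, so the list comes back empty, whereas the SJIS form contains \x40.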

--
Gustavo Lopes

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
