is broken code and not a single test. If this is not going to change, as in
we are not getting any .phpt files for this feature, then there are two
As I understand it, the theory should be pretty simple: you set the
input encoding (by config or declare) and the internal encoding, and then,
when the script is being read, you convert it from the input encoding to
the internal one.
However, it appears that since flex couldn't stomach certain encodings,
there's also a hack there: the script is translated from the input
encoding to some "safe" encoding for flex, and then strings are
translated back to the "internal" encoding after flex processes them. If
re2c can deal with encodings like SJIS without trouble, then some of
these hacks might be unnecessary. I think the encodings that need to be
checked are those in zend_multibyte.c that have the "compatible" flag off.
Here's a short example script I found that shows what the problem is:
<?php echo 'ソ'; ?>
The character echoed there is U+30BD "Katakana letter SO". If you run it
in UTF-8, it works fine. However, if you recode it to Shift-JIS, it won't
run, since the script looks like this to the parser:
<?php echo '<83>\'; ?>
(that's a dump of vi output, so replace <83> with an actual 0x83 byte if
you compose it). Parsed "naively", that is a parse error: the second byte
of the character is 0x5C, the ASCII backslash, so it escapes the closing
quote and the string never ends. So somehow the parser needs to know that
0x83+\ is actually U+30BD, while at the same time the user might still
want it as 0x83+\ in a zval (or maybe as UTF-8; that's up to him).
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED] http://www.zend.com/
(408)253-8829 MSN: [EMAIL PROTECTED]
--
PHP Internals - PHP Runtime Development Mailing List