Hello Stanislav,

  Cool. Would you care to turn the code snippet into a test, as I've done
  for Rui's snippet?

marcus

Sunday, March 23, 2008, 5:06:53 AM, you wrote:

>> is broken code and not a single test. If this is not going to change as in
>> we are not getting any .phpt files for this feature then there are two

> As I understand it, the theory should be pretty simple: you set the 
> input encoding (by config or declare) and the internal encoding, and 
> then, when the script is being read, you convert it from input to 
> internal. However, it appears that since flex couldn't stomach certain 
> encodings, there's also a hack there: the script is translated from 
> the input encoding to some "safe" encoding for flex, and then strings 
> are translated back to the "internal" encoding after flex processes 
> them. If re2c can deal with encodings like SJIS without trouble, then 
> some of the hacks might be unnecessary. I think the encodings that need 
> to be checked are those in zend_multibyte.c that have the "compatible" 
> flag off.
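
A rough sketch of the two-step recoding described above, written in Python
rather than engine C, purely for illustration. The helper names
to_scanner_safe and to_internal are made up here, UTF-8 is assumed as the
"safe" encoding, and the quote search only stands in for the real scanner.
None of this is the actual zend_multibyte.c code.

```python
# Hypothetical sketch: recode the script for the scanner, then recode
# string literals back to the internal encoding afterwards.

def to_scanner_safe(source: bytes, input_encoding: str) -> bytes:
    """Recode raw script bytes to UTF-8 (assumed "safe": no multibyte
    character contains an ASCII byte such as backslash or quote)."""
    return source.decode(input_encoding).encode("utf-8")


def to_internal(literal: bytes, internal_encoding: str) -> bytes:
    """Recode a scanned string literal from UTF-8 to the internal encoding."""
    return literal.decode("utf-8").encode(internal_encoding)


# The script as it would sit on disk when saved as Shift-JIS.
script_sjis = "<?php echo 'ソ'; ?>".encode("shift_jis")

# Step 1: input encoding -> scanner-safe encoding.
safe_script = to_scanner_safe(script_sjis, "shift_jis")

# Stand-in for the scanner: take the bytes between the two single quotes.
literal_start = safe_script.index(b"'") + 1
literal_end = safe_script.index(b"'", literal_start)
scanned_literal = safe_script[literal_start:literal_end]

# Step 2: scanned literal -> internal encoding (Shift-JIS assumed here).
internal_literal = to_internal(scanned_literal, "shift_jis")

print(scanned_literal.hex(), internal_literal.hex())
```

The point of the sketch is only that the scanner never sees a 0x5C byte
inside the literal, while the value handed back is still in the internal
encoding.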

> Here's a short script example I found that shows what the problem is:

> <?php echo 'ソ'; ?>

> The character echoed there is U+30BD "Katakana letter SO". If you run 
> it in UTF-8, it works fine. However, if you recode it to Shift-JIS, it 
> won't run, since the script looks like this to the parser:

> <?php echo '<83>\'; ?>
> (that's a dump of vi output, so replace <83> with an actual 0x83 byte 
> if you compose it). That's a parse error if the script is parsed 
> "naively". So somehow the parser needs to know that 0x83+\ is actually 
> U+30BD, and at the same time the user might still want it as 0x83+\ in 
> a zval (or maybe as UTF-8; it depends on the user).
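
To make the byte-level problem concrete, here is a small Python sketch (not
engine code). The function naive_closing_quote is a hypothetical stand-in
for a scanner that walks a single-quoted string byte by byte and treats
0x5C as an escape character regardless of encoding.

```python
# Hypothetical illustration of why a byte-wise scan breaks on Shift-JIS.

SOURCE = "<?php echo 'ソ'; ?>"

utf8_script = SOURCE.encode("utf-8")
sjis_script = SOURCE.encode("shift_jis")

# U+30BD in Shift-JIS is the two bytes 0x83 0x5C; 0x5C is also ASCII "\".
so_in_sjis = "ソ".encode("shift_jis")


def naive_closing_quote(data: bytes) -> int:
    """Return the index of the closing single quote, or -1 if the
    literal never terminates. Works on raw bytes, encoding-unaware."""
    i = data.index(b"'") + 1          # first byte after the opening quote
    while i < len(data):
        if data[i] == 0x5C:           # backslash: skip the escaped byte
            i += 2
            continue
        if data[i] == 0x27:           # unescaped single quote closes it
            return i
        i += 1
    return -1


utf8_result = naive_closing_quote(utf8_script)
sjis_result = naive_closing_quote(sjis_script)

print(so_in_sjis.hex(), utf8_result, sjis_result)
```

With the UTF-8 bytes the closing quote is found. With the Shift-JIS bytes
the trailing 0x5C swallows it and the literal never terminates, which is
the parse error described above.
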
> -- 
> Stanislav Malyshev, Zend Software Architect
> [EMAIL PROTECTED]   http://www.zend.com/
> (408)253-8829   MSN: [EMAIL PROTECTED]




Best regards,
 Marcus


-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
