Hello Alan, be my hero then :-) Could you generate a few tests for the multibyte support so that we know how it is used right now and what we need to take care of?
marcus

Monday, March 3, 2008, 12:48:44 AM, you wrote:

> Can you clarify the Multibyte issues:
> - I presume this means that it can handle ASCII/UTF8/16 etc. but will
> not handle things like BIG5/GB encoding in source code - this may be a
> bit of an issue around here..

> Regards
> Alan

> Marcus Boerger wrote:
>> RFC: REPLACE THE FLEX-BASED SCANNER WITH AN RE2C [1] BASED LEXER
>>
>> Situation:
>> The current flex-based lexer depends on an outdated and unsupported flex
>> version. Alternatives include either updating to a newer version of flex or
>> using re2c, which we already use for a variety of things (serializing, pdo
>> sql scanning, date/time parsing). While moving towards a newer flex version
>> would be much easier, switching to re2c promises a much faster lexer.
>> Actually, without any specific re2c optimizations we already get around a
>> 20% scanner performance increase. Running the tests gets an overall speedup
>> of 2%. It is arguable whether this is enough, but re2c has more advantages.
>> First of all, re2c allows one to scan any type of input (ASCII, UTF-8,
>> UTF-16, UTF-32). Secondly, it allows for better integration with Lemon [2],
>> which would be the next step. And thirdly we can switch to a reentrant
>> scanner.
>>
>> Current state:
>> Flex has been fully replaced by re2c in Zend. We have also switched to an
>> mmap-based lexer approach for now. However, we had to drop multibyte support
>> as well as the encoding declare. The current state can be checked out from
>> Scott's subversion repository [3] and you can follow the development on his
>> Trac setup [4]. When you want to build php with re2c, then you need to grab
>> re2c from its sourceforge subversion repository [5]. You can also check out
>> the changes in a patch created Sunday 2nd March against a PHP checkout from
>> 14th February [6].
>>
>> Further steps:
>> Commit this to PHP 5.3. Synch to HEAD. Add pecl/intl to 5.3. Discuss/recreate
>> multibyte support with libintl.
>>
>> Future steps:
>> Replace bison with lemon in PHP 5.4 or HEAD.
>>
>> Time Frame:
>> Commit to 5.3 between the 5th and the 15th of March. Synch to HEAD a couple
>> of days later. Moving pecl/libintl to ext (depends on the 5.3 RMs decision).
>> After that is done, decide about multibyte support. Along with the commit to
>> the 5.3 branch there will be a new re2c version available.
>>
>>
>> Marcus Boerger
>> Nuno Lopes
>> Scott MacVicar
>>
>>
>> [1] http://re2c.org/
>> [2] http://www.hwaci.com/sw/lemon/
>> [3] svn://whisky.macvicar.net/php-re2c
>> [4] http://trac.macvicar.net/php-re2c/
>> [5] https://re2c.svn.sourceforge.net/svnroot/re2c/trunk/re2c
>> [6] http://php.net/~helly/php-re2c-20080302.diff.txt
>>
>>
>>

Best regards,
 Marcus

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
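
Editorial sketch, not part of the original thread: one way to see why BIG5 source files are a different case from UTF-8 for a byte-oriented scanner, and so what the requested multibyte tests might want to cover. Some Big5 characters have a second byte in the ASCII range (for example 0x5C, the backslash), whereas every byte of a multibyte UTF-8 sequence is 0x80 or above. The helper name `ascii_range_bytes` and the sample string are assumptions chosen for illustration; nothing here is taken from the PHP or re2c sources, and it says nothing about how the Zend scanner actually behaves.

```python
# Hypothetical helper for illustration only; not part of PHP, Zend or re2c.
def ascii_range_bytes(text, encoding):
    """Collect ASCII-range bytes (< 0x80) found inside multibyte characters."""
    hits = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:
            # A byte below 0x80 inside a multibyte character looks like
            # plain ASCII to a scanner that works byte by byte.
            hits.extend(b for b in encoded if b < 0x80)
    return bytes(hits)


# Sample characters whose Big5 trail byte is 0x5C (backslash).
sample = "許功蓋"

print(ascii_range_bytes(sample, "big5"))   # trail bytes in the ASCII range
print(ascii_range_bytes(sample, "utf-8"))  # empty: no ASCII-range bytes
```

Strings like the sample above, placed inside quoted PHP string literals in a Big5-encoded file, might make useful test inputs, since a backslash byte inside a character is exactly where a byte-wise scanner could misread an escape sequence.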