Can you clarify the Multibyte issues:
- I presume this means that it can handle ASCII, UTF-8, UTF-16 etc., but will not handle source code written in encodings like Big5 or GB. That may be a bit of an issue around here.

Regards
Alan


Marcus Boerger wrote:
RFC: REPLACE THE FLEX-BASED SCANNER WITH AN RE2C [1] BASED LEXER

Situation:
The current flex-based lexer depends on an outdated and unsupported flex
version. Alternatives include either updating to a newer version of flex or
using re2c, which we already use for a variety of things (serializing, pdo sql
scanning, date/time parsing). While moving towards a newer flex version would
be much easier, switching to re2c promises a much faster lexer. Actually,
without any specific re2c optimizations we already get around a 20% scanner
performance increase. Running the tests gets an overall speedup of 2%. It is
arguable whether this is enough, but re2c has more advantages. First of all,
re2c allows one to scan any type of input (ASCII, UTF-8, UTF-16, UTF-32).
Secondly, it allows for better integration with Lemon [2], which would be the
next step. And thirdly we can switch to a reentrant scanner.

Current state:
Flex has been fully replaced by re2c in Zend. We have also switched to an
mmap-based lexer approach for now. However, we had to drop multibyte support
as well as the encoding declare. The current state can be checked out from
Scott's Subversion repository [3], and you can follow the development on his
Trac setup [4]. To build PHP with re2c, you need to grab re2c from its
SourceForge Subversion repository [5]. You can also review the changes in a
patch created on Sunday, 2nd March against a PHP checkout from 14th
February [6].

Further steps:
Commit this to PHP 5.3. Synch to HEAD. Add pecl/intl to 5.3. Discuss/recreate
multibyte support with libintl.

Future steps:
Replace bison with lemon in PHP 5.4 or HEAD.

Time Frame:
Commit to 5.3 between the 5th and the 15th of March. Synch to HEAD a couple
of days later. Move pecl/intl to ext (depends on the 5.3 RMs' decision).
After that is done, decide about multibyte support. Along with the commit to
the 5.3 branch there will be a new re2c version available.


Marcus Boerger
Nuno Lopes
Scott MacVicar


[1] http://re2c.org/
[2] http://www.hwaci.com/sw/lemon/
[3] svn://whisky.macvicar.net/php-re2c
[4] http://trac.macvicar.net/php-re2c/
[5] https://re2c.svn.sourceforge.net/svnroot/re2c/trunk/re2c
[6] http://php.net/~helly/php-re2c-20080302.diff.txt





--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
