Hello Alan,

  be my hero then :-) Could you generate a few tests for the multibyte
support so that we know how it is used right now and what we need to take
care of?
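
One case such tests would probably need to cover, sketched here in Python
rather than as a real .phpt test: in Big5 the second byte of some characters
equals the ASCII backslash (0x5C), so a scanner that works byte by byte can
misread a string literal. The function name find_string_end and the scanning
loop are made up for illustration and are not taken from the Zend scanner.

```python
# Hypothetical illustration, not PHP's scanner: a naive byte-oriented
# search for the closing quote of a double-quoted string literal.

def find_string_end(src: bytes, start: int) -> int:
    """Return the index of the closing quote for the literal whose
    opening quote is at `start`, or -1 if no closing quote is found."""
    i = start + 1
    while i < len(src):
        if src[i] == 0x5C:      # ASCII backslash: skip the escaped byte
            i += 2
            continue
        if src[i] == 0x22:      # ASCII double quote: end of literal
            return i
        i += 1
    return -1

# U+8A31 is encoded in Big5 as the byte pair 0xB3 0x5C, so its trail
# byte is indistinguishable from a backslash in a byte-wise scan.
big5_literal = b'"\xb3\x5c"'
utf8_literal = '"\u8a31"'.encode("utf-8")

print(find_string_end(utf8_literal, 0))   # closing quote is found
print(find_string_end(big5_literal, 0))   # trail byte swallows the quote
```

With UTF-8 input the closing quote is found, because every byte of a
multibyte sequence is >= 0x80. With the Big5 input the trail byte is taken
for an escape and the closing quote is skipped.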

marcus

Monday, March 3, 2008, 12:48:44 AM, you wrote:

> Can you clarify the Multibyte issues:
> - I presume this means that it can handle ASCII/UTF8/16 etc. but will 
> not handle things like BIG5/GB encoding in source code - this may be a 
> bit of an issue around here..

> Regards
> Alan


> Marcus Boerger wrote:
>> RFC: REPLACE THE FLEX-BASED SCANNER WITH AN RE2C [1] BASED LEXER
>>
>> Situation:
>> The current flex-based lexer depends on an outdated and unsupported flex
>> version. Alternatives include either updating to a newer version of flex or
>> using re2c, which we already use for a variety of things (serializing, pdo
>> sql scanning, date/time parsing). While moving towards a newer flex version would
>> be much easier, switching to re2c promises a much faster lexer. Actually,
>> without any specific re2c optimizations we already get around a 20% scanner
>> performance increase. Running the tests gets an overall speedup of 2%. It is
>> arguable whether this is enough, but re2c has more advantages. First of all,
>> re2c allows one to scan any type of input (ASCII, UTF-8, UTF-16, UTF-32).
>> Secondly, it allows for better integration with Lemon [2], which would be the
>> next step. And thirdly we can switch to a reentrant scanner.
>>
>> Current state:
>> Flex has been fully replaced by re2c in Zend. We have also switched to an
>> mmap-based lexer approach for now. However, we had to drop multibyte support
>> as well as the encoding declare. The current state can be checked out from
>> Scott's subversion repository [3] and you can follow the development on his
>> Trac setup [4]. If you want to build PHP with re2c, you need to grab
>> re2c from its sourceforge subversion repository [5]. You can also check out
>> the changes in a patch created on Sunday, 2nd March against a PHP checkout
>> from 14th February [6].
>>
>> Further steps:
>> Commit this to PHP 5.3. Sync to HEAD. Add pecl/intl to 5.3. Discuss/recreate
>> multibyte support with libintl.
>>
>> Future steps:
>> Replace bison with lemon in PHP 5.4 or HEAD.
>>
>> Time Frame:
>> Commit to 5.3 between the 5th and the 15th of March. Sync to HEAD a couple
>> of days later. Move pecl/intl to ext (depends on the 5.3 RMs' decision).
>> After that is done, decide about multibyte support. Along with the commit to
>> the 5.3 branch there will be a new re2c version available.
>>
>>
>> Marcus Boerger
>> Nuno Lopes
>> Scott MacVicar
>>
>>
>> [1] http://re2c.org/
>> [2] http://www.hwaci.com/sw/lemon/
>> [3] svn://whisky.macvicar.net/php-re2c
>> [4] http://trac.macvicar.net/php-re2c/
>> [5] https://re2c.svn.sourceforge.net/svnroot/re2c/trunk/re2c
>> [6] http://php.net/~helly/php-re2c-20080302.diff.txt

Best regards,
 Marcus


-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
