Hello Alan, Andi, Rui,

  my impression is still that not a single person uses this crap. I only
hear from people who claim to have heard that others use it. What I see
is broken code and not a single test. If this is not going to change, that
is, if we are not getting any .phpt files for this feature, then there are
two options: either I implement what I personally would expect, without
caring about anything that is there right now, or we simply get rid of it
completely.

So far I have extended re2c to make it easier to deal with other encodings
and even to allow multiple character widths at the same time. So I did my
homework. Now I expect somebody to write tests! Then we could provide a
scanner that works on UCS-2 or on UTF-32 and try to identify the script
encoding, then work on the extended charset and do a reverse encoding for
output if necessary. Though even thinking about this approach (which is
still like what we seem to have right now) really hurts me badly, because
it is the wrong approach. What we want is a working HEAD.
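
To make the decode, scan, re-encode idea concrete, here is a rough Python
sketch. It is not PHP or re2c code, and every name in it is made up for
illustration. It assumes the script encoding is already known (detection
is left out) and uses Python's str as a stand-in for a UTF-32 style buffer
with one item per code point. It also shows why a byte-wise scan can go
wrong on Big5: the second byte of some characters is 0x5C, the ASCII
backslash.

```python
# Hypothetical sketch: none of these names exist in PHP or re2c.

QUOTE = '"'
BACKSLASH = "\\"


def scan_string_literals(units):
    """Collect the bodies of double-quoted literals, honouring backslash escapes.

    `units` is a str that is walked one item at a time. What one item means
    (a raw byte or a full code point) depends on how the caller decoded it.
    """
    literals = []
    i = 0
    while i < len(units):
        if units[i] == QUOTE:
            i += 1
            start = i
            while i < len(units) and units[i] != QUOTE:
                if units[i] == BACKSLASH:
                    i += 1  # skip the escaped item
                i += 1
            literals.append(units[start:i])
        i += 1
    return literals


def scan_bytewise(raw):
    """Naive scan: latin-1 maps every byte to exactly one item."""
    return scan_string_literals(raw.decode("latin-1"))


def scan_decoded(raw, script_encoding, output_encoding=None):
    """Decode to code points, scan, then re-encode each literal for output."""
    text = raw.decode(script_encoding)
    target = output_encoding or script_encoding
    return [lit.encode(target) for lit in scan_string_literals(text)]


# In Big5 the character 許 is the byte pair 0xB3 0x5C; 0x5C is also ASCII "\".
SAMPLE = 'echo "許"; echo "ok";'.encode("big5")

if __name__ == "__main__":
    print(scan_bytewise(SAMPLE))            # trail byte mistaken for an escape
    print(scan_decoded(SAMPLE, "big5"))     # literals returned as Big5 bytes
    print(scan_decoded(SAMPLE, "big5", "utf-8"))
```

The byte-wise variant runs past the first closing quote because it treats
the trail byte as an escape, while the decoded variant finds both literals
and can hand them back in either the script encoding or another output
encoding.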

marcus

Monday, March 3, 2008, 4:19:24 PM, you wrote:

> a few replaces with this file should be a good testcase
> - probably worth testing
> * comments with these characters in them, both /* and //
> * strings with these characters in them.
>  lynx -source 'http://smontagu.damowmow.com/genEncodingTest.cgi?family=windows&codepage=950' | grep test | grep -v testcase

> I have definitely seen code with Chinese characters in comments and
> strings, and a few times function names and variable names with Chinese
> characters...

> Regards
> Alan


> Marcus Boerger wrote:
>> Hello Alan,
>>
>>   be my hero then :-) Could you generate a few tests for the multibyte
>> support so that we know how it is used right now and what we need to take
>> care of?
>>
>> marcus
>>
>> Monday, March 3, 2008, 12:48:44 AM, you wrote:
>>
>>   
>>> Can you clarify the Multibyte issues:
>>> - I presume this means that it can handle ASCII/UTF8/16 etc. but will 
>>> not handle things like BIG5/GB encoding in source code - this may be a 
>>> bit of an issue around here..
>>>     
>>
>>   
>>> Regards
>>> Alan
>>>     
>>
>>
>>   
>>> Marcus Boerger wrote:
>>>     
>>>> RFC: REPLACE THE FLEX-BASED SCANNER WITH AN RE2C [1] BASED LEXER
>>>>
>>>> Situation:
>>>> The current flex-based lexer depends on an outdated and unsupported flex
>>>> version. Alternatives include either updating to a newer version of flex or
>>>> using re2c, which we already use for a variety of things (serializing, pdo
>>>> sql scanning, date/time parsing). While moving towards a newer flex version
>>>> would be much easier, switching to re2c promises a much faster lexer.
>>>> Actually, without any specific re2c optimizations we already get around a
>>>> 20% scanner performance increase. Running the tests gets an overall speedup
>>>> of 2%. It is arguable whether this is enough, but re2c has more advantages.
>>>> First of all, re2c allows one to scan any type of input (ASCII, UTF-8,
>>>> UTF-16, UTF-32). Secondly, it allows for better integration with Lemon [2],
>>>> which would be the next step. And thirdly we can switch to a reentrant
>>>> scanner.
>>>>
>>>> Current state:
>>>> Flex has been fully replaced by re2c in Zend. We have also switched to an
>>>> mmap-based lexer approach for now. However, we had to drop multibyte
>>>> support as well as the encoding declare. The current state can be checked
>>>> out from Scott's subversion repository [3] and you can follow the
>>>> development on his Trac setup [4]. When you want to build php with re2c,
>>>> then you need to grab re2c from its sourceforge subversion repository [5].
>>>> You can also check out the changes in a patch created Sunday 2nd March
>>>> against a PHP checkout from 14th February [6].
>>>>
>>>> Further steps:
>>>> Commit this to PHP 5.3. Synch to HEAD. Add pecl/intl to 5.3.
>>>> Discuss/recreate multibyte support with libintl.
>>>>
>>>> Future steps:
>>>> Replace bison with lemon in PHP 5.4 or HEAD.
>>>>
>>>> Time Frame:
>>>> Commit to 5.3 between the 5th and the 15th of March. Synch to HEAD a couple
>>>> of days later. Moving pecl/libintl to ext (depends on the 5.3 RMs
>>>> decision). After that is done, decide about multibyte support. Along with
>>>> the commit to the 5.3 branch there will be a new re2c version available.
>>>>
>>>>
>>>> Marcus Boerger
>>>> Nuno Lopes
>>>> Scott MacVicar
>>>>
>>>>
>>>> [1] http://re2c.org/
>>>> [2] http://www.hwaci.com/sw/lemon/
>>>> [3] svn://whisky.macvicar.net/php-re2c
>>>> [4] http://trac.macvicar.net/php-re2c/
>>>> [5] https://re2c.svn.sourceforge.net/svnroot/re2c/trunk/re2c
>>>> [6] http://php.net/~helly/php-re2c-20080302.diff.txt
>>>>
>>>>
>>>>
>>>>   
>>>>       
>>
>>
>>
>>
>>
>> Best regards,
>>  Marcus
>>
>>   



Best regards,
 Marcus


-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
