Hey Matt,

Matt Wilmas wrote:
> Hi all,
>
> Brian, not sure if you're still trying to work on other scanner changes
> to fix things, but wanted to say that I should have something (soon,
> whenever) to fix the issues with \0 in strings/comments and/or YYFILL
> returning too soon (maybe that's fixed, don't know).

Yep, I checked in fixes for this; the scanner should now work with \0 bytes in 
these tokens.  I believe Nuno has thoughts on what he'd like to see done here 
instead, but it works AFAIK. (There is a highlighting change caused by the 
re2c changes in 5.3, but I think there may be other fixes for this, as it's a 
non-terminated string being highlighted as a comment.)

BTW, the specific issue Lukas is bringing up has more to do with incorrect 
usage of mmap to provide padding at the end of the file for the scanner. 
(I documented this in the bug report if you're interested in more details.)
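
For illustration only, here is a rough sketch of one way to guarantee padding without relying on what the mapping happens to provide: copy the input into a buffer that carries its own zeroed tail. The names copy_with_padding and SCAN_PADDING, and the value 32, are hypothetical and not taken from the PHP source; a real scanner would use whatever look-ahead bound it actually needs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical look-ahead bound: how many bytes past the end of input
 * the scanner may read before it checks its limit. */
#define SCAN_PADDING 32

/* Returns a heap copy of src[0..len) followed by SCAN_PADDING zero bytes,
 * or NULL if allocation fails. The caller frees the result. */
static char *copy_with_padding(const char *src, size_t len)
{
    char *buf = malloc(len + SCAN_PADDING);
    if (buf == NULL) {
        return NULL;
    }
    if (len > 0) {
        memcpy(buf, src, len);
    }
    /* The padding is explicit, so it exists for every input length. */
    memset(buf + len, 0, SCAN_PADDING);
    return buf;
}
```

The cost is an extra copy, which is the trade-off against mapping the file directly.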

> Basically, my idea
> is to just do a "manual scan" in a few places, and take re2c out of the
> equation there. Would also make code smaller/simpler, and maybe a bit
> faster. e.g. good even if re2c is fixed.

This sounds similar to what we do for anything outside the <? ?> tokens, but 
(without having seen your patch, of course) I feel like we should have a scanner 
that handles this correctly rather than having to rewrite portions in C to handle 
these cases.  However, I don't exactly have a patch ready for re2c, so if this in 
some way improves the code as it currently stands, then so be it ;-).  Perhaps you, 
Nuno, etc. should discuss this more here, as I know he isn't completely satisfied 
with the current implementation.

> Like I've said previously, I don't understand how re2c can't correctly
> deal with EOF. In the simplest form, if you had one re2c rule:
>
> [a-z]+
>
> It wouldn't even match with "foo" as the input string! :-/

I'm not sure I follow why the above case wouldn't work, but my thinking here is 
that EOF could be treated as a unique regex expression that is conditional on 
YYCURSOR >= YYLIMIT (or probably something more concrete).  Then \0 would not 
be the same as EOF, and it would be much easier to match all these cases.  I 
worry that this could be potentially complicated for re2c to implement, but 
I don't know enough about re2c and haven't heard from anyone regarding the 
possibility of doing this.  It's my understanding that this is essentially 
how flex operates (at least from the user's perspective).
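
To make the "EOF is cursor >= limit, not \0" idea concrete, here is a minimal hand-written sketch in plain C. It is not re2c output and not the PHP scanner; the function names match_lower and skip_until are hypothetical. Both loops stop at the limit pointer, so a \0 byte is treated like any other input byte.

```c
#include <stddef.h>

/* Hypothetical helper: length of the longest [a-z]+ match at the start
 * of buf (0 if none). End of input is detected by comparing the cursor
 * with the limit, never by looking for a \0 byte. Assumes an ASCII-like
 * character set where 'a'..'z' are contiguous. */
static size_t match_lower(const char *buf, size_t len)
{
    const char *cursor = buf;
    const char *limit = buf + len;

    while (cursor < limit && *cursor >= 'a' && *cursor <= 'z') {
        cursor++;
    }
    return (size_t)(cursor - buf);
}

/* Hypothetical helper: offset of the first occurrence of delim in buf,
 * or len if the input ends first. Embedded \0 bytes are skipped like
 * any other byte, as they would be inside a string or comment. */
static size_t skip_until(const char *buf, size_t len, char delim)
{
    const char *cursor = buf;
    const char *limit = buf + len;

    while (cursor < limit && *cursor != delim) {
        cursor++;
    }
    return (size_t)(cursor - buf);
}
```

With this shape, match_lower on the three bytes of "foo" consumes all three and stops because the cursor reached the limit, and skip_until tells "hit the delimiter" apart from "ran out of input" by whether the returned offset equals len.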

-shire

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php