Hey Matt,
Matt Wilmas wrote:
> Hi all, Brian, not sure if you're still trying to work on other scanner changes to fix things, but wanted to say that I should have something (soon, whenever) to fix the issues with \0 in strings/comments and/or YYFILL returning too soon (maybe that's fixed, don't know).
Yep, I checked in fixes for this; the scanner should now work with \0 bytes in these tokens. I believe Nuno has thoughts on what he'd like to see done here instead, but it works AFAIK. (There is a highlighting change caused by the re2c changes in 5.3, but I think there may be other fixes for this, as it's a non-terminated string being highlighted as a comment.) BTW, the specific issue Lukas is bringing up has more to do with incorrect usage of mmap to provide padding at the end of the file for the scanner. (I documented this in the bug report if you're interested in more details.)
> Basically, my idea is to just do a "manual scan" in a few places, and take re2c out of the equation there. Would also make code smaller/simpler, and maybe a bit faster. e.g. good even if re2c is fixed.
This sounds similar to what we do for anything outside the <? ?> tokens, but (without having seen your patch, of course) I feel we should have a scanner that handles this correctly rather than rewriting portions in C to handle these cases. However, I don't exactly have a patch ready for re2c, so if this in some way improves the code as it currently stands, then so be it ;-). Perhaps you, Nuno, etc. should discuss this more here, as I know he isn't completely satisfied with the current implementation.
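For anyone following along, a "manual scan" of the kind being discussed might look roughly like the sketch below. This is only an illustration, not Matt's patch and not code from the PHP scanner: skip_block_comment, cursor and limit are hypothetical names. The point is that the end of input is found by comparing pointers, so a \0 inside the comment is just another byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the PHP source): scan the body of a block
 * comment by hand. `cursor` points just past the opening slash-star and
 * `limit` is one past the last valid byte. Returns a pointer just past the
 * closing star-slash, or NULL if the input ends before the comment does. */
static const char *skip_block_comment(const char *cursor, const char *limit)
{
    assert(cursor <= limit);

    /* Need at least two bytes left to see a star-slash pair. */
    while (limit - cursor >= 2) {
        if (cursor[0] == '*' && cursor[1] == '/') {
            return cursor + 2;
        }
        cursor++; /* any other byte, including '\0', is comment text */
    }
    return NULL; /* unterminated comment: ran into the limit */
}
```

Because the loop never reads at or beyond limit, this approach needs no padding after the end of the buffer.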
> Like I've said previously, I don't understand how re2c can't correctly deal with EOF. In the simplest form, if you had one re2c rule: [a-z]+ It wouldn't even match with "foo" as the input string! :-/
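To make the [a-z]+ case concrete: deciding where the match ends means looking at the byte after the last letter, and with the input "foo" there is no such byte unless the buffer is padded or the end of input is detected some other way. The hand-written sketch below is not re2c output, and match_lower_run, cursor and limit are made-up names. It shows one way to handle this by treating "cursor has reached limit" as the end of input, which also keeps a \0 byte distinct from EOF.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written stand-in for the rule [a-z]+ (hypothetical, not generated by
 * re2c). Returns the number of bytes matched starting at `cursor`; 0 means
 * the rule did not match. End of input is `cursor == limit`, so no sentinel
 * or padding byte is ever read. */
static size_t match_lower_run(const char *cursor, const char *limit)
{
    const char *start = cursor;

    assert(cursor <= limit);

    /* Check the bound first, then the byte: '\0' simply fails the range
     * test like any other non-letter and is not mistaken for EOF. */
    while (cursor < limit && *cursor >= 'a' && *cursor <= 'z') {
        cursor++;
    }
    return (size_t)(cursor - start);
}
```

With the three bytes "foo" and limit pointing just past them, this matches all three. With the four bytes 'f', 'o', '\0', 'o' it matches two and stops at the \0 because it is not a letter, not because it looks like the end of the file.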
I'm not sure I follow why the above case wouldn't work, but my thinking here is that if EOF were treated as a unique regex expression that was conditional on YYCURSOR >= YYLIMIT (or probably something more concrete), then \0 would not be the same as EOF, and it would be much easier to match for all these cases. I worry that this could be complicated for re2c to implement, but I don't know enough about re2c and haven't heard from anyone regarding the possibility of doing this. It's my understanding that this would be essentially the same way flex operates (at least from the user's perspective).

-shire

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php