Hi Matt,
Does this patch fix the EOF handling issues related to mmap() (e.g.
parsing of files with size 4096, 8192, ...)? Right now we have two dirty
fixes to handle them correctly.
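For anyone wondering why exactly those sizes are the problem, here is a
minimal sketch, assuming a 4096-byte page. POSIX zero-fills the unused
part of the last mapped page, so a scanner that relies on a NUL sentinel
only gets one for free when the file size is not a multiple of the page
size. The helper names are hypothetical and are not taken from the patch
or from the engine.

```c
#include <assert.h>
#include <stddef.h>

/* Number of zero-filled bytes that follow the file contents inside the
 * last mapped page. The page size is passed in explicitly; 4096 is only
 * an assumption for the examples, real code would query the system. */
static size_t mmap_tail_padding(size_t file_size, size_t page_size)
{
    assert(page_size != 0);
    size_t used = file_size % page_size;
    return used == 0 ? 0 : page_size - used;
}

/* A scanner that stops on a NUL byte can only trust the mapping when at
 * least one padding byte follows the data. */
static int has_nul_sentinel(size_t file_size, size_t page_size)
{
    return mmap_tail_padding(file_size, page_size) > 0;
}
```

With a 4096-byte page, has_nul_sentinel() is false for sizes 4096 and
8192, and true for a size such as 5000.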
The patch is too big to understand quickly. I'll probably take a look
over the weekend.
-ANY_CHAR [^\x00]
+ANY_CHAR [^]
Is [^] a correct regular expression?
Thanks. Dmitry.
Matt Wilmas wrote:
Hi Dmitry, Brian, all,
Here's the scanner patch that I mentioned a while ago, with a possible
way to work around the re2c EOF handling issues.
The primary change is to do a "manual scan", like I talked about, in
areas that match large amounts of text and can contain NULL bytes
(strings and comments, which are now scanned faster too), as is already
done for inline HTML. I called it a "diet" :-) because it removes my
complicated string regex patterns from a couple of years ago. That
doesn't make the .l file much smaller after adding the manual scan code
(easier to understand...?), but it does result in a ~34k reduction of
5.3's generated .c file...
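As a rough illustration of what a "manual scan" can look like, here is a
minimal sketch for a block comment. It is not the code from the patch;
the function name and the (buffer, length) interface are assumptions. The
point is that the loop is bounded by an explicit length, so a zero byte
inside the comment is just another byte and not an end marker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. buf points just after the opening delimiter of a
 * block comment and len is the number of bytes left in the input.
 * Returns the offset just past the closing star-slash pair, or len if
 * the comment is unterminated. */
static size_t scan_block_comment(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i + 1 < len) {
        if (buf[i] == '*' && buf[i + 1] == '/') {
            return i + 2;   /* consume the closing delimiter too */
        }
        i++;                /* any other byte, including 0, is skipped */
    }
    return len;             /* ran out of input: unterminated comment */
}
```

Returning len for the unterminated case leaves it to the caller to decide
whether that is an error or simply the end of the file.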
This fixes Bug #46817 and provides a better, more proper fix for the
older Bug #42767, both related to ending comments.
Now inline HTML chunks aren't broken up when a tag starting with "s" is
encountered (<script> for JS, <span>, etc.), since it's unlikely to be a
long PHP <script> tag.
If an opening PHP <SCRIPT> tag was used with a capital "S", it was
missed if it wasn't the first thing scanned:
var_dump(token_get_all("HTML... <SCRIPT language=php>"));
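A case-insensitive prefix check is one way to avoid missing the capital
"S" form. The sketch below is only an illustration under that
assumption; starts_with_ci() is a hypothetical helper, not a function
from the scanner.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the first bytes of buf (at most len
 * of them) match the NUL-terminated literal lit, ignoring ASCII case. */
static int starts_with_ci(const char *buf, size_t len, const char *lit)
{
    assert(buf != NULL && lit != NULL);
    for (size_t i = 0; lit[i] != '\0'; i++) {
        if (i >= len) {
            return 0;   /* input is shorter than the literal */
        }
        if (tolower((unsigned char)buf[i]) != tolower((unsigned char)lit[i])) {
            return 0;
        }
    }
    return 1;
}
```

Called with the literal "<script", this accepts both "<script" and
"<SCRIPT" and rejects "<span>".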
Single-line comments with a Windows newline didn't include the full \r\n:
var_dump(token_get_all("<?php // Comment\r\n?>"));
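To make the \r\n case concrete, here is a minimal sketch of a
single-line comment scan that treats \r\n as one line ending and stops
in front of a closing ?> tag. The function name and interface are
assumptions for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. buf points at the start of a single-line comment
 * and len is the number of bytes left. Returns the length of the comment
 * token: it includes the whole line ending (\n, \r, or \r\n) and stops
 * in front of a closing ?> tag without consuming it. */
static size_t scan_line_comment(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        if (buf[i] == '\n') {
            return i + 1;
        }
        if (buf[i] == '\r') {
            /* take the following \n as part of the same line ending */
            if (i + 1 < len && buf[i + 1] == '\n') {
                return i + 2;
            }
            return i + 1;
        }
        if (buf[i] == '?' && i + 1 < len && buf[i + 1] == '>') {
            return i;   /* the close tag ends the comment */
        }
        i++;
    }
    return len;         /* comment runs to the end of the input */
}
```

For the input "// Comment\r\n?>" this returns 12, so the comment token
covers both the \r and the \n.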
Finally, part of the optimized scanning is that, for double quoted
strings, when the first variable is encountered (making it
non-constant), the amount that's been scanned up to that point is
remembered, which can then be skipped over (up to the variable) after
returning the quote token. Previously that initial part of the string
was rescanned -- with the cost depending on how far "into" the string
the first var is.
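The idea of remembering how far the scan got can be sketched roughly as
follows. This is an illustration under my own assumptions, not the code
from the patch: the type and function names are hypothetical, and the
variable detection is simplified to "$" followed by a label start or
"{", plus "{$".

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DQ_CONSTANT, DQ_HAS_VAR, DQ_UNTERMINATED } dq_kind;

/* Hypothetical result type: offset is where the scan stopped, counted
 * from the byte after the opening quote. */
typedef struct {
    dq_kind kind;
    size_t  offset;
} dq_scan;

static int is_label_start(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || c == '_' || c >= 0x7f;
}

/* Scan the body of a double-quoted string once. For DQ_HAS_VAR, offset
 * is the position of the first variable, so the caller can skip that
 * many bytes later without scanning them again. */
static dq_scan scan_double_quoted(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    dq_scan r = { DQ_UNTERMINATED, len };
    size_t i = 0;
    while (i < len) {
        unsigned char c = (unsigned char)buf[i];
        if (c == '\\') {            /* skip the escaped byte */
            i += 2;
            continue;
        }
        if (c == '"') {             /* closing quote, no variable seen */
            r.kind = DQ_CONSTANT;
            r.offset = i;
            return r;
        }
        if (i + 1 < len) {
            unsigned char n = (unsigned char)buf[i + 1];
            if ((c == '$' && (is_label_start(n) || n == '{'))
                || (c == '{' && n == '$')) {
                r.kind = DQ_HAS_VAR;
                r.offset = i;       /* remembered prefix length */
                return r;
            }
        }
        i++;
    }
    return r;
}
```

After the quote token has been returned, a caller could advance by
offset bytes in one step, which is the part that used to be rescanned.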
I think that's about all -- I'll send another message if I forgot to
mention anything... Just wanted to send this along quickly for you guys
to look at or whatever. It was basically done last week; I just had to
do a couple of finishing touches and verify that everything was OK.
http://realplain.com/php/scanner_diet.diff (Merged changes, but didn't
test yet.)
http://realplain.com/php/scanner_diet_5_3.diff
Thanks,
Matt
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php