Hi Matt,

Does this patch fix the EOF handling issues related to mmap() (e.g. parsing files whose size is 4096, 8192, ... bytes)? Right now we have two dirty fixes to handle them correctly.
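Here is a minimal C sketch of why those sizes are special. It assumes the scanner relies on a NUL byte just past the end of its buffer as a sentinel. POSIX mmap() zero-fills the rest of the last page beyond end of file, so a spare zero byte exists only when the file size is not an exact multiple of the page size. The helper name mmap_has_nul_sentinel is hypothetical.

```c
#include <stddef.h>

// Hypothetical helper: nonzero when a file of `size` bytes, mapped with
// mmap() using pages of `page_size` bytes, is followed by at least one
// zero-filled byte inside its last mapped page (usable as a NUL sentinel).
// When size is an exact multiple of the page size there is no spare byte,
// and reading one past the end may touch an unmapped page.
int mmap_has_nul_sentinel(size_t size, size_t page_size)
{
    // mmap() of a zero-length file fails, so there is no buffer at all.
    return size > 0 && size % page_size != 0;
}
```

In practice page_size would come from sysconf(_SC_PAGESIZE), which is 4096 on many systems. That is why 4096, 8192, ... are the problematic sizes there.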

The patch is too big to understand quickly. I'll probably take a look at it on the weekend.

-ANY_CHAR [^\x00]
+ANY_CHAR [^]

Is [^] a correct regular expression?

Thanks. Dmitry.

Matt Wilmas wrote:
Hi Dmitry, Brian, all,

Here's a scanner patch that I mentioned a while ago, with a possible way to work around the re2c EOF handling issues.

The primary change is to do a "manual scan", like I talked about, in areas that match large amounts of text and can contain NULL bytes (strings and comments, which are now scanned faster too), as is already done for inline HTML. I called it a "diet" :-) because it removes the complicated string regex patterns I wrote a couple of years ago. After adding the manual scan code, the .l file isn't much smaller (though maybe easier to understand...?), but 5.3's generated .c file is ~34k smaller...
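Below is a rough C sketch of what such a manual scan can look like for a block comment. It assumes the scanner knows where its input ends (limit) instead of relying on a NUL sentinel. This is not the patch's code, and scan_block_comment is a hypothetical name. Because the loop is bounded by limit, embedded NUL bytes are treated as ordinary characters. An unterminated comment at the end of input also stops cleanly instead of reading past the buffer.

```c
#include <stddef.h>

// Hypothetical manual scan of a block comment body.
// `p` points just past the opening slash-star; `limit` is one past the
// last byte of input. Returns a pointer just past the closing star-slash,
// or `limit` if the comment is unterminated.
const char *scan_block_comment(const char *p, const char *limit)
{
    while (p < limit) {
        // Check for a star followed by a slash, never reading beyond limit.
        if (*p++ == '*' && p < limit && *p == '/') {
            return p + 1;
        }
    }
    return limit;  // unterminated: the comment runs to end of input
}
```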

This fixes Bug #46817 and provides a better, more proper fix for the older Bug #42767; both are related to ending comments.

Inline HTML chunks are no longer broken up when a tag starting with "s" is encountered (<script> for JS, <span>, etc.), since such a tag is unlikely to be a long PHP <script> tag.

If an opening PHP <SCRIPT> tag used a capital "S", it was missed unless it was the first thing scanned:

var_dump(token_get_all("HTML... <SCRIPT language=php>"));
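Here is a minimal C sketch of the case-insensitive comparison this implies. starts_script_tag is a hypothetical helper. It compares only the "<script" prefix and leaves out the language=php attribute check.

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: does the input at `p` start with "<script",
// compared case-insensitively so "<SCRIPT" and "<Script" also match?
// `limit` is one past the last byte, so no NUL terminator is required.
int starts_script_tag(const char *p, const char *limit)
{
    static const char tag[] = "<script";
    size_t n = sizeof(tag) - 1;

    if ((size_t)(limit - p) < n) {
        return 0;  // not enough input left to hold the tag
    }
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)p[i]) != tag[i]) {
            return 0;
        }
    }
    return 1;
}
```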

Single-line comments ending with a Windows newline didn't include the full \r\n:

var_dump(token_get_all("<?php // Comment\r\n?>"));
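Here is a hypothetical C sketch of ending a single-line comment so that a "\r\n" pair is consumed as one newline. It also stops before "?>", because PHP one-line comments end at the end of the line or at the end of the PHP block, whichever comes first. scan_line_comment is an illustrative name, not the patch's function.

```c
#include <stddef.h>

// Hypothetical sketch: find the end of a single-line comment body.
// `p` points just past "//" (or "#"); `limit` is one past the input end.
// Returns a pointer just past the newline (both bytes of "\r\n"), or a
// pointer to "?>" so that the closing tag is left for the next token.
const char *scan_line_comment(const char *p, const char *limit)
{
    while (p < limit) {
        if (*p == '\n') {
            return p + 1;
        }
        if (*p == '\r') {
            // Include the '\n' of a "\r\n" pair, not just the '\r'.
            if (p + 1 < limit && p[1] == '\n') {
                return p + 2;
            }
            return p + 1;  // lone '\r' (old Mac newline)
        }
        if (*p == '?' && p + 1 < limit && p[1] == '>') {
            return p;  // stop before "?>"
        }
        p++;
    }
    return limit;
}
```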

Finally, part of the optimized scanning is that for double-quoted strings, when the first variable is encountered (making the string non-constant), the length scanned up to that point is remembered, so that part can be skipped (up to the variable) after the quote token is returned. Previously that initial part of the string was rescanned, at a cost that depended on how far "into" the string the first variable was.
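Here is a simplified C sketch of the "remember how far we got" idea. It works under stated assumptions: a variable starts with "$" followed by a label start or "{", or with "{$". Label starts are approximated as [a-zA-Z_\x80-\xff]. Heredocs and the finer interpolation rules are ignored. first_var_offset is a hypothetical name. The count it returns is what a scanner would stash and later skip instead of rescanning.

```c
#include <stddef.h>

// Approximation of PHP's label-start rule: [a-zA-Z_\x80-\xff].
static int is_label_start(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c >= 0x80;
}

// Hypothetical sketch: scan a double-quoted string body (just after the
// opening quote) and return how many bytes precede the first variable.
// Returns -1 if the closing quote or end of input is reached first,
// i.e. the string is constant.
ptrdiff_t first_var_offset(const char *start, const char *limit)
{
    const char *p = start;
    while (p < limit) {
        char c = *p;
        if (c == '"') {
            return -1;  // constant string: no variable before the close
        }
        if (c == '\\') {
            p += (p + 1 < limit) ? 2 : 1;  // skip the escaped byte
            continue;
        }
        if (p + 1 < limit) {
            unsigned char next = (unsigned char)p[1];
            if ((c == '$' && (is_label_start(next) || next == '{')) ||
                (c == '{' && next == '$')) {
                return p - start;  // offset to remember and skip later
            }
        }
        p++;
    }
    return -1;
}
```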


I think that's about all; I'll send another message if I forgot to mention anything... I just wanted to send this along quickly for you guys to look at. It was basically done last week; I just had to add a couple of finishing touches and verify that everything was OK.

http://realplain.com/php/scanner_diet.diff (Merged changes, but not tested yet.)
http://realplain.com/php/scanner_diet_5_3.diff


Thanks,
Matt

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
