2009/4/30 Scott MacVicar <scott...@php.net>:
> [^] is a special case in re2c for writing a portable "match any
> character" expression.
>
> Scott
>
> Dmitry Stogov wrote:
>> Hi Matt,
>>
>> Does this patch fix the EOF handling issues related to mmap() (e.g.
>> parsing files whose size is 4096, 8192, ... bytes)? Right now we have
>> two dirty fixes to handle them correctly.
>>
>> The patch is too big to understand quickly. I'll probably take a look
>> at it on the weekend.
>>
>> -ANY_CHAR [^\x00]
>> +ANY_CHAR [^]
>>
>> Is [^] a correct regular expression?
>>
>> Thanks. Dmitry.
>>
>> Matt Wilmas wrote:
>>> Hi Dmitry, Brian, all,
>>>
>>> Here's a scanner patch that I mentioned a while ago, with a possible
>>> way to work around the re2c EOF handling issues.
>>>
>>> The primary change is to do a "manual scan", like I talked about, in
>>> areas that match large amounts of text and can contain NULL bytes
>>> (strings/comments, which are now scanned faster too), as is already
>>> done for inline HTML. I called it a "diet" :-) because it removes my
>>> complicated string regex patterns from a couple of years ago. That
>>> doesn't make the .l file much smaller once the manual scan code is
>>> added (easier to understand...?), but it does result in a ~34k
>>> reduction of 5.3's generated .c file...
>>>
>>> This fixes Bug #46817, and is also a better, more proper fix for the
>>> older Bug #42767; both are related to ending comments.
>>>
>>> Now inline HTML chunks aren't broken up when a tag starting with "s"
>>> is encountered (<script> for JS, <span>, etc.), since it's unlikely to
>>> be a long PHP <script> tag.
>>>
>>> If an opening PHP <SCRIPT> tag was used with a capital "S", it was
>>> missed if it wasn't the first thing scanned:
>>>
>>> var_dump(token_get_all("HTML... <SCRIPT language=php>"));
>>>
>>> Single-line comments with a Windows newline didn't include the full
>>> \r\n:
>>>
>>> var_dump(token_get_all("<?php // Comment\r\n?>"));
>>>
>>> Finally, part of the optimized scanning is that, for double quoted
>>> strings, when the first variable is encountered (making the string
>>> non-constant), the amount that's been scanned up to that point is
>>> remembered, so that part can be skipped over (up to the variable)
>>> after returning the quote token. Previously that initial part of the
>>> string was rescanned, with the cost depending on how far "into" the
>>> string the first var is.
>>>
>>> I think that's about all -- I'll send another message if I forgot to
>>> mention anything... Just wanted to send this along quickly for you
>>> guys to look at or whatever. It was basically done last week; I just
>>> had to do a couple of finishing touches and verify that everything
>>> was OK.
>>>
>>> http://realplain.com/php/scanner_diet.diff (Merged changes, but didn't
>>> test yet.)
>>> http://realplain.com/php/scanner_diet_5_3.diff
>>>
>>> Thanks,
>>> Matt
>>
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
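To make Matt's "manual scan" idea more concrete, here is a hypothetical C helper. It is not code from scanner_diet.diff. It skips a block comment by searching a buffer that is bounded by an explicit end pointer instead of relying on a NUL sentinel. With that approach, NUL bytes inside the comment are just data, and a comment that runs to the very end of the input is reported as unterminated rather than read past.

The choice matters for mmap(). When a file's size is an exact multiple of the page size, the mapping has no zero-filled tail after the last byte. That is presumably why sizes like 4096 and 8192 come up in Dmitry's question. The name `skip_block_comment` and its signature are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not code from the patch.
 * p   : first byte after the opening comment delimiter
 * end : one past the last byte of the buffer (e.g. the end of an
 *       mmap()ed file, which is not necessarily followed by a NUL)
 * Returns a pointer just past the closing delimiter, or `end` if the
 * comment is unterminated.
 */
static const char *skip_block_comment(const char *p, const char *end)
{
    assert(p != NULL && end != NULL && p <= end);

    while (p < end) {
        /* memchr() is length-bounded, so embedded NUL bytes are
         * treated as ordinary comment text. */
        const char *star = memchr(p, '*', (size_t)(end - p));
        if (star == NULL) {
            return end;             /* no '*' left: unterminated */
        }
        if (star + 1 < end && star[1] == '/') {
            return star + 2;        /* found the closing delimiter */
        }
        p = star + 1;               /* lone '*': keep searching */
    }
    return end;                     /* reached end of buffer: unterminated */
}
```

Delegating the search to memchr() keeps the inner loop in the C library's search routine. That is one plausible reason a hand-written scan can be faster than a generated regex on long comments. Whether the real patch does this is not stated in the thread.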
Aha, it's at the bottom of the section at http://re2c.org/manual.html#lbAJ

-- 
-----
Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"
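Dmitry's question was whether changing ANY_CHAR from [^\x00] to [^] is valid. As a minimal sketch, a negated character class can be modeled as a 256-entry byte table. This is a hypothetical model in C, not how re2c represents classes internally, and it assumes 8-bit input. In this model, [^\x00] excludes only byte 0. Read the way Scott describes it, [^] excludes nothing and so matches every byte, NUL included. The type and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a byte-oriented character class: one flag per
 * possible byte value. Illustrates semantics only; re2c's internal
 * representation is different. */
typedef struct {
    bool member[256];
} byte_class;

/* Build the negated class [^...] that excludes the n bytes in `excluded`. */
static byte_class negated_class(const unsigned char *excluded, size_t n)
{
    assert(n == 0 || excluded != NULL);

    byte_class c;
    for (int i = 0; i < 256; i++) {
        c.member[i] = true;          /* start from "every byte" */
    }
    for (size_t i = 0; i < n; i++) {
        c.member[excluded[i]] = false;
    }
    return c;
}

static bool class_matches(const byte_class *c, unsigned char byte)
{
    return c->member[byte];
}

/* Number of distinct byte values the class accepts. */
static size_t class_size(const byte_class *c)
{
    size_t count = 0;
    for (int i = 0; i < 256; i++) {
        if (c->member[i]) {
            count++;
        }
    }
    return count;
}
```

Under this model, an ANY_CHAR defined as [^\x00] accepts 255 byte values and never a NUL, while [^] accepts all 256. That difference matters once strings and comments are allowed to contain NUL bytes.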