Hi again,

Hmm, not a single reply about this patch...? Did anyone try it out? :-)
Think it can be used after 5.2.2?

Matt

----- Original Message -----
From: "Matt Wilmas"
Sent: Thursday, April 12, 2007
Subject: [PHP-DEV] [PATCH] Major optimization for heredocs/interpolated strings

> Hi all,
>
> I think I first realized that PHP's scanner splits non-constant strings into
> many "pieces" after reading Sara's "How long is a piece of string?" blog
> entry[1] last summer. At the time I didn't know much about the internals
> and didn't know if anything could be done to change it. Then in the fall I
> finally took a look at the scanner ;-) and thought it would be possible to
> only "split" strings at variables. Finally a few months ago, I began
> working out the changes -- it was working almost 2 months ago, but then I
> got sidetracked :-/ from doing some more testing and making a few semantic
> token changes till now.
>
> So anyway, now heredocs and interpolated strings should be pretty much just
> like constant strings and concatenation (except for the extra INIT_STRING
> opcode). They scan/parse/compile faster (with less memory), run faster, and
> there's less to free when destroying opcodes.
>
> With a simple string like "This is $var string" (say $var = 'some'), I found
> the compile/cleanup time to be up to 50% faster, and runtime 55% faster!
> (Note: To test compile time, I eval()'d about 50 of them in an if (0) {...}
> block.) The difference will be *much more* depending on how many "pieces"
> there would've been before (e.g. longer).
>
> The more complex rules increased the size of Flex's tables about 40%.
> However, removing the old heredoc end rule, which used the ^
> beginning-of-line operator, made the YY_RULE_SETUP macro be empty, saving
> some space. The net result was an 8K/12K larger binary in 5.2/HEAD. I was
> surprised at the overall performance increase without the ^ rule. Its
> saving a few operations per match made just about as much difference as
> Flex's -Cfe table compression (was playing with that first :^)) when
> compiling the code from Zend/bench.php (5% I think).
>
> This was with a Windows ZTS build. Running ApacheBench on a few different
> scripts showed pretty nice overall improvements -- 10-15% was common in my
> quick tests.
>
> BTW, removing that ^ rule lifts the requirement that the character before
> the closing heredoc label "must be a newline as defined by your operating
> system," to quote the manual.
>
> Now some of the other changes:
>
> The ST_SINGLE_QUOTE state was removed from 5.2, like in HEAD.
>
> A string like "$$$" is considered constant now, since that's really what it
> is, right?
>
> CG(zend_lineno) wasn't incremented before if a \n or \r newline (not \r\n)
> followed a backslash in a non-constant string. \{ returned T_STRING instead
> of T_BAD_CHARACTER like any other invalid escape sequence. (Note: Of course
> these won't usually match now anyway, but will be part of a longer string.)
>
> I removed HANDLE_NEWLINES() from the code that scans a string's text,
> instead doing the newline check in the escape-checking loop, to prevent
> scanning twice. And I removed the additional boundary check in
> HANDLE_NEWLINES() and elsewhere since I didn't see the need -- AFAIK in all
> cases you'll only hit '\0'.
>
> I removed the one <<EOF>> rule since it was missing some states and it
> wasn't doing anything that the default EOF rule doesn't by calling
> yyterminate().
>
> In zendlex(), the goto target doesn't need to recheck CG(increment_lineno)
> since it hasn't changed, and I simplified the closing tag newline check
> (also looked like it would miss \r ones).
>
> Sorry for the long message! I'll send another if I think of something I
> forgot to mention. Here are the patches:
>
> http://realplain.com/php/scanner_optimizations.diff
> http://realplain.com/php/scanner_optimizations_5_2.diff
>
> Appreciate any feedback, or questions about any of it. :-)
>
>
> Thanks,
> Matt
>
> [1]
> http://blog.libssh2.org/index.php?/archives/28-How-long-is-a-piece-of-string.html

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php