Hello Matt,

the patch looks interesting. I think we should commit it to HEAD, and if it works well we can add it to 5.3 once we've created it. Did you do any measurements?
best regards
marcus

Thursday, April 26, 2007, 5:52:43 AM, you wrote:

> Hi again,
>
> Hmm, not a single reply about this patch...? Did anyone try it out? :-)
> Think it can be used after 5.2.2?
>
> Matt
>
> ----- Original Message -----
> From: "Matt Wilmas"
> Sent: Thursday, April 12, 2007
> Subject: [PHP-DEV] [PATCH] Major optimization for heredocs/interpolated strings

>> Hi all,
>>
>> I think I first realized that PHP's scanner splits non-constant strings into many "pieces" after reading Sara's "How long is a piece of string?" blog entry[1] last summer. At the time I didn't know much about the internals and didn't know if anything could be done to change it. Then in the fall I finally took a look at the scanner ;-) and thought it would be possible to only "split" strings at variables. Finally a few months ago, I began working out the changes -- it was working almost 2 months ago, but then I got sidetracked :-/ from doing some more testing and making a few semantic token changes till now.
>>
>> So anyway, now heredocs and interpolated strings should be pretty much just like constant strings and concatenation (except for the extra INIT_STRING opcode). They scan/parse/compile faster (with less memory), run faster, and there's less to free when destroying opcodes.
>>
>> With a simple string like "This is $var string" (say $var = 'some'), I found the compile/cleanup time to be up to 50% faster, and runtime 55% faster! (Note: To test compile time, I eval()'d about 50 of them in an if (0) {...} block.) The difference will be *much more* depending on how many "pieces" there would've been before (e.g. longer).
>>
>> The more complex rules increased the size of Flex's tables about 40%. However, removing the old heredoc end rule, which used the ^ beginning-of-line operator, made the YY_RULE_SETUP macro be empty, saving some space. The net result was an 8K/12K larger binary in 5.2/HEAD.
>>
>> I was surprised at the overall performance increase without the ^ rule. Its saving a few operations per match made just about as much difference as Flex's -Cfe table compression (was playing with that first :^)) when compiling the code from Zend/bench.php (5% I think).
>>
>> This was with a Windows ZTS build. Running ApacheBench on a few different scripts showed pretty nice overall improvements -- 10-15% was common in my quick tests.
>>
>> BTW, removing that ^ rule lifts the requirement that the character before the closing heredoc label "must be a newline as defined by your operating system," to quote the manual.
>>
>> Now some of the other changes:
>>
>> The ST_SINGLE_QUOTE state was removed from 5.2, like in HEAD.
>>
>> A string like "$$$" is considered constant now, since that's really what it is, right?
>>
>> CG(zend_lineno) wasn't incremented before if a \n or \r newline (not \r\n) followed a backslash in a non-constant string. \{ returned T_STRING instead of T_BAD_CHARACTER like any other invalid escape sequence. (Note: Of course these won't usually match now anyway, but will be part of a longer string.)
>>
>> I removed HANDLE_NEWLINES() from the code that scans a string's text, instead doing the newline check in the escape-checking loop, to prevent scanning twice. And I removed the additional boundary check in HANDLE_NEWLINES() and elsewhere since I didn't see the need -- AFAIK in all cases you'll only hit '\0'.
>>
>> I removed the one <<EOF>> rule since it was missing some states and it wasn't doing anything that the default EOF rule doesn't by calling yyterminate().
>>
>> In zendlex(), the goto target doesn't need to recheck CG(increment_lineno) since it hasn't changed, and I simplified the closing tag newline check (also looked like it would miss \r ones).
>>
>> Sorry for the long message! I'll send another if I think of something I forgot to mention.
>>
>> Here are the patches:
>>
>> http://realplain.com/php/scanner_optimizations.diff
>> http://realplain.com/php/scanner_optimizations_5_2.diff
>>
>> Appreciate any feedback, or questions about any of it. :-)
>>
>>
>> Thanks,
>> Matt
>>
>> [1] http://blog.libssh2.org/index.php?/archives/28-How-long-is-a-piece-of-string.html

Best regards,
 Marcus