Hi Dmitry,

----- Original Message -----
From: "Dmitry Stogov"
Sent: Monday, August 03, 2015

Hi Matt,


On Wed, Jul 22, 2015 at 11:16 PM, Matt Wilmas <php_li...@realplain.com>
wrote:

Hi again Dmitry, all,

Hopefully the final update on this, before all is revealed... :-)

[...]

I tried to rush and finish things up before the weekend *2 weeks ago*, but
it took me too long to get the macros sorted out and working right. :-/
Sorry for the delay, but more and better goodness should now be included.
The extra time allowed me to "relax and take notes" (Notorious B.I.G.),
however. :-D

So yeah, that was all working 10 days ago.  Then I realized more function
param data could be packed together which saved another mov instruction --
so at the call site, it's just mov+lea+call on 64-bit (since execute_data
is already in %rdi).  There's nothing else (ignoring checking return
value/return on error, etc.), and each &dest variable is filled in even
though their address isn't taken (thanks to compiler magic).  The only
exceptions are FUNC (4 instructions I think) and OBJECT_OF_CLASS and
VARIADIC (1 instruction) types.

Unfortunately (only because I said "same macro syntax," but no big deal),
the syntax had to be changed, from:

ZEND_PARSE_PARAMETERS_START[_EX](...)
   Z_PARAM_*(...)
   Z_PARAM_*(...)
ZEND_PARSE_PARAMETERS_END[_EX]

to

ZEND_PARSE_PARAMETERS_START[_EX](...)(   // Parentheses
   Z_PARAM_*(...),   // Comma-separated
   Z_PARAM_*(...)
) ZEND_PARSE_PARAMETERS_END[_EX]
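
To illustrate why the extra parentheses matter: a START macro can end in the name of a second function-like macro, which then receives the whole comma-separated list as its arguments and can emit it as one "static const" table. What follows is only a minimal sketch of that preprocessor trick. All SK_* names, the sk_info struct and the type codes are made up for illustration and are not taken from the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type codes, for illustration only. */
enum { SK_LONG = 1, SK_DOUBLE = 2 };

/* What the macros hand over to an (imaginary) parse function. */
typedef struct {
    int min_args;
    int max_args;
    size_t count;                 /* number of parameter entries */
    const unsigned char *types;   /* one byte per parameter */
} sk_info;

/* Each parameter macro contributes one table entry. The destination
 * argument is ignored in this sketch. */
#define SK_PARAM_LONG(dest)   SK_LONG
#define SK_PARAM_DOUBLE(dest) SK_DOUBLE

/* START ends in the name of another function-like macro, so the
 * parenthesized list that follows becomes that macro's arguments. */
#define SK_PARAMS_START(min, max) \
    { const int sk_min = (min), sk_max = (max); SK_PARAMS_TABLE
#define SK_PARAMS_TABLE(...) \
    static const unsigned char sk_types[] = { __VA_ARGS__ };
#define SK_PARAMS_END \
    sk_out = (sk_info){ sk_min, sk_max, sizeof sk_types, sk_types }; }

/* Demo "function body" using the parenthesized, comma-separated form. */
static sk_info sk_demo(void)
{
    sk_info sk_out;
    SK_PARAMS_START(2, 2)(
        SK_PARAM_DOUBLE(num1),
        SK_PARAM_LONG(num2)
    ) SK_PARAMS_END
    return sk_out;
}
```

Without the parentheses, each Z_PARAM_* style macro expands as a separate statement, so there is no single place where the whole list is available to initialize a static object.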


Errors in nested macros might be very difficult to understand :(
I would prefer not to use nested macros without a significant gain.

Not sure what you mean about errors, unless you're talking about missing a comma or such...

And those macro calls themselves aren't really nested, just in parentheses of course.

They are filling [multiple] structs, although that was also the case with the version using the EXACT current syntax. :-)

Anyway though, it doesn't matter much; not sure what you'll want to do with all the possibilities I have! And a simple script converts occurrences to the new syntax for testing (instead of a bigger patch).

Significant gain? Nope. :-) I only did that in order to use the "static" storage specifier in one place, for a pointer to the packed rodata, instead of filling it at runtime. But I think the file size was the same with or without static, even though it saved instructions. So not a requirement, just part of my experiments.


Like I said, the BIG neat thing is getting the same optimization (all except the "static" part) for the *traditional* ZPP. I hadn't touched it since last message until this week (doing other stuff and too sick ~4 days to do anything :-/) and wanted to check closer to final code before replying -- but still looks good with GCC so far!

So depending, there's maybe less interest in my smaller FAST_ZPP implementation... *shrug*

Overall, the *code* size is reduced (vs traditional ZPP), but the file
size isn't (static stuff in rodata or whatever), which was a bit
surprising, although most of these PHP functions don't have many
parameters...


I can only guess where this static data came from, because I haven't seen
the code yet :)

Just "static const" stuff. :-) After the very first attempt, I've wanted to pack stuff together. Function min/max args and any flags (QUIET/THROW, or the new METHOD) are in a 4 byte int. (GCC doesn't want to pack them together in the latest case, but easily fixed.) Then a byte for each parameter. So, I tried "static const" to eliminate the movb instructions, that's all.
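
A rough sketch of that packing, for readers following along. The field widths are an assumption borrowed from the byte stores in the third atan2() listing further down (1 byte min_args, 2 bytes max_args, 1 byte flags). The sk_* names and the flag values are hypothetical and are not the real Zend constants.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
enum { SK_FLAG_QUIET = 1, SK_FLAG_THROW = 2, SK_FLAG_METHOD = 4 };

/* Pack min/max argument counts and flags into one 32-bit value:
 * bits 0-7 min_args, bits 8-23 max_args, bits 24-31 flags. */
static inline uint32_t sk_pack_header(uint8_t min_args, uint16_t max_args,
                                      uint8_t flags)
{
    return (uint32_t)min_args
         | ((uint32_t)max_args << 8)
         | ((uint32_t)flags << 24);
}

static inline uint8_t sk_min_args(uint32_t header)
{
    return (uint8_t)(header & 0xffu);
}

static inline uint16_t sk_max_args(uint32_t header)
{
    return (uint16_t)((header >> 8) & 0xffffu);
}

static inline uint8_t sk_flags(uint32_t header)
{
    return (uint8_t)(header >> 24);
}
```

With constant arguments the header is a compile-time constant, so it can sit in rodata next to the per-parameter type bytes instead of being stored byte by byte at runtime.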


Just to give an idea, here's the different instructions for atan2() with GCC 4.8 -O2 (after push %rbx, comments mostly for others):

== Traditional ZPP ==
xor    %eax,%eax  # %al = 0: no vector regs passed to the varargs call (SysV ABI)
mov    %rsi,%rbx
mov    $0x61c4f3,%esi # format string ptr
sub    $0x10,%rsp
mov    0x2c(%rdi),%edi # ZEND_NUM_ARGS()
lea    0x8(%rsp),%rcx  # &num2
mov    %rsp,%rdx       # &num1
callq  595670 <zend_parse_parameters>
cmp    $0xffffffff,%eax
je     4f7f4f <zif_atan2+0x3f>
movsd  0x8(%rsp),%xmm1
movsd  (%rsp),%xmm0
callq  419190 <atan2@plt>

== My macros, "static const" version ==
mov    %rsi,%rbx
mov    $0x7709f8,%esi # packed static info ptr; execute_data in %rdi
sub    $0x20,%rsp # 16 bytes more; each parameter needs 16 bytes stack
mov    %rsp,%rdx  # &num1 AND &num2, effectively; usually "lea ?,%rdx"
callq  5935d0 <zend_fast_parse_parameters>
test   %eax,%eax  # shorter than cmp comparing with SUCCESS vs FAILURE
jne    4f6f84 <zif_atan2+0x34>
movsd  0x10(%rsp),%xmm1
movsd  (%rsp),%xmm0
callq  419330 <atan2@plt>

== Traditional ZPP, **optimized at compile time** ==
mov    $0x2,%eax  # ??? max_args, for below
mov    %rsi,%rbx
sub    $0x30,%rsp
lea    0x10(%rsp),%rdx # &num1, &num2, ..., effectively
mov    %rsp,%rsi  # packed info, filled by the following
movb   $0x2,(%rsp)    # min_args
mov    %ax,0x1(%rsp)  # max_args
movb   $0x0,0x3(%rsp) # flags (none)
movb   $0x2,0x4(%rsp) # 'd' double type: 2
movb   $0x2,0x5(%rsp) # 'd' double type: 2
callq  5935f0 <zend_fast_parse_parameters>
test   %eax,%eax
jne    4f6fa8 <zif_atan2+0x58>
movsd  0x20(%rsp),%xmm1
movsd  0x10(%rsp),%xmm0
callq  419330 <atan2@plt>

That (optimizing traditional string ZPP) will be the *equivalent* of 64KB+ of C code (repetition), all reduced to nothing. :-) And more of that should (will) be packed together. Hopefully this continues, and with other compilers, on non-Windows anyway.

Don't know about Windows now... Visual Studio 2008 and 2012 (not much difference) are NOT optimizing away the code (other times it was GCC with issues). :-/ Not sure why. Of course they don't support the necessary compound literals anyway, but I was just testing a manual case... I'll have to try and check 2015 version soon.

Regardless, there will be a fallback function to be called with optimized runtime string parsing, to be used if compilers don't create optimized code. I'll be checking more compilers, of course...


Sorry for the delay. Thought I'd have a patch for you when you got back! It's really about "finished" now, but not sure how many more days of final tweaking and testing till it's ready for a patch. :-)

Thanks. Dmitry.

- Matt


The biggest size savings actually came from the simple initial
optimization of zend_parse_params_none().  Down to almost nothing, much
faster, and saved 4KB on my --disable-all builds.


NEW GOODNESS -- What would of course be nice to have is a big optimization
of the traditional zend_parse[_method]_parameters[_ex|_throw] to avoid
changing them all.  And it seems some people, like Derick, prefer it.

Of course the obvious way I first had in mind weeks ago was to simply
parse its format string faster (once-ish) at runtime, and then feed it to
this new FAST_parse function.  Should give at least 2x speedup I figured.
But with this latest implementation, where the function should probably now
be called parse_parameters_ARRAY instead of fast_parse, it would need a
second pass after parsing the string.  Not a huge deal, but...

What would be *really nice* is to have the compiler parse the format
string, at compile time, and use the new system directly.  And... that
should be possible!! 8-)

Last week I figured GCC's "statement expressions" [1] could be used, which
most compilers seem to support, except MSVC.  But just over the weekend I
realized an inline function could be used with a compound literal (for the
varargs), which is also supported in the latest MSVC versions.  Awesome!
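
A minimal sketch of the inline-function-plus-compound-literal idea, under heavy assumptions: the sk_* names, the type codes and the use of plain doubles as incoming arguments are all invented here, and only the 'l' and 'd' specifiers are handled. Whether the loop over the format string really disappears depends on the compiler inlining sk_parse() and folding the string literal, which is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

enum { SK_TYPE_LONG = 1, SK_TYPE_DOUBLE = 2 };  /* hypothetical codes */
#define SK_MAX_PARAMS 8

/* Stand-in for the array-based parse function. Incoming arguments are
 * plain doubles here, where the real code would read zvals. */
static int sk_parse_array(const double *args, size_t nargs,
                          const unsigned char *types, size_t ntypes,
                          void *const *dest)
{
    assert(args != NULL && types != NULL && dest != NULL);
    if (nargs != ntypes) {
        return -1;
    }
    for (size_t i = 0; i < ntypes; i++) {
        switch (types[i]) {
        case SK_TYPE_LONG:   *(long *)dest[i] = (long)args[i]; break;
        case SK_TYPE_DOUBLE: *(double *)dest[i] = args[i];     break;
        default:             return -1;
        }
    }
    return 0;
}

/* Translate the format string into type bytes, then call the array
 * version. With a literal spec and inlining, a compiler may fold the
 * loop into a few constant stores. */
static inline int sk_parse(const double *args, size_t nargs, const char *spec,
                           void *const *dest, size_t ndest)
{
    unsigned char types[SK_MAX_PARAMS];
    size_t n = 0;

    for (; spec[n] != '\0'; n++) {
        if (n == SK_MAX_PARAMS) {
            return -1;
        }
        switch (spec[n]) {
        case 'l': types[n] = SK_TYPE_LONG;   break;
        case 'd': types[n] = SK_TYPE_DOUBLE; break;
        default:  return -1;
        }
    }
    if (n != ndest) {
        return -1;
    }
    return sk_parse_array(args, nargs, types, n, dest);
}

/* The compound literal collects the destination pointers that a
 * varargs function would otherwise receive. */
#define SK_PARSE(args, nargs, spec, ...) \
    sk_parse((args), (nargs), (spec), (void *[]){ __VA_ARGS__ }, \
             sizeof((void *[]){ __VA_ARGS__ }) / sizeof(void *))
```

A call then looks much like the traditional one, e.g. SK_PARSE(in, 2, "dd", &num1, &num2), which returns 0 on success and -1 on a count or specifier mismatch.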

And again, fear not, ALL the code can be completely removed by the
compiler, leaving only movb instructions instead of lea+mov/push for the
traditional ZPP function call.  So, better than my initial
implementation(s), and nearly the same as my final macro version!  I was
just testing prototypes of portions with GCC yesterday, which does fine
after adjusting to not generate *horribly stupid* code.

Now to implement it into PHP ASAP!  Then I'll save a few more
branches/instructions in the parse function (specialized for common cases;
some useless GCC instructions), comment and clean up my experimental mess,
and write up some explanation of the changes before sending the patch.  Oh,
and I should verify what Clang does with the code as well...

Stay tuned!

[1] https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
