Hi again Dmitry, all,

----- Original Message -----
From: "Dmitry Stogov"
Sent: Monday, June 29, 2015

On Sat, Jun 27, 2015 at 12:36 AM, Matt Wilmas <php_li...@realplain.com>
wrote:

Hi Dmitry, all,

[...]

Yeah, I knew how the traditional ZPP worked, just wondered about any
certain "problem area." :-)  But it seems it's just the whole thing, so
much it's doing, besides just the "string format interpretation."

First, old ZPP is only using fractions of a % on WordPress now?  That
doesn't make sense to me...  On the fast_zpp wiki page, you said last year
it was taking ~6% of time on WordPress (before FAST_ZPP, of course).  And
changing key/hot functions to FAST_ZPP saved ~2.5% of the time.  So that
should have left a few percent of time used by traditional ZPP.

But everything else has gotten faster since then, so therefore, for an
unchanged old ZPP, its percentage contribution should have gone up? Well,
anyway...

I went ahead and tried implementing my idea (had been awhile since I
really looked at the FAST_ZPP stuff, and didn't realize it was as simple to
work from). :-)

It uses the same syntax as "FAST_ZPP" (if we/others like/prefer that) and
a zend_FAST_parse_parameters() function. Code size should be about the same
as before, maybe a few more bytes depending on instructions needed (still
thinking/adjusting).

It seems to have a pretty good performance increase! BTW, in quick testing,
I don't see old ZPP using 90% of the time even with an empty/dummy function.
Just about 50% (with or without ZTS)...

I didn't know how close we could get to the inlined FAST_ZPP, but it seems
the majority of the way there: ~70% in the simple case.  (To be clear, 3x
faster than old.)  This was on Windows XP with ancient VC9.

I don't have a patch ready for you to look at yet, since I didn't finish
changing the macros, etc.

It would be awesome if this could start being used throughout the
codebase, and not just functions with preferential treatment. :-P  Maybe
you'd even switch back from the inlined version in some places, if smaller
code would be better.


Send your implementation as soon as it's ready. I'll test it.

Just an update... I didn't abandon this; quite the opposite! I thought I'd just put the finishing touches on my implementation and have it to you almost a week ago. After my rough initial test version, I made some obvious, simple changes to reduce instructions/code size (slightly). Then I was analyzing different stuff with GCC and MSVC to see if it could be improved more (not really, since it's fairly straightforward), etc...

~5 days ago, when I was done messing around and changed the macros to recompile the existing FAST_ZPP parts, I didn't know what the size difference would be vs no FAST_ZPP (traditional). I had overestimated the savings ("maybe a few more bytes" for instructions). It came out in the 30-45% range of your inlined version.

I made a change to save instructions, but, strangely, it didn't really have the effect on size I thought it would. :-/

BTW, the improvement on Linux with GCC 4.8 was about the same: ~70% of inlined. So roughly 2/3 the speed for 1/3 the space. I also finally installed Valgrind and used Callgrind for the first time. Simple. :^) It showed about the same relative reduction in instructions.

I really wanted the code size to be smaller if this could get widespread use, and started wondering, "What if...?", "How?", "Why not?", "But..."

Then I had a new idea, but wasn't sure what the compilers would do with it. So I spent Sunday prototyping a couple of key parts of it outside of PHP. GCC can make a HUGE mess of it, but that's easily worked around. So it looks good, even better than the ideal I had imagined. Now I just have to do it for PHP...

This way saves the lea instructions for each &dest variable (like the inline version), and then some. And just earlier I realized there's a way to save the other instructions (while using the same macro syntax), which would also apply to the previous implementation.

So ideally, this means that at the CALL site, we should be able to have just the zend_fast_parse_... function call: mov+mov+lea+call on 64-bit, and that's it. The rest of the stuff (a good amount) can be COMPLETELY optimized away! :-O

And in the parse_... function, compared to the *inline* FAST_ZPP, that should get it down to about 3 dozen more instructions per parameter: while + switch + checks in zend_parse_arg_* that would get optimized away when inlined.
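
To make the "while + switch + checks" shape concrete, here is a rough sketch of an out-of-line parser in plain C. It is only an illustration under stated assumptions, not the implementation discussed here: arg_value, param_desc and parse_params are made-up names, plain C types stand in for zvals, and the descriptor table is just one hypothetical way to hand the destinations to a shared function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a zval: a tagged value. */
typedef enum { T_LONG, T_DOUBLE, T_STRING } arg_type;

typedef struct {
    arg_type type;
    union { long l; double d; const char *s; } u;
} arg_value;

/* Hypothetical per-parameter descriptor: expected type + destination. */
typedef struct {
    arg_type expected;
    void *dest;
} param_desc;

/* Shared, out-of-line parser. The loop, the switch and the type check
 * run once per parameter; an inlined version would let the compiler
 * fold most of this away at each call site.
 * Returns 0 on success, -1 on a count or type mismatch. */
static int parse_params(const arg_value *args, size_t nargs,
                        const param_desc *desc, size_t ndesc)
{
    size_t i = 0;

    assert(desc != NULL || ndesc == 0);
    if (nargs != ndesc) {
        return -1;
    }
    while (i < ndesc) {
        if (args[i].type != desc[i].expected) {
            return -1;
        }
        switch (desc[i].expected) {
        case T_LONG:
            *(long *)desc[i].dest = args[i].u.l;
            break;
        case T_DOUBLE:
            *(double *)desc[i].dest = args[i].u.d;
            break;
        case T_STRING:
            *(const char **)desc[i].dest = args[i].u.s;
            break;
        default:
            return -1;
        }
        i++;
    }
    return 0;
}
```

The point of the sketch is only the trade-off: every caller shares one copy of the loop, so each call site stays small, at the cost of the per-parameter dispatch that inlining would remove.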

Well, I'll send the implementation(s) for you to test as soon as I can!

Thanks. Dmitry.

- Matt

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
