On 04/24/2013 12:24 PM, Martin Jambor wrote:

Here they are.  First, I simply looked at how many instructions would
be changed by a second run of the pass in its current position during
C and C++ bootstrap:

     |                                     | Insns changed |      % |
     |-------------------------------------+---------------+--------|
     | Trunk - only pass in original place |        172608 | 100.00 |
     | First pass before pro/epilogue      |        170322 |  98.68 |
     | Second pass in the original place   |          8778 |   5.09 |

5% seemed worth investigating further.  The 20 source files with the
highest number of instructions affected by the second run were:

       939 mine/src/libgcc/config/libbid/bid_binarydecimal.c
       909 mine/src/libgcc/config/libbid/bid128_div.c
       813 mine/src/libgcc/config/libbid/bid64_div.c
       744 mine/src/libgcc/config/libbid/bid128_compare.c
       615 mine/src/libgcc/config/libbid/bid128_to_int32.c
       480 mine/src/libgcc/config/libbid/bid128_to_int64.c
       450 mine/src/libgcc/config/libbid/bid128_to_uint32.c
       408 mine/src/libgcc/config/libbid/bid128_fma.c
       354 mine/src/libgcc/config/libbid/bid128_to_uint64.c
       327 mine/src/libgcc/config/libbid/bid128_add.c
       246 mine/src/libgcc/libgcc2.c
       141 mine/src/libgcc/config/libbid/bid_round.c
       129 mine/src/libgcc/config/libbid/bid64_mul.c
       117 mine/src/libgcc/config/libbid/bid64_to_int64.c
        96 mine/src/libsanitizer/tsan/tsan_interceptors.cc
        96 mine/src/libgcc/config/libbid/bid64_compare.c
        87 mine/src/libgcc/config/libbid/bid128_noncomp.c
        84 mine/src/libgcc/config/libbid/bid64_to_bid128.c
        81 mine/src/libgcc/config/libbid/bid64_to_uint64.c
        63 mine/src/libgcc/config/libbid/bid64_to_int32.c
The first thing that jumps out at me here is that there's probably some idiom used in the BID code that is triggering this.

I have manually examined some of the late opportunities for
propagation in mine/src/libgcc/config/libbid/bid_binarydecimal.c and
the majority of them were a result of peephole2.
I can pretty easily see how peep2 may expose opportunities for hard-cprop. Of course, those opportunities may actually be undoing some of the benefit of the peep2 patterns.

So next I measured only the number of instructions changed during
make stage2-bubble with multilib disabled.  In order to find out where
the new opportunities come from, I scheduled pass_cprop_hardreg after
every pass between pass_branch_target_load_optimize1 and
pass_fast_rtl_dce and counted how many instructions are modified
(relative to just having the pass where it is now):
Thanks. That's a really interesting hunk of data. It's interesting that we have so many after {pro,epi}logue generation; a full 33% of the changed insns stem from there, and I can't think of why that should be the case. Perhaps there's some second-order effect that shows itself after the first pass of cprop-hardreg.

I can see several ways jump2 could open new propagation possibilities. As I noted earlier in this message, the opportunities after peep2 may actually be doing more harm than good.

It's probably not worth the work involved, but a more sensible visitation order for reg-cprop would be good. Similarly, we could have the capability to mark interesting blocks and reg-cprop just those blocks after threading the prologue/epilogue.


I'm not sure what the conclusion is.  Probably that there are cases
where doing propagation late can be a good thing, but that these do
not occur very often.  And that more measurements should probably be
done.  Anyway, I'll look into alternatives (see below) before pushing
this further.
Knowing more about those opportunities would be useful. The most interesting ones to me would be those right after the prologue/epilogue. Having just run the cprop, then attached the prologue/epilogue, I wouldn't expect there to be many propagation opportunities.


I have looked at the patch Vlad suggested (most things are new to me
in RTL land and so almost everything takes me ages) and I'm certainly
willing to try and mimic some of it in order to (hopefully) get the
same effect that propagating and shrink-wrapping preparation moves can
do.  Yes, this is not enough to deal with parameters loaded from the
stack, but unlike late insertion, it could also work when the
parameters are used on the fast path, which is often the case.  In
fact, propagation helps exactly because they are used in the entry BB.
Hopefully they will end up in a caller-saved register on the fast path
and we'll flip it over to the callee-saved problematic one only on
(slow) paths going through calls.

Of course, the two approaches are not mutually exclusive and load
sinking might help too.
Note that copy sinking is formulated as sinking copies one at a time in Morgan's text. I'm not sure that's needed in this case, since we're just sinking a few well-defined copies.

And I agree, the approaches are not mutually exclusive; sinking a load out of the prologue and out of a hot path has a lot of value. But sinking the loads is much more constrained than just sinking the argument copies.

jeff
