On 04/24/2013 12:24 PM, Martin Jambor wrote:

Here they are.  First, I simply looked at how many instructions would
be changed by a second run of the pass in its current position during
C and C++ bootstrap:

     |                                     | Insns changed |      % |
     |-------------------------------------+---------------+--------|
     | Trunk - only pass in original place |        172608 | 100.00 |
     | First pass before pro/epilogue      |        170322 |  98.68 |
     | Second pass in the original place   |          8778 |   5.09 |

5% seemed worth investigating further.  The 20 source files with the
highest number of instructions affected by the second run were:

       939 mine/src/libgcc/config/libbid/bid_binarydecimal.c
       909 mine/src/libgcc/config/libbid/bid128_div.c
       813 mine/src/libgcc/config/libbid/bid64_div.c
       744 mine/src/libgcc/config/libbid/bid128_compare.c
       615 mine/src/libgcc/config/libbid/bid128_to_int32.c
       480 mine/src/libgcc/config/libbid/bid128_to_int64.c
       450 mine/src/libgcc/config/libbid/bid128_to_uint32.c
       408 mine/src/libgcc/config/libbid/bid128_fma.c
       354 mine/src/libgcc/config/libbid/bid128_to_uint64.c
       327 mine/src/libgcc/config/libbid/bid128_add.c
       246 mine/src/libgcc/libgcc2.c
       141 mine/src/libgcc/config/libbid/bid_round.c
       129 mine/src/libgcc/config/libbid/bid64_mul.c
       117 mine/src/libgcc/config/libbid/bid64_to_int64.c
        96 mine/src/libsanitizer/tsan/tsan_interceptors.cc
        96 mine/src/libgcc/config/libbid/bid64_compare.c
        87 mine/src/libgcc/config/libbid/bid128_noncomp.c
        84 mine/src/libgcc/config/libbid/bid64_to_bid128.c
        81 mine/src/libgcc/config/libbid/bid64_to_uint64.c
        63 mine/src/libgcc/config/libbid/bid64_to_int32.c
The first thing that jumps out at me here is that there's probably some idiom used in the BID code that is triggering this.

I have manually examined some of the late opportunities for
propagation in mine/src/libgcc/config/libbid/bid_binarydecimal.c and
the majority of them were a result of peephole2.
I can pretty easily see how peep2 may expose opportunities for hard-cprop. Of course, those opportunities may actually be undoing some of the benefit of the peep2 patterns.

So next I measured only the number of instructions changed during
make stage2-bubble with multilib disabled.  In order to find out where
the new opportunities come from, I scheduled pass_cprop_hardreg after
every pass between pass_branch_target_load_optimize1 and
pass_fast_rtl_dce and counted how many instructions are modified
(relative to just having the pass where it is now):
Thanks. That's a really interesting hunk of data. It's interesting that we have so many after {pro,epi}logue generation; a full 33% of the changed insns stem from there, and I can't think of why that should be the case. Perhaps there's some second-order effect that shows itself after the first pass of cprop-hardreg.

I can see several ways jump2 could open new propagation possibilities. As I noted earlier in this message, the opportunities after peep2 may actually be doing more harm than good.

It's probably not worth the work involved, but a more sensible visitation order for reg-cprop would be good. Similarly, we could have the capability to mark interesting blocks and reg-cprop just those blocks after threading the prologue/epilogue.


I'm not sure what the conclusion is.  Probably that there are cases
where doing propagation late can be a good thing, but that these do
not occur very often.  And that more measurements should probably be
done.  Anyway, I'll look into alternatives (see below) before pushing
this further.
Knowing more about those opportunities would be useful. The most interesting ones to me would be those right after the prologue/epilogue. Having just run the cprop, then attached the prologue/epilogue, I wouldn't expect there to be many propagation opportunities.


I have looked at the patch Vlad suggested (most things are new to me
in RTL land and so almost everything takes me ages) and I'm certainly
willing to try and mimic some of it in order to (hopefully) get the
same effect that propagating and shrink-wrapping preparation moves can
do.  Yes, this is not enough to deal with parameters loaded from the
stack, but unlike late insertion, it could also work when the
parameters are used on the fast path, which is often the case.  In
fact, propagation helps exactly because they are used in the entry BB.
Hopefully they will end up in a caller-saved register on the fast path
and we'll flip it over to the callee-saved problematic one only on
(slow) paths going through calls.

Of course, the two approaches are not mutually exclusive and load
sinking might help too.
Note that copy sinking is formulated as sinking copies one at a time in Morgan's text. I'm not sure that's needed in this case, since we're just sinking a few well-defined copies.

And I agree, the approaches are not mutually exclusive; sinking a load out of the prologue and out of a hot path has a lot of value. But sinking the loads is much more constrained than just sinking the argument copies.

jeff
