On 2020/4/17 7:32 PM, Richard Biener wrote:
> On Fri, Apr 17, 2020 at 1:10 PM Kewen.Lin via Gcc <gcc@gcc.gnu.org> wrote:
>>
>> Hi all,
>>
>> This is one question origining from PR61837, which shows that doloop
>> pass could introduce some loop invariants against its outer loop when
>> doloop performs and prepares the iteration count for hardware
>> count register assignment.
>>
> I suggest to try both approaches and count the number of transforms
> done in each instance.
>
> Richard.
>
Hi Richi,

Thanks for the suggestion. I collected some statistics, as below.

  A: default
  B: move pass_rtl_move_loop_invariants after pass_rtl_doloop
  C: rerun pass_rtl_move_loop_invariants after pass_rtl_doloop
  D: C + doloop_begin

These come from bootstrapping and regression testing on ppc64le Power8, configured with languages c,c++,fortran,objc,obj-c++,go. I counted the move transformations in function move_invariant_reg (before label fail; it would probably be better to also require inv == repr to filter out the replacements with repr, but I think the trend is similar).

  A: 802650
  B: 841476
  C: 803485 (C1) + 827883 (C2)
  D: 802622 (D1) + 841476 (D2)

Let's call pass_rtl_move_loop_invariants "hoisting".

PS: C1/D1 are the first hoisting run, while C2/D2 are the second hoisting run.

The small differences (~0.1%) among A, C1 and D1 should be noise. The totals for the two-run configurations (C/D) are almost double the single-run count, which surprised me. On further investigation, it looks like the current pass_rtl_move_loop_invariants needs some improvement if we want to rerun it.

Take gcc/testsuite/gfortran.dg/inline_matmul_16.f90 at -O1 as an example: C1 does 178 transforms and C2 does 165. This is unrelated to the unroll/doloop passes; the result doesn't change when they are disabled explicitly.

Currently, without flag_ira_loop_pressure, the regs_used estimation isn't good. I'd expect the invariants hoisted out of a loop by the first run to be counted in regs_used when the second run does its regs_used analysis. However, regs_used is 2 for all loops in inline_matmul_16, in both C1 and C2. I think this makes the second hoisting estimate register pressure too optimistically, so it hoists more. With a simple hack that takes the new_reg values from the first hoisting into account, the second hoisting does fewer moves (57). This means the statistics comparison above is unfair and unreliable.

With flag_ira_loop_pressure, the transform counts become 255 (1st) and 68 (2nd). That looks better, but it may also need further enhancements.
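For anyone who wants to double-check the "~0.1%" and "almost two times" remarks, here is a small Python sketch that just re-derives the percentages from the A/B/C/D counts reported above. Nothing in it is GCC-specific; the variable and function names are ad hoc.

```python
# Counts of move transformations in move_invariant_reg, as reported above.
counts = {
    "A": 802650,
    "B": 841476,
    "C1": 803485,
    "C2": 827883,
    "D1": 802622,
    "D2": 841476,
}


def pct_delta(x, base):
    """Relative change of x with respect to base, in percent."""
    return (x - base) / base * 100.0


base = counts["A"]

# First hoisting run in C and D against the default (A): expected to be noise.
c1_delta = pct_delta(counts["C1"], base)
d1_delta = pct_delta(counts["D1"], base)

# Hoisting moved after doloop (B) against the default (A).
b_delta = pct_delta(counts["B"], base)

# Two-run totals (C, D) as a multiple of the single-run count (A).
c_ratio = (counts["C1"] + counts["C2"]) / base
d_ratio = (counts["D1"] + counts["D2"]) / base

if __name__ == "__main__":
    print(f"C1 vs A: {c1_delta:+.3f}%")
    print(f"D1 vs A: {d1_delta:+.4f}%")
    print(f"B  vs A: {b_delta:+.2f}%")
    print(f"C total / A: {c_ratio:.3f}")
    print(f"D total / A: {d_ratio:.3f}")
```

This gives roughly +0.10% for C1, -0.0035% for D1 and +4.84% for B, and the C and D totals come out at about 2.03x and 2.05x the single-run count.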
Since rs6000 sets flag_ira_loop_pressure at -O3, I did a SPEC2017 performance evaluation on Power8 (against baseline A) with options -Ofast -funroll-loops:

  * B showed speedups on 525.x264_r (+1.43%) and 538.imagick_r (+1.23%), but a degradation on 503.bwaves_r (-2.74%).
  * C showed degradations on 500.perlbench_r (-1.31%) and 520.omnetpp_r (-2.20%).

The evaluation shows that running hoisting after doloop can give us some benefits, but running it twice doesn't give similar gains. Regardless of flag_ira_loop_pressure, it looks like rerunning the pass requires more tweaks, probably to the related parameters. If we go with B, we need to figure out what we're missing for bwaves_r.

BR,
Kewen