Hi,

This is a repost; the original series is at
https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538360.html.

As discussed in the thread
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html
(original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html),
I'm working to teach IVOPTs to consider D-form group access during
unrolling. The difference between D-form and other forms during
unrolling is that D-form lets us put the stride into the displacement
field and so avoid the additional step increments.

For example, with X-form (uf step increments):

    ...
    LD A = baseA, X
    LD B = baseB, X
    ST C = baseC, X
    X = X + stride
    LD A = baseA, X
    LD B = baseB, X
    ST C = baseC, X
    X = X + stride
    LD A = baseA, X
    LD B = baseB, X
    ST C = baseC, X
    X = X + stride
    ...

With D-form (one step increment for each base):

    ...
    LD A = baseA, OFF
    LD B = baseB, OFF
    ST C = baseC, OFF
    LD A = baseA, OFF+stride
    LD B = baseB, OFF+stride
    ST C = baseC, OFF+stride
    LD A = baseA, OFF+2*stride
    LD B = baseB, OFF+2*stride
    ST C = baseC, OFF+2*stride
    ...
    baseA += stride * uf
    baseB += stride * uf
    baseC += stride * uf

If the loop is unrolled 8 times, that is 3 step updates with D-form
vs. 8 step updates with X-form.

Here we only need to check that the stride meets the D-form field
requirement: if OFF doesn't, we can construct baseA' as baseA + OFF.

This patch set consists of four parts:

[PATCH 1/4] unroll: Add middle-end unroll factor estimation

  Add unroll factor estimation in the middle-end. It mainly follows
  the current RTL unroll factor determination in function
  decide_unrolling and its sub-calls. As Richi suggested, we could
  probably force the unroll factor with this and avoid the duplicate
  unroll factor calculation, but I think that needs more benchmarking
  work and should be handled separately.

[PATCH 2/4] param: Introduce one param to control unroll factor

  As Richard and Segher suggested, I used addr_offset_valid_p for the
  addressing mode rather than a target hook. As Richard suggested,
  this patch introduces one parameter to control this IVOPTs
  consideration and the further tweaking in [3/4], on top of the
  unroll factor estimation in [1/4].

[PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling

  Teach IVOPTs to mark an IV cand as reg_offset_p when it is derived
  from an address IV type group in which the whole group is valid to
  use reg_offset mode. Then scale up the IV cand step cost by
  (uf - 1) for IV cands that are not reg_offset_p, where uf is the
  estimated unroll factor from [1/4].

[PATCH 4/4] rs6000: P9 D-form test cases

  Add some test cases, mainly copied from Kelvin's patch. Segher has
  approved this provided the whole series is fine.

Many thanks to Richard and Segher for reviewing the previous version.

Bootstrapped and regress tested on powerpc64le-linux-gnu.

Any comments are highly appreciated! Thanks in advance!

BR,
Kewen

-------

 gcc/cfgloop.h                  |   3 +
 gcc/config/i386/i386-options.c |   6 +
 gcc/config/s390/s390.c         |   6 +
 gcc/doc/invoke.texi            |   9 ++
 gcc/params.opt                 |   4 +
 gcc/tree-ssa-loop-ivopts.c     | 100 +++++++++++++++++++-
 gcc/tree-ssa-loop-manip.c      | 253 ++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/tree-ssa-loop-manip.h      |   3 +-
 gcc/tree-ssa-loop.c            |  33 +++++++
 gcc/tree-ssa-loop.h            |   2 +
 10 files changed, 416 insertions(+), 3 deletions(-)
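
P.S. For readers who want the step-update arithmetic in concrete form,
here is a rough sketch in C. It is only an illustration and is not code
from the patches: step_updates and stride_fits_dform are hypothetical
names, and the signed 16-bit displacement range is an assumption about
the target's D-form field (the series itself relies on
addr_offset_valid_p for the real check).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not from the patch: number of IV step updates
   in one unrolled loop body.  N_BASES is the number of distinct base
   addresses in the group, UF the unroll factor.  */
static unsigned
step_updates (bool d_form, unsigned n_bases, unsigned uf)
{
  assert (uf >= 1);
  /* X-form bumps the shared index once per unrolled copy; D-form folds
     the stride into the displacement and bumps each base once.  */
  return d_form ? n_bases : uf;
}

/* Hypothetical check: do the displacements 0, STRIDE, ...,
   (UF - 1) * STRIDE all fit the displacement field?
   ASSUMPTION: the field is a signed 16-bit value.  */
static bool
stride_fits_dform (long stride, unsigned uf)
{
  assert (uf >= 1);
  /* Zero is always in range, so only the largest-magnitude
     displacement needs checking.  */
  long long extreme = (long long) stride * (uf - 1);
  return extreme >= -32768 && extreme <= 32767;
}
```

With the three-base example above and uf = 8, step_updates (false, 3, 8)
gives 8 and step_updates (true, 3, 8) gives 3.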