Hi,

This is a repost; the original series is at
https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538360.html.

As discussed in the thread
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html
(original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html),
I'm working on teaching IVOPTs to consider D-form group accesses during
unrolling.  The difference between D-form and other forms during unrolling
is that we can put the stride into the displacement field and so avoid
additional step increments, e.g.:

With X-form (uf step increments):
  ...
  LD A = baseA, X
  LD B = baseB, X
  ST C = baseC, X
  X = X + stride
  LD A = baseA, X
  LD B = baseB, X
  ST C = baseC, X
  X = X + stride
  LD A = baseA, X
  LD B = baseB, X
  ST C = baseC, X
  X = X + stride
  ...

With D-form (one step increment for each base):
  ...
  LD A = baseA, OFF
  LD B = baseB, OFF
  ST C = baseC, OFF
  LD A = baseA, OFF+stride
  LD B = baseB, OFF+stride
  ST C = baseC, OFF+stride
  LD A = baseA, OFF+2*stride
  LD B = baseB, OFF+2*stride
  ST C = baseC, OFF+2*stride
  ...
  baseA += stride * uf
  baseB += stride * uf
  baseC += stride * uf

For example, if the loop gets unrolled 8 times, we need 3 step updates with
D-form vs. 8 step updates with X-form.  Here we only need to check that the
stride meets the D-form displacement field requirement, since if OFF doesn't
meet it, we can construct baseA' as baseA + OFF.
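
To make the counting above concrete, here is a minimal sketch in C.  It is
not code from the patches: the function names are hypothetical, and the
signed 16-bit displacement range is an assumption chosen for illustration
(the series itself relies on addr_offset_valid_p for the real check).

```c
#include <assert.h>
#include <stdbool.h>

/* X-form: the shared index X is bumped after every unrolled copy,
   so one unrolled loop body needs uf step updates.  */
static unsigned
xform_step_updates (unsigned uf)
{
  assert (uf >= 1);
  return uf;
}

/* D-form: each base pointer is bumped once per unrolled loop body,
   regardless of the unroll factor.  */
static unsigned
dform_step_updates (unsigned num_bases)
{
  return num_bases;
}

/* Check that every displacement k * stride, for k in [0, uf), fits the
   displacement field.  ASSUMPTION: a signed 16-bit field; OFF is taken
   as 0 because a non-fitting OFF can be folded into the base.  */
static bool
dform_strides_fit (long stride, unsigned uf)
{
  for (unsigned k = 0; k < uf; k++)
    {
      long disp = (long) k * stride;
      if (disp < -32768 || disp > 32767)
        return false;
    }
  return true;
}
```

With uf = 8 and the three bases from the example, xform_step_updates (8)
gives 8 while dform_step_updates (3) gives 3, matching the numbers above.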

This patch set consists of four parts:
     
  [PATCH 1/4] unroll: Add middle-end unroll factor estimation

     Add unroll factor estimation in the middle-end.  It mainly follows the
     current RTL unroll factor determination in function decide_unrolling and
     its callees.  As Richi suggested, we could probably force the unroll
     factor with this and avoid duplicating the unroll factor calculation,
     but I think that needs more benchmarking work and should be handled
     separately.

  [PATCH 2/4] param: Introduce one param to control unroll factor 

     As Richard and Segher suggested, I used addr_offset_valid_p to check the
     addressing mode, rather than a target hook.  As Richard suggested, this
     patch introduces one parameter to control this IVOPTs consideration and
     the further tweaking in [3/4] on top of the unroll factor estimation
     in [1/4].
     
  [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling

     Teach IVOPTs to mark an IV cand as reg_offset_p when it is derived from
     an address IV type group in which the whole group is valid to use
     reg_offset mode.  Then scale up the IV cand step cost by (uf - 1) for
     non-reg_offset_p IV cands, where uf is the unroll factor estimated
     in [1/4].
     
  [PATCH 4/4] rs6000: P9 D-form test cases

     Add some test cases, mainly copied from Kelvin's patch.  Segher has
     approved this, provided the whole series is fine.
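
As a rough illustration of the step cost adjustment described for [3/4],
here is a small sketch in C.  The struct and function are hypothetical
stand-ins, not the IVOPTs data structures, and the sketch assumes one
possible reading of "scale up by (uf - 1)": a non-reg_offset_p cand pays
for (uf - 1) extra step updates on top of the original one.  The actual
patch may compute this differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an IV candidate.  */
struct iv_cand_sketch
{
  unsigned cost_step;   /* cost of one step update */
  bool reg_offset_p;    /* whole group can use reg + offset addressing */
};

/* Step cost of CAND once the loop is unrolled UF times.
   ASSUMPTION: a reg_offset_p cand keeps a single step update per
   unrolled body, while any other cand needs one update per copy.  */
static unsigned
unrolled_step_cost (const struct iv_cand_sketch *cand, unsigned uf)
{
  assert (uf >= 1);
  if (cand->reg_offset_p)
    return cand->cost_step;
  return cand->cost_step + cand->cost_step * (uf - 1);
}
```

Under this reading, with uf = 8 a cand whose step costs 2 is charged 16
unless it is reg_offset_p, in which case it stays at 2.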


Many thanks to Richard and Segher for reviewing the previous versions.

Bootstrapped and regress tested on powerpc64le-linux-gnu.

Any comments are highly appreciated!  Thanks in advance!


BR,
Kewen

-------

 gcc/cfgloop.h                  |   3 ++
 gcc/config/i386/i386-options.c |   6 +++
 gcc/config/s390/s390.c         |   6 +++
 gcc/doc/invoke.texi            |   9 +++++
 gcc/params.opt                 |   4 ++
 gcc/tree-ssa-loop-ivopts.c     | 100 ++++++++++++++++++++++++++++++++++++++++++++++-
 gcc/tree-ssa-loop-manip.c      | 253 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/tree-ssa-loop-manip.h      |   3 +-
 gcc/tree-ssa-loop.c            |  33 ++++++++++++++++
 gcc/tree-ssa-loop.h            |   2 +
 10 files changed, 416 insertions(+), 3 deletions(-)
