On Mon, 6 Jan 2020, Kewen.Lin wrote:

> Hi all,
> 
> I've recently been investigating an issue related to the use of D-form/X-form
> vector memory accesses; it's the same issue that the patch
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01879.html
> was intended to deal with.  Power9 introduces DQ-form instructions for vector
> memory access, and we prefer to use DQ-form when unrolling loops.  As in the
> example in the above link, it can save a number of ADDIs and GPRs for indexing.
> 
> Or take the example below:
> 
>       extern void dummy (double, unsigned n);
> 
>       void
>       func (double *x, double *y, unsigned m, unsigned n)
>       {
>         double sacc;
>         for (unsigned j = 1; j < m; j++)
>         {
>           sacc = 0.0;
>           for (unsigned i = 1; i < n; i++)
>             sacc = sacc + x[i] * y[i];
>           dummy (sacc, n);
>         }
>       }
> 
> Core loop with X-form (lxvx):
> 
>       mtctr   r10
>       lxvx    vs12,r31,r9
>       lxvx    vs0,r30,r9
>       addi    r10,r9,16
>       addi    r9,r9,32
>       xvmaddadp vs32,vs12,vs0
>       lxvx    vs12,r31,r10
>       lxvx    vs0,r30,r10
>       xvmaddadp vs11,vs12,vs0
>       lxvx    vs12,r31,r9
>       lxvx    vs0,r30,r9
>       addi    r9,r10,32
>       xvmaddadp vs32,vs12,vs0
>       lxvx    vs12,r31,r9
>       lxvx    vs0,r30,r9
>       addi    r9,r10,48
>       xvmaddadp vs11,vs12,vs0
>       lxvx    vs12,r31,r9
>       lxvx    vs0,r30,r9
>       addi    r9,r10,64
>       xvmaddadp vs32,vs12,vs0
>       lxvx    vs12,r31,r9
>       lxvx    vs0,r30,r9
>       addi    r9,r10,80
>       xvmaddadp vs11,vs12,vs0
>       lxvx    vs12,r31,r9
>       lxvx    vs0,r30,r9
>       addi    r9,r10,96
>       xvmaddadp vs32,vs12,vs0
>       lxvx    vs12,r31,r9
>       lxvx    vs0,r30,r9
>       addi    r9,r10,112
>       xvmaddadp vs11,vs12,vs0
>       bdnz    190 <func+0x190>
> 
> vs.
> 
> Core loop with D-form (lxv):
> 
>       mtctr   r8
>       lxv     vs12,0(r9)
>       lxv     vs0,0(r10)
>       addi    r7,r9,16  // r7, r8 can be eliminated further with r9, r10
>       addi    r8,r10,16 // 2 or 4 addi vs. 8 addi above
>       addi    r9,r9,128    
>       addi    r10,r10,128  
>       xvmaddadp vs32,vs12,vs0
>       lxv     vs12,-112(r9)
>       lxv     vs0,-112(r10)
>       xvmaddadp vs11,vs12,vs0
>       lxv     vs12,16(r7)
>       lxv     vs0,16(r8)
>       xvmaddadp vs32,vs12,vs0
>       lxv     vs12,32(r7)
>       lxv     vs0,32(r8)
>       xvmaddadp vs11,vs12,vs0
>       lxv     vs12,48(r7)
>       lxv     vs0,48(r8)
>       xvmaddadp vs32,vs12,vs0
>       lxv     vs12,64(r7)
>       lxv     vs0,64(r8)
>       xvmaddadp vs11,vs12,vs0
>       lxv     vs12,80(r7)
>       lxv     vs0,80(r8)
>       xvmaddadp vs32,vs12,vs0
>       lxv     vs12,96(r7)
>       lxv     vs0,96(r8)
>       xvmaddadp vs11,vs12,vs0
>       bdnz    1b0 <func+0x1b0>
> 
> We are wondering whether this can be handled in IVOPTs instead of in a
> separate RTL pass.
> 
> When IVOPTs selects IV candidates, it doesn't know the loop will be unrolled,
> so it doesn't account for the extra step cost that X-form incurs.  If we can
> teach it to consider this case, the IV candidates that work well with D-form
> can be preferred.  Currently (incomplete) unrolling happens in RTL, so it
> looks like we have to predict in IVOPTs whether the loop will be unrolled.
> Since that decision involves parameter checks on RTL insn counts and target
> hooks, it doesn't seem easy to get.  Besides, we need to check that the step
> is valid for the D-form displacement field (e.g. DQ-form requires the
> displacement to be an exact multiple of 16), to ensure no extra ADDIs are
> needed.
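
To make the step check above concrete, here is a minimal sketch of the kind
of test it implies.  It assumes the DQ-form displacement is a signed 16-bit
value whose low four bits are zero.  Both function names are hypothetical
and are not existing GCC functions or target hooks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed DQ-form displacement range: a 12-bit field scaled by 16 and
   sign-extended, i.e. multiples of 16 in [-32768, 32752].  */
#define DQ_MIN (-32768)
#define DQ_MAX 32752

/* Hypothetical helper: can OFFSET be encoded directly in a DQ-form
   instruction such as lxv, with no extra addi?  */
static bool
dq_form_offset_ok (int64_t offset)
{
  return offset >= DQ_MIN && offset <= DQ_MAX && offset % 16 == 0;
}

/* Hypothetical helper: if a loop whose IV advances by STEP bytes is
   unrolled UNROLL_FACTOR times, are all copies addressable from a
   single base register using DQ-form displacements?  */
static bool
dq_form_unroll_ok (int64_t step, unsigned unroll_factor)
{
  for (unsigned k = 0; k < unroll_factor; k++)
    if (!dq_form_offset_ok ((int64_t) k * step))
      return false;
  return true;
}
```

For the example loop, a 16-byte vector step unrolled 8 times gives
displacements 0, 16, ..., 112, all of which pass the check.  A step of 8
would fail at the second copy.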
> 
> I'm not sure whether it's a good idea to implement this in IVOPTs, but I
> made some changes in IVOPTs to prove that it's doable to get the expected
> code; the patch is attached.
> 
> Any comments/suggestions are highly appreciated!

Is the unrolled code better than the not unrolled code (assuming
optimal IV choice)?  Then IMHO IVOPTs should drive the unrolling,
either by actually doing it or by forcing it via the loop->unroll
setting.  I don't think second-guessing the RTL unroller at this
point is going to work.  Alternatively turn X-form into D-form during
RTL unrolling?
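
For experimenting with a forced unroll factor at the source level, a sketch
along these lines may be useful.  It relies on my understanding that
"#pragma GCC unroll N" is what ends up recorded in loop->unroll.  Whether
IVOPTs should act on that value is exactly the open question here, so treat
this only as a way to produce test cases.  The function name is made up for
illustration.

```c
/* Inner loop of the example, reduced to a dot product starting at
   index 0.  The pragma asks GCC to unroll the loop 8 times; compilers
   that do not know the pragma ignore it and the result is unchanged.  */
double
dot_unroll8 (const double *x, const double *y, unsigned n)
{
  double sacc = 0.0;
#pragma GCC unroll 8
  for (unsigned i = 0; i < n; i++)
    sacc += x[i] * y[i];
  return sacc;
}
```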

Thanks,
Richard.

> BR,
> Kewen
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)