On Thu, Sep 24, 2020 at 3:27 PM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Thu, Sep 24, 2020 at 10:21 AM xionghu luo <luo...@linux.ibm.com> wrote:
> >
> > Hi Segher,
> >
> > The attached two patches are updated and split from
> >  "[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple [PR79251]"
> > as per your comments.
> >
> >
> > [PATCH v3 2/3] rs6000: Fix lvsl&lvsr mode and change 
> > rs6000_expand_vector_set param
> >
> > This one is preparation work: it fixes the lvsl&lvsr argument mode and
> > changes the rs6000_expand_vector_set parameter to support both constant
> > and variable index input.
> >
> >
> > [PATCH v3 2/3] rs6000: Support variable insert and Expand vec_insert in 
> > expander [PR79251]
> >
> > This one builds VIEW_CONVERT_EXPR and expands the IFN VEC_SET to a fast sequence.
>
> I'll just comment that
>
>         xxperm 34,34,33
>         xxinsertw 34,0,12
>         xxperm 34,34,32

Btw, on x86_64 the following produces something reasonable:

#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));
V setg (V v, int idx, T val)
{
  V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
  v = (v & ~mask) | (valv & mask);
  return v;
}

        vmovd   %edi, %xmm1
        vpbroadcastd    %xmm1, %ymm1
        vpcmpeqd        .LC0(%rip), %ymm1, %ymm2
        vpblendvb       %ymm2, %ymm1, %ymm0, %ymm0
        ret

I'm quite sure you could do something similar on Power?
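
Note that setg as posted blends in the broadcast index rather than val: val
is never used, which is why the listing above only loads %edi. A minimal
sketch of what was presumably intended broadcasts the index and the value
separately. The name setg_val is hypothetical, and this is an illustration
of the compare-and-blend idea using the GCC vector extensions, not a tested
expansion for any target:

```c
#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));

/* Return v with element idx replaced by val (compare-and-blend form).
   An out-of-range idx matches no lane and leaves v unchanged.  */
V setg_val (V v, int idx, T val)
{
  /* Broadcast the index and the value into separate vectors.  */
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  /* Lane i of mask is all-ones where i == idx, zero elsewhere.  */
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  /* Take val in the selected lane and keep v everywhere else.  */
  return (v & ~mask) | (valv & mask);
}
```

The only change from setg is the separate idxv/valv pair; the mask
computation and the blend are the same.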

> doesn't look like a variable-position insert instruction but
> this is a variable whole-vector rotate plus an insert at index zero
> followed by a variable whole-vector rotate.  I'm not fluent in
> ppc assembly but
>
>         rlwinm 6,6,2,28,29
>         mtvsrwz 0,5
>         lvsr 1,0,6
>         lvsl 0,0,6
>
> possibly computes the shift masks for r33/r32?  though
> I do not see those registers mentioned...
>
> This might be a generic viable expansion strategy btw,
> which is why I asked before whether the CPU supports
> inserts at a variable position ...  the building blocks are
> already there with vec_set at constant zero position
> plus vec_perm_const for the rotates.
>
> But well, I did ask this question.  Multiple times.
>
> ppc does _not_ have a VSX instruction
> like xxinsertw r34, r8, r12 where r8 denotes
> the vector element (or byte position or whatever).
>
> So I don't think vec_set with a variable index is the
> best approach.
> Xionghu - you said even without the patch the stack
> storage is eventually elided but
>
>         addi 9,1,-16
>         rldic 6,6,2,60
>         stxv 34,-16(1)
>         stwx 5,9,6
>         lxv 34,-16(1)
>
> still shows stack(?) store/load with a bad STLF penalty.
>
> Richard.
>
> >
> > Thanks,
> > Xionghu
