On Thu, Sep 24, 2020 at 3:27 PM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Thu, Sep 24, 2020 at 10:21 AM xionghu luo <luo...@linux.ibm.com> wrote:
> >
> > Hi Segher,
> >
> > The attached two patches are updated and split from
> > "[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple
> > [PR79251]"
> > as your comments.
> >
> >
> > [PATCH v3 2/3] rs6000: Fix lvsl&lvsr mode and change
> > rs6000_expand_vector_set param
> >
> > This one is preparation work of fix lvsl&lvsr arg mode and
> > rs6000_expand_vector_set
> > parameter support for both constant and variable index input.
> >
> >
> > [PATCH v3 2/3] rs6000: Support variable insert and Expand vec_insert in
> > expander [PR79251]
> >
> > This one is Building VIEW_CONVERT_EXPR and expand the IFN VEC_SET to fast.
>
> I'll just comment that
>
>         xxperm 34,34,33
>         xxinsertw 34,0,12
>         xxperm 34,34,32

Btw, on x86_64 the following produces something reasonable:

#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));

V setg (V v, int idx, T val)
{
  V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
  v = (v & ~mask) | (valv & mask);
  return v;
}

        vmovd   %edi, %xmm1
        vpbroadcastd    %xmm1, %ymm1
        vpcmpeqd        .LC0(%rip), %ymm1, %ymm2
        vpblendvb       %ymm2, %ymm1, %ymm0, %ymm0
        ret

I'm quite sure you could do something similar on power?

> doesn't look like a variable-position insert instruction but
> this is a variable whole-vector rotate plus an insert at index zero
> followed by a variable whole-vector rotate. I'm not fluent in
> ppc assembly but
>
>         rlwinm 6,6,2,28,29
>         mtvsrwz 0,5
>         lvsr 1,0,6
>         lvsl 0,0,6
>
> possibly computes the shift masks for r33/r32? though
> I do not see those registers mentioned...
>
> This might be a generic viable expansion strategy btw,
> which is why I asked before whether the CPU supports
> inserts at a variable position ... the building blocks are
> already there with vec_set at constant zero position
> plus vec_perm_const for the rotates.
>
> But well, I did ask this question. Multiple times.
>
> ppc does _not_ have a VSX instruction
> like xxinsertw r34, r8, r12 where r8 denotes
> the vector element (or byte position or whatever).
>
> So I don't think vec_set with a variable index is the
> best approach.
> Xionghu - you said even without the patch the stack
> storage is eventually elided but
>
>         addi 9,1,-16
>         rldic 6,6,2,60
>         stxv 34,-16(1)
>         stwx 5,9,6
>         lxv 34,-16(1)
>
> still shows stack(?) store/load with a bad STLF penalty.
>
> Richard.
>
> >
> > Thanks,
> > Xionghu
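
For reference, here is a minimal sketch of the compare-and-blend idea from the x86_64 example above, written on the assumption that the intent is to store val into lane idx. The snippet as posted broadcasts idx into valv and never uses val, so this variant broadcasts the two separately. The name set_lane and the separate idxv vector are hypothetical and not taken from the patches; the vector comparison relies on the GNU C vector extensions.

```c
#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));

/* Hypothetical helper, not from the patch: return v with lane idx
   replaced by val.  If idx is outside 0..7 no lane matches and v is
   returned unchanged.  */
V set_lane (V v, int idx, T val)
{
  /* Broadcast the index and the value into separate vectors.  */
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  /* Lane-wise compare: all-ones in the selected lane, zero elsewhere.  */
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  /* Keep the unselected lanes of v, take val in the selected lane.  */
  return (v & ~mask) | (valv & mask);
}
```

This costs one extra broadcast compared with the posted snippet. Whether it beats the rotate/insert/rotate sequence on Power is not something this sketch tries to answer.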