On Thu, Sep 24, 2020 at 10:21 AM xionghu luo <luo...@linux.ibm.com> wrote:
>
> Hi Segher,
>
> The attached two patches are updated and split from
> "[PATCH v2 2/2] rs6000: Expand vec_insert in expander instead of gimple
> [PR79251]"
> as your comments.
>
>
> [PATCH v3 2/3] rs6000: Fix lvsl&lvsr mode and change rs6000_expand_vector_set
> param
>
> This one is preparation work of fix lvsl&lvsr arg mode and
> rs6000_expand_vector_set
> parameter support for both constant and variable index input.
>
>
> [PATCH v3 2/3] rs6000: Support variable insert and Expand vec_insert in
> expander [PR79251]
>
> This one is Building VIEW_CONVERT_EXPR and expand the IFN VEC_SET to fast.

I'll just comment that

        xxperm 34,34,33
        xxinsertw 34,0,12
        xxperm 34,34,32

doesn't look like a variable-position insert instruction; rather, it is a
variable whole-vector rotate, then an insert at index zero, followed by
another variable whole-vector rotate.  I'm not fluent in ppc assembly, but

        rlwinm 6,6,2,28,29
        mtvsrwz 0,5
        lvsr 1,0,6
        lvsl 0,0,6

possibly computes the shift masks for r33/r32?  Though I do not see
those registers mentioned...

This might be a viable generic expansion strategy btw, which is why I
asked before whether the CPU supports inserts at a variable position ...
the building blocks are already there with vec_set at constant zero
position plus vec_perm_const for the rotates.

But well, I did ask this question.  Multiple times.

ppc does _not_ have a VSX instruction like xxinsertw r34, r8, r12
where r8 denotes the vector element (or byte position or whatever).

So I don't think vec_set with a variable index is the best approach.

Xionghu - you said even without the patch the stack storage is eventually
elided, but

        addi 9,1,-16
        rldic 6,6,2,60
        stxv 34,-16(1)
        stwx 5,9,6
        lxv 34,-16(1)

still shows a stack(?) store/load with a bad STLF penalty.

Richard.

>
> Thanks,
> Xionghu
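
For reference, here is a minimal scalar model in C of the
rotate / insert-at-zero / rotate-back strategy discussed above.  It is
only a sketch: the names NELTS, rotate_left and vec_set_var are
hypothetical and do not come from the patch or from GCC, the
four-element 32-bit vector is an assumption, and the real expansion
would use permutes rather than loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NELTS 4  /* assumed: a vector of four 32-bit elements */

/* Rotate the whole vector left by n elements, so element n lands in slot 0. */
static void rotate_left(uint32_t v[NELTS], unsigned n)
{
    uint32_t tmp[NELTS];

    for (unsigned i = 0; i < NELTS; i++)
        tmp[i] = v[(i + n) % NELTS];
    for (unsigned i = 0; i < NELTS; i++)
        v[i] = tmp[i];
}

/* Hypothetical helper: variable-index insert built only from whole-vector
   rotates and an insert at constant index 0.  */
static void vec_set_var(uint32_t v[NELTS], uint32_t val, unsigned idx)
{
    assert(v != NULL);
    idx &= NELTS - 1;                       /* wrap the index into 0..3 */
    rotate_left(v, idx);                    /* bring element idx to slot 0 */
    v[0] = val;                             /* insert at constant position 0 */
    rotate_left(v, (NELTS - idx) % NELTS);  /* rotate back by idx */
}
```

Calling vec_set_var(v, 99, 2) on {10, 20, 30, 40} replaces only
element 2; the other elements return to their original slots after the
second rotate.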