Hi!

On Thu, Sep 24, 2020 at 03:27:48PM +0200, Richard Biener wrote:
> On Thu, Sep 24, 2020 at 10:21 AM xionghu luo <luo...@linux.ibm.com> wrote:
> I'll just comment that
>
>         xxperm 34,34,33
>         xxinsertw 34,0,12
>         xxperm 34,34,32
>
> doesn't look like a variable-position insert instruction but
> this is a variable whole-vector rotate plus an insert at index zero
> followed by a variable whole-vector rotate.  I'm not fluent in
> ppc assembly but
>
>         rlwinm 6,6,2,28,29
>         mtvsrwz 0,5
>         lvsr 1,0,6
>         lvsl 0,0,6
>
> possibly computes the shift masks for r33/r32?  though
> I do not see those registers mentioned...
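The rotate / insert-at-zero / rotate-back sequence described in the quoted
text can be modelled in plain C.  This is only a sketch of the idea, not
what GCC emits: it treats the vector as four 32-bit words and ignores
endianness and the byte offsets the real instructions use.  The names
rotate_left and insert_variable are made up for illustration.

```c
#include <string.h>

#define N 4  /* four 32-bit words in a 16-byte vector */

/* Rotate the whole vector left by k elements (k is taken modulo N). */
static void rotate_left(unsigned v[N], unsigned k)
{
    unsigned tmp[N];
    for (unsigned i = 0; i < N; i++)
        tmp[i] = v[(i + k) % N];
    memcpy(v, tmp, sizeof tmp);
}

/* Hypothetical helper: store val at a run-time index using only a
   fixed-position insert at element 0, as in the quoted description. */
static void insert_variable(unsigned v[N], unsigned val, unsigned idx)
{
    idx %= N;
    rotate_left(v, idx);            /* element idx moves to slot 0 */
    v[0] = val;                     /* the fixed-position insert */
    rotate_left(v, (N - idx) % N);  /* rotate everything back */
}
```

With v = {10, 11, 12, 13}, calling insert_variable(v, 99, 2) leaves
{10, 11, 99, 13}: only the element at the run-time index changes.
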
v0/v1 (what the lvs[lr] write to) are the same as vs32/vs33.

The low half of the VSRs (vector-scalar registers) are the FP registers
(expanded to 16B each), and the high half are the original VRs (vector
registers).  AltiVec insns (like lvsl, lvsr) naturally only work on VRs,
as do some newer insns for which there wasn't enough budget in the
opcode space to allow VSR operands (which take 6 bits each, while VRs
take only 5, just like FPRs and GPRs).

> This might be a generic viable expansion strategy btw,
> which is why I asked before whether the CPU supports
> inserts at a variable position ...

ISA 3.1 (Power10) supports variable position inserts.  Power9 supports
fixed position inserts.  Older CPUs can of course construct it some
other way.

> ppc does _not_ have a VSX instruction
> like xxinsertw r34, r8, r12 where r8 denotes
> the vector element (or byte position or whatever).

vins[bhwd][v][lr]x does this.  Those are Power10 instructions.


Segher
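
The register numbering explained above (FPRs as the low half of the VSRs,
VRs as the high half) can be written down as a small C sketch.  The
function names are invented for illustration and are not taken from GCC
or binutils.

```c
#include <stdbool.h>

/* VSRs 0..31 overlay the FP registers (the low half). */
static unsigned vsr_from_fpr(unsigned fpr)
{
    return fpr;
}

/* VSRs 32..63 overlay the AltiVec VRs (the high half). */
static unsigned vsr_from_vr(unsigned vr)
{
    return vr + 32;
}

/* True if this VSR is reachable by a VR-only insn such as lvsl/lvsr,
   whose register fields are 5 bits wide. */
static bool vsr_is_vr(unsigned vsr)
{
    return vsr >= 32 && vsr < 64;
}
```

So the "lvsr 1,0,6" and "lvsl 0,0,6" in the quoted code write v1 and v0,
which vsr_from_vr maps to 33 and 32, the same numbers the xxperm insns
use as their third operand.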