On Tue, Dec 06, 2016 at 03:11:22PM +1100, David Gibson wrote: > On Mon, Dec 05, 2016 at 04:55:30PM +0530, Nikunj A Dadhania wrote: > > From: Bharata B Rao <bhar...@linux.vnet.ibm.com> > > > > xxperm: VSX Vector Permute > > xxpermr: VSX Vector Permute Right-indexed > > > > Signed-off-by: Bharata B Rao <bhar...@linux.vnet.ibm.com> > > Signed-off-by: Nikunj A Dadhania <nik...@linux.vnet.ibm.com> > > --- > > target-ppc/fpu_helper.c | 50 > > +++++++++++++++++++++++++++++++++++++ > > target-ppc/helper.h | 2 ++ > > target-ppc/translate/vsx-impl.inc.c | 2 ++ > > target-ppc/translate/vsx-ops.inc.c | 2 ++ > > 4 files changed, 56 insertions(+) > > > > diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c > > index 3b867cf..be552c7 100644 > > --- a/target-ppc/fpu_helper.c > > +++ b/target-ppc/fpu_helper.c > > @@ -2869,3 +2869,53 @@ uint64_t helper_xsrsp(CPUPPCState *env, uint64_t xb) > > float_check_status(env); > > return xt; > > } > > + > > +static void vsr_copy_256(ppc_vsr_t *xa, ppc_vsr_t *xt, int8_t *src) > > +{ > > +#if defined(HOST_WORDS_BIGENDIAN) > > + memcpy(src, xa, sizeof(*xa)); > > + memcpy(src + 16, xt, sizeof(*xt)); > > +#else > > + memcpy(src, xt, sizeof(*xt)); > > + memcpy(src + 16, xa, sizeof(*xa)); > > Is this right? I thought the order of the bytes within each word > varied with the host endianness as well.
Since we are already working with 2 16 byte vectors xa and xb here, I thought we don't have to worry about order of bytes within each vector, but instead can construct the 32 byte vector as above based on host endianness. > > > +#endif > > +} > > + > > +static int8_t vsr_get_byte(int8_t *src, int bound, int idx) > > +{ > > + if (idx >= bound) { > > + return 0xFF; > > + } > > AFAICT you don't need this check. For both xxperm and xxpermr you're > already masking the index to 5 bits, so it can't exceed 31. Was thinking of making it a generic API and hence had that boundary check but yes, no point for the check in the context of this instruction. > > > +#if defined(HOST_WORDS_BIGENDIAN) > > + return src[idx]; > > +#else > > + return src[bound - 1 - idx]; > > +#endif > > +} > > + > > +#define VSX_XXPERM(op, indexed) \ > > +void helper_##op(CPUPPCState *env, uint32_t opcode) \ > > +{ \ > > + ppc_vsr_t xt, xa, pcv; \ > > + int i, idx; \ > > + int8_t src[32]; \ > > + \ > > + getVSR(xA(opcode), &xa, env); \ > > + getVSR(xT(opcode), &xt, env); \ > > + getVSR(xB(opcode), &pcv, env); \ > > + \ > > + vsr_copy_256(&xa, &xt, src); \ > > You have a double copy here AFAICT - first from the actual env > structure to xt and xa, then to the src array. That seems like it > would be good to avoid. > > It seems like it would nice in any case to avoid even the one copy. > You'd need a temporary for the output of course and to copy that, but > you should be able to combine indexed with host endianness to > translate each index to retrieve directly from the VSR values in env. I am not sure it would be good to retrieve byte values directly from env as getVSR nicely abstracts out from which fields (env->[fpr, vsr, avr] the data is fetched based on the register specified in the opcode. I can reduce one copy though by not constructing a 32 byte vector (src) but instead retrieving the bytes directly from xa and xt based on the index. Regards, Bharata.