On Wed, 9 Jun 2021 at 02:05, Richard Henderson <richard.hender...@linaro.org> wrote:
>
> On 6/7/21 9:57 AM, Peter Maydell wrote:
> > +#define DO_LDAVH(OP, ESIZE, TYPE, H, XCHG, EVENACC, ODDACC, TO128) \
> > +    uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
> > +                                    void *vm, uint64_t a) \
> > +    { \
> > +        uint16_t mask = mve_element_mask(env); \
> > +        unsigned e; \
> > +        TYPE *n = vn, *m = vm; \
> > +        Int128 acc = TO128(a); \
>
> This seems to miss the << 8.
Oops, yes it does.

> Which suggests that the whole thing can be done without Int128:
>
> > +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) { \
> > +            if (mask & 1) { \
> > +                if (e & 1) { \
> > +                    acc = ODDACC(acc, TO128(n[H(e - 1 * XCHG)] * m[H(e)])); \
>
>      tmp = n * m;
>      tmp = (tmp >> 8) + ((tmp >> 7) & 1);
>      acc ODDACC tmp;

I'm not sure about this suggestion though. It throws away all of the
bottom 7 bits of the product, but because we're iterating through this
4 times and adding (potentially) four of these products together, those
bottom 7 bits in the 4 products might be able to add together to become
significant enough to affect the final result.

-- PMM
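To make the rounding concern concrete, here is a minimal sketch (not the QEMU helper) that compares the two orderings on plain int64_t arithmetic. It assumes the semantics discussed in this thread: the incoming accumulator is scaled up by 8 bits (the missing << 8), the products are summed at full precision, and the total is rounded once by adding 1 << 7 and shifting right by 8. The function names round_once and round_each are hypothetical, the model ignores the predicate mask, XCHG and the subtracting EVENACC/ODDACC variants, and int64_t is only wide enough for small demo inputs; the real helper needs the extra headroom that Int128 provides.

```c
#include <stdint.h>

/*
 * Hypothetical "round once" model: widen the accumulator by 8 bits,
 * add every product at full precision, then round to nearest once.
 * a * 256 is used instead of a << 8 to avoid undefined behaviour
 * for negative a.
 */
static int64_t round_once(int64_t a, const int32_t *n, const int32_t *m,
                          int count)
{
    int64_t acc = a * 256;
    for (int e = 0; e < count; e++) {
        acc += (int64_t)n[e] * m[e];
    }
    /* >> on a negative int64_t is implementation-defined in C;
     * GCC and Clang perform an arithmetic shift. */
    return (acc + (1 << 7)) >> 8;
}

/*
 * Hypothetical model of the suggested alternative: shift each product
 * right by 8 with round-to-nearest before accumulating, so the
 * accumulator never needs more than 64 bits.
 */
static int64_t round_each(int64_t a, const int32_t *n, const int32_t *m,
                          int count)
{
    int64_t acc = a;
    for (int e = 0; e < count; e++) {
        int64_t tmp = (int64_t)n[e] * m[e];
        tmp = (tmp >> 8) + ((tmp >> 7) & 1);
        acc += tmp;
    }
    return acc;
}
```

With a = 0 and four products of 100, round_once returns 2 (400 / 256 is about 1.56, which rounds up) while round_each returns 0, because each product alone is below the rounding threshold. With four products of 128 the error goes the other way: round_once returns 2 but round_each returns 4, since every product rounds up individually.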