On Wed, 6 Jul 2022 at 10:17, Richard Henderson
<richard.hender...@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
> +void HELPER(sme_addha_s)(void *vzda, void *vzn, void *vpn,
> +                         void *vpm, uint32_t desc)
> +{
> +    intptr_t row, col, oprsz = simd_oprsz(desc) / 4;
> +    uint64_t *pn = vpn, *pm = vpm;
> +    uint32_t *zda = vzda, *zn = vzn;
> +
> +    for (row = 0; row < oprsz; ) {
> +        uint64_t pa = pn[row >> 4];
> +        do {
> +            if (pa & 1) {
> +                for (col = 0; col < oprsz; ) {
> +                    uint64_t pb = pm[col >> 4];
> +                    do {
> +                        if (pb & 1) {
> +                            zda[tile_vslice_index(row) + col] += zn[col];

Doesn't this need some H macros to handle the big-endian case?
We process the predicate from architecturally least to most
significant, but because zda is a uint32_t * I don't think we
visit the elements in that same order on a big-endian host.
The same applies to sme_addva_s().

> +                        }
> +                        pb >>= 4;
> +                    } while (++col & 15);
> +                }
> +            }
> +            pa >>= 4;
> +        } while (++row & 15);
> +    }
> +}

Otherwise
Reviewed-by: Peter Maydell <peter.mayd...@linaro.org>

thanks
-- PMM
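
To illustrate the big-endian concern, here is a minimal standalone
sketch. It is not QEMU code. It assumes vector data is held as
host-endian 64-bit chunks and that H4() XORs the element index with 1
on a big-endian host, which is my understanding of the macro in
target/arm/vec_internal.h. The names h4_be(), store_be64(),
load_be32(), arch_elem32() and demo_check() are hypothetical helpers
made up for this example; the big-endian layout is simulated with a
byte buffer so the check behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for H4() as assumed on a big-endian host. */
static size_t h4_be(size_t i)
{
    return i ^ 1;
}

/* Lay out a 64-bit chunk in memory the way a big-endian host would. */
static void store_be64(uint8_t *p, uint64_t v)
{
    for (int b = 0; b < 8; b++) {
        p[b] = (uint8_t)(v >> (56 - 8 * b));
    }
}

/* The value a big-endian host reads through a uint32_t * at index idx. */
static uint32_t load_be32(const uint8_t *mem, size_t idx)
{
    const uint8_t *p = mem + 4 * idx;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Architectural 32-bit element i: least significant element first. */
static uint32_t arch_elem32(const uint64_t *chunks, size_t i)
{
    return (uint32_t)(chunks[i / 2] >> (32 * (i % 2)));
}

/* Count elements where the host index agrees with architectural order. */
static int count_matches(int use_h4)
{
    /* Architectural elements 0..3 hold the values 0, 1, 2, 3. */
    const uint64_t chunks[2] = {
        0x0000000100000000ull, 0x0000000300000002ull
    };
    uint8_t mem[16];
    int matches = 0;

    store_be64(mem, chunks[0]);
    store_be64(mem + 8, chunks[1]);
    for (size_t i = 0; i < 4; i++) {
        size_t idx = use_h4 ? h4_be(i) : i;
        matches += load_be32(mem, idx) == arch_elem32(chunks, i);
    }
    return matches;
}

/* Plain indexing never lines up here; the swizzled index always does. */
static void demo_check(void)
{
    assert(count_matches(0) == 0);
    assert(count_matches(1) == 4);
}
```

If that assumption holds, the accumulate in the patch would presumably
want to index both arrays through the macro, something along the lines
of zda[tile_vslice_index(row) + H4(col)] += zn[H4(col)], so that
predicate bit col and element col refer to the same architectural lane.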