On 06/25/2018 09:13 AM, Peter Maydell wrote: > On 21 June 2018 at 02:53, Richard Henderson > <richard.hender...@linaro.org> wrote: >> Signed-off-by: Richard Henderson <richard.hender...@linaro.org> >> --- >> target/arm/helper-sve.h | 41 +++++++++++++++++++++ >> target/arm/sve_helper.c | 62 ++++++++++++++++++++++++++++++++ >> target/arm/translate-sve.c | 74 ++++++++++++++++++++++++++++++++++++++ >> target/arm/sve.decode | 39 ++++++++++++++++++++ >> 4 files changed, 216 insertions(+) >> +/* Stores with a vector index. */ >> + >> +#define DO_ST1_ZPZ_S(NAME, TYPEI, FN) \ >> +void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \ >> + target_ulong base, uint32_t desc) \ >> +{ \ >> + intptr_t i, oprsz = simd_oprsz(desc) / 8; \ >> + unsigned scale = simd_data(desc); \ >> + uintptr_t ra = GETPC(); \ >> + uint32_t *d = vd; TYPEI *m = vm; uint8_t *pg = vg; \ >> + for (i = 0; i < oprsz; i++) { \ >> + uint8_t pp = pg[H1(i)]; \ >> + if (pp & 0x01) { \ >> + target_ulong off = (target_ulong)m[H4(i * 2)] << scale; \ >> + FN(env, base + off, d[H4(i * 2)], ra); \ >> + } \ >> + if (pp & 0x10) { \ >> + target_ulong off = (target_ulong)m[H4(i * 2 + 1)] << scale; \ >> + FN(env, base + off, d[H4(i * 2 + 1)], ra); \ >> + } \ >> + } \ >> +} > > Why do we do two operations per loop here? Generally > we seem to do one operation per loop elsewhere.
I'm not sure why I wrote this one in this way. There doesn't seem to be a good reason. >> +static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn) >> +{ >> + /* Indexed by [xs][msz]. */ >> + static gen_helper_gvec_mem_scatter * const fn32[2][3] = { >> + { gen_helper_sve_stbs_zsu, >> + gen_helper_sve_sths_zsu, >> + gen_helper_sve_stss_zsu, }, >> + { gen_helper_sve_stbs_zss, >> + gen_helper_sve_sths_zss, >> + gen_helper_sve_stss_zss, }, >> + }; >> + static gen_helper_gvec_mem_scatter * const fn64[3][4] = { > > In the pseudocode xs is either 0 (zero-extend offset) or > 1 (sign-extend offset), but here it can also be 2. A > comment noting that we've overloaded it to also indicate > whether we're dealing with a 32-bit or 64-bit offset might > help (at least I think that's what we're doing). Yes, xs=2 is overloaded to mean 64-bit offset. I'll add comments. r~