On 4/30/20 11:09 AM, Peter Maydell wrote:
> +    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
> +        tmp = neon_load_reg(a->vn, pass);
> +        tmp2 = neon_load_reg(a->vm, pass);
> +        abd_fn(tmp, tmp, tmp2);
> +        tcg_temp_free_i32(tmp2);
> +        tmp2 = neon_load_reg(a->vd, pass);
> +        add_fn(tmp, tmp, tmp2);
> +        tcg_temp_free_i32(tmp2);
> +        neon_store_reg(a->vd, pass, tmp);
> +    }
> +    return true;
> +}
> +
> +static bool trans_VABA_S_3s(DisasContext *s, arg_3same *a)
> +{
> +    static NeonGenTwoOpFn * const abd_fns[] = {
> +        gen_helper_neon_abd_s8,
> +        gen_helper_neon_abd_s16,
> +        gen_helper_neon_abd_s32,
> +    };
> +    static NeonGenTwoOpFn * const add_fns[] = {
> +        gen_helper_neon_add_u8,
> +        gen_helper_neon_add_u16,
> +        tcg_gen_add_i32,
> +    };

This can be packaged into one operation.  E.g.

static void gen_aba_s8(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m)
{
    TCGv_i32 t = tcg_temp_new_i32();

    gen_helper_neon_abd_s8(t, n, m);
    gen_helper_neon_add_u8(d, d, t);
    tcg_temp_free_i32(t);
}

static const GVecGen3 op = { .fni4 = gen_aba_s8, .load_dest = true };

etc.

FWIW, this is one that I've fully converted on my sve2 branch.

    aba(n,m,a) = max(n,m) - min(n,m) + a

-- four fully vectorized operations.  So anything that allows a drop-in
replacement would be nice.  But whatever is easiest for you.


r~
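
The max/min identity quoted above can be sanity-checked on scalars. The following is a minimal sketch in plain C, assuming a single signed 8-bit lane and an accumulator that wraps modulo 256. The names abd_s8, aba_ref_s8, aba_minmax_s8 and aba_forms_agree are hypothetical and are not QEMU functions; the sketch only models the arithmetic, not the TCG code generation.

```c
#include <stdint.h>

/* Hypothetical scalar model of one signed 8-bit lane; not QEMU code. */

/* |n - m|, computed in int so the intermediate difference cannot overflow. */
static uint8_t abd_s8(int8_t n, int8_t m)
{
    int d = (int)n - (int)m;
    return (uint8_t)(d < 0 ? -d : d);
}

/* Reference form: accumulate the absolute difference, wrapping modulo 256. */
static uint8_t aba_ref_s8(uint8_t a, int8_t n, int8_t m)
{
    return (uint8_t)(a + abd_s8(n, m));
}

/* Rewritten form: max(n,m) - min(n,m) + a, with signed compares for
 * max/min and a subtract and add that wrap modulo 256. */
static uint8_t aba_minmax_s8(uint8_t a, int8_t n, int8_t m)
{
    int8_t hi = n > m ? n : m;
    int8_t lo = n < m ? n : m;
    return (uint8_t)((uint8_t)hi - (uint8_t)lo + a);
}

/* Returns 1 if both forms agree for every (n, m) pair with accumulator a. */
static int aba_forms_agree(uint8_t a)
{
    for (int n = -128; n <= 127; n++) {
        for (int m = -128; m <= 127; m++) {
            if (aba_ref_s8(a, (int8_t)n, (int8_t)m) !=
                aba_minmax_s8(a, (int8_t)n, (int8_t)m)) {
                return 0;
            }
        }
    }
    return 1;
}
```

The exhaustive loop covers all 65536 input pairs for one accumulator value, including the n = -128, m = 127 case where the difference does not fit in a signed 8-bit value.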