arm: Convert Neon VABA 3-reg-same to decodetree

Richard Henderson Thu, 30 Apr 2020 19:30:13 -0700

On 4/30/20 11:09 AM, Peter Maydell wrote:
> +    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
> +        tmp = neon_load_reg(a->vn, pass);
> +        tmp2 = neon_load_reg(a->vm, pass);
> +        abd_fn(tmp, tmp, tmp2);
> +        tcg_temp_free_i32(tmp2);
> +        tmp2 = neon_load_reg(a->vd, pass);
> +        add_fn(tmp, tmp, tmp2);
> +        tcg_temp_free_i32(tmp2);
> +        neon_store_reg(a->vd, pass, tmp);
> +    }
> +    return true;
> +}
> +
> +static bool trans_VABA_S_3s(DisasContext *s, arg_3same *a)
> +{
> +    static NeonGenTwoOpFn * const abd_fns[] = {
> +        gen_helper_neon_abd_s8,
> +        gen_helper_neon_abd_s16,
> +        gen_helper_neon_abd_s32,
> +    };
> +    static NeonGenTwoOpFn * const add_fns[] = {
> +        gen_helper_neon_add_u8,
> +        gen_helper_neon_add_u16,
> +        tcg_gen_add_i32,
> +    };


This can be packaged into one operation.  E.g.

static void gen_aba_s8(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m)
{
    TCGv_i32 t = tcg_temp_new_i32();

    gen_helper_neon_abd_s8(t, n, m);
    gen_helper_neon_add_u8(d, d, t);
    tcg_temp_free_i32(t);
}

static const GVecGen3 op = {
    .fni4 = gen_aba_s8,
    .load_dest = true
};

etc.

FWIW, this is one that I've fully converted on my sve2 branch.  aba(n,m,a) =
max(n,m) - min(n,m) + a -- four fully vectorized operations.  So anything that
allows a drop-in replacement would be nice.  But whatever is easiest for you.
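
To make the identity above concrete, here is a minimal scalar sketch in plain C, not QEMU code. It assumes 8-bit signed lanes for the inputs and a wrapping 8-bit accumulator; the names aba_s8, max_s8 and min_s8 are hypothetical and chosen only for illustration. A real vectorized version would presumably issue max, min, sub and add once across the whole vector rather than loop per lane.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: per-lane signed max and min. */
static int8_t max_s8(int8_t a, int8_t b) { return a > b ? a : b; }
static int8_t min_s8(int8_t a, int8_t b) { return a < b ? a : b; }

/*
 * Absolute difference and accumulate, one lane at a time:
 *     d[i] = d[i] + (max(n[i], m[i]) - min(n[i], m[i]))   (mod 2^8)
 * max - min is always in 0..255, so it fits an unsigned 8-bit lane
 * without needing a separate abs() step.
 */
static void aba_s8(uint8_t *d, const int8_t *n, const int8_t *m, int lanes)
{
    assert(lanes >= 0);
    for (int i = 0; i < lanes; i++) {
        /* Operands promote to int, so the subtraction cannot overflow. */
        uint8_t diff = (uint8_t)(max_s8(n[i], m[i]) - min_s8(n[i], m[i]));
        /* The accumulate wraps modulo 256, as an 8-bit lane add would. */
        d[i] = (uint8_t)(d[i] + diff);
    }
}
```

For example, with n = -128 and m = 127 the difference is 255, so an accumulator lane holding 250 wraps around to 249.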


r~

Re: [PATCH 27/36] target/arm: Convert Neon VABA 3-reg-same to decodetree
