vsubw

gaosong Mon, 27 Feb 2023 19:31:20 -0800


On 2023/2/28 2:40 AM, Richard Henderson wrote:

On 2/27/23 02:55, gaosong wrote:
On 2023/2/25 3:24 AM, Richard Henderson wrote:
         {
             .fniv = gen_vaddwev_s,
             .fno = gen_helper_vaddwev_q_d,
             .opt_opc = vecop_list,
             .vece = MO_128
         },
There are no 128-bit vector operations; you'll need to do this one differently.
Presumably just load the two 64-bit elements, sign-extend into 128-bits, add with tcg_gen_add2_i64, and store the two 64-bit elements as output. But that won't fit into the tcg_gen_gvec_3 interface.
'sign-extend into 128-bits,'   Could you give an example?
Well, for vadd, as the example we have been using:

    tcg_gen_ld_i64(lo1, cpu_env, offsetof(vector_reg[A].lo));
    tcg_gen_ld_i64(lo2, cpu_env, offsetof(vector_reg[B].lo));
    tcg_gen_sari_i64(hi1, lo1, 63);
    tcg_gen_sari_i64(hi2, lo2, 63);
    tcg_gen_add2_i64(lo1, hi1, lo1, hi1, lo2, hi2);
    tcg_gen_st_i64(lo1, cpu_env, offsetof(vector_reg[R].lo));
    tcg_gen_st_i64(hi1, cpu_env, offsetof(vector_reg[R].hi));
The middle two sari operations replicate the sign bit across the entire high word, so the pair of variables constitute a sign-extended 128-bit value.
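
For readers who want to check the arithmetic outside TCG, the same steps can be modelled in plain C. This is only a sketch: the u128_pair type and the function names are hypothetical and are not QEMU code. sext64_to_128 mirrors what tcg_gen_sari_i64(hi, lo, 63) produces, and add128 mirrors the carry-propagating double-word add done by tcg_gen_add2_i64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value kept as two 64-bit halves, like (lo1, hi1). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_pair;

/*
 * Sign-extend a 64-bit value to 128 bits. The high half is the sign bit
 * replicated 64 times: all ones for a negative input, zero otherwise.
 * This is the value an arithmetic shift right by 63 yields.
 */
static u128_pair sext64_to_128(int64_t v)
{
    u128_pair r;
    r.lo = (uint64_t)v;
    r.hi = v < 0 ? UINT64_MAX : 0;
    return r;
}

/* Double-word add: the carry out of the low half feeds the high half. */
static u128_pair add128(u128_pair a, u128_pair b)
{
    u128_pair r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo); /* unsigned wraparound signals carry */
    return r;
}

/* Widening signed add of two 64-bit elements into a 128-bit result. */
static u128_pair widening_add_s64(int64_t a, int64_t b)
{
    return add128(sext64_to_128(a), sext64_to_128(b));
}
```

For instance, widening_add_s64(INT64_MIN, INT64_MIN) gives lo = 0 and hi = all ones, which is -2^64 as a 128-bit two's-complement value, so the result does not wrap the way a 64-bit add would.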

Thank you.

This is one way to translate it:

static bool trans_vaddwev_q_d(DisasContext *ctx, arg_vvv *a)
{
    ...
    tcg_gen_ld_i64(lo1, cpu_env, offsetof(vector_reg[A].lo));
    tcg_gen_ld_i64(lo2, cpu_env, offsetof(vector_reg[B].lo));
    tcg_gen_sari_i64(hi1, lo1, 63);
    tcg_gen_sari_i64(hi2, lo2, 63);
    tcg_gen_add2_i64(lo1, hi1, lo1, hi1, lo2, hi2);
    tcg_gen_st_i64(lo1, cpu_env, offsetof(vector_reg[R].lo));
    tcg_gen_st_i64(hi1, cpu_env, offsetof(vector_reg[R].hi));
    ...
}

I see an example in target/ppc/translate/vmx-impl.c.inc:

static bool do_vx_vprtyb(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
{
        ...
        {
            .fno = gen_helper_VPRTYBQ,
            .vece = MO_128
        },

    tcg_gen_gvec_2(avr_full_offset(a->vrt), avr_full_offset(a->vrb),
                   16, 16, &op[vece - MO_32]);
    return true;
}
TRANS(VPRTYBQ, do_vx_vprtyb, MO_128)
...

do_vx_vprtyb fits the .fno helper into tcg_gen_gvec_2.
I am not sure whether this example is right.

Ah, well. When .fno is the only callback, the implementation is entirely out-of-line, and the .vece member is not used. I see that is confusing.
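
To illustrate the point about .vece, here is a toy model of the dispatch in plain C. It is not QEMU's implementation: ToyGVecGen3, toy_expand and toy_add64 are hypothetical names, and the real expander has more cases. The sketch only shows the shape of the decision, where the element size is consulted on the inline path and never read when .fno is the only callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void toy_inline_fn(unsigned vece, void *d, void *a, void *b);
typedef void toy_helper_fn(void *d, void *a, void *b);

/* Toy stand-in for a gvec descriptor; field names follow the thread. */
typedef struct {
    toy_inline_fn *fniv; /* inline expansion, depends on element size */
    toy_helper_fn *fno;  /* out-of-line helper, sees whole registers */
    unsigned vece;       /* element size, only read on the inline path */
} ToyGVecGen3;

enum { TOY_INLINE, TOY_OUT_OF_LINE };

/* Pick the inline expansion if present, else call the helper. */
static int toy_expand(const ToyGVecGen3 *g, void *d, void *a, void *b)
{
    if (g->fniv) {
        g->fniv(g->vece, d, a, b);
        return TOY_INLINE;
    }
    g->fno(d, a, b); /* g->vece is never consulted here */
    return TOY_OUT_OF_LINE;
}

/* Example helper: adds one 64-bit element. */
static void toy_add64(void *d, void *a, void *b)
{
    *(int64_t *)d = *(int64_t *)a + *(int64_t *)b;
}
```

With a descriptor such as { NULL, toy_add64, 99 }, toy_expand takes the out-of-line path and the meaningless vece value has no effect on the result.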

And this is another way to translate it:
    ...
         {
             .fno = gen_helper_vaddwev_q_d,
             .vece = MO_128
         },
    ...
    void HELPER(vaddwev_q_d)(void *vd, void *vj, void *vk, uint32_t v)
    {
        VReg *Vd = (VReg *)vd;
        VReg *Vj = (VReg *)vj;
        VReg *Vk = (VReg *)vk;

        /* Int128 may be a struct on some hosts, so avoid a plain cast. */
        Vd->Q(0) = int128_add(int128_makes64(Vj->D(0)),
                              int128_makes64(Vk->D(0)));
    }

Can either of these approaches be chosen?

Thanks.
Song Gao

Re: [RFC PATCH 10/43] target/loongarch: Implement vaddw/vsubw
