arm: Convert multiply and multiply accumulate

Richard Henderson Mon, 05 Aug 2019 09:20:46 -0700

On 8/5/19 8:32 AM, Peter Maydell wrote:
>> -/* load a 32-bit value from a register and perform a 64-bit accumulate.  */
>> -static void gen_addq_lo(DisasContext *s, TCGv_i64 val, int rlow)
>> -{
>> -    TCGv_i64 tmp;
>> -    TCGv_i32 tmp2;
>> -
>> -    /* Load value and extend to 64 bits.  */
>> -    tmp = tcg_temp_new_i64();
>> -    tmp2 = load_reg(s, rlow);
>> -    tcg_gen_extu_i32_i64(tmp, tmp2);
>> -    tcg_temp_free_i32(tmp2);
>> -    tcg_gen_add_i64(val, val, tmp);
>> -    tcg_temp_free_i64(tmp);
>> -}
>> -
> 
>> +static bool trans_UMAAL(DisasContext *s, arg_UMAAL *a)
>> +{
>> +    TCGv_i32 t0, t1, t2, zero;
>> +
>> +    if (s->thumb
>> +        ? !arm_dc_feature(s, ARM_FEATURE_THUMB_DSP)
>> +        : !ENABLE_ARCH_6) {
>> +        return false;
>> +    }
>> +
>> +    t0 = load_reg(s, a->rm);
>> +    t1 = load_reg(s, a->rn);
>> +    tcg_gen_mulu2_i32(t0, t1, t0, t1);
>> +    zero = tcg_const_i32(0);
>> +    t2 = load_reg(s, a->ra);
>> +    tcg_gen_add2_i32(t0, t1, t0, t1, t2, zero);
>> +    tcg_temp_free_i32(t2);
>> +    t2 = load_reg(s, a->rd);
>> +    tcg_gen_add2_i32(t0, t1, t0, t1, t2, zero);
>> +    tcg_temp_free_i32(t2);
>> +    tcg_temp_free_i32(zero);
>> +    store_reg(s, a->ra, t0);
>> +    store_reg(s, a->rd, t1);
>> +    return true;
>> +}
> 
> Is using mulu2/add2/add2 like this really generating better
> code than the mulu_i64_i32 and 2 64-bit adds that we had before?
> If we're going to change how we're generating code it would be
> nice to at least mention it in the commit message...
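Here is a minimal plain-C sketch of the two strategies Peter compares. It assumes the TCG ops behave as their names describe. `mulu2_32` and `add2_32` are hypothetical host-side stand-ins for tcg_gen_mulu2_i32 and tcg_gen_add2_i32, not QEMU functions, and the `umaal_via_*` helpers are illustrative too. The old path is modelled roughly as mulu_i64_i32 followed by two gen_addq_lo-style zero-extended adds. The sketch only checks that both sequences compute the same RdHi:RdLo. It says nothing about the host code TCG actually emits, which is what the question above is really about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of tcg_gen_mulu2_i32: the full 32x32->64 unsigned
 * product, returned as separate low and high halves. */
static void mulu2_32(uint32_t *lo, uint32_t *hi, uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * b;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* Hypothetical model of tcg_gen_add2_i32: (ah:al) + (bh:bl) computed on
 * 32-bit halves, with the carry out of the low half added to the high. */
static void add2_32(uint32_t *rl, uint32_t *rh,
                    uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    uint32_t lo = al + bl;
    uint32_t carry = lo < al;   /* unsigned wraparound signals a carry */
    *rh = ah + bh + carry;
    *rl = lo;
}

/* New sequence, as in trans_UMAAL: mulu2, then two add2 with a zero high part. */
static void umaal_via_i32(uint32_t *rdlo, uint32_t *rdhi,
                          uint32_t rn, uint32_t rm)
{
    uint32_t t0, t1;
    mulu2_32(&t0, &t1, rm, rn);
    add2_32(&t0, &t1, t0, t1, *rdlo, 0);   /* + RdLo (a->ra) */
    add2_32(&t0, &t1, t0, t1, *rdhi, 0);   /* + RdHi (a->rd) */
    *rdlo = t0;
    *rdhi = t1;
}

/* Old sequence, roughly: 64-bit product, then two zero-extended 64-bit
 * adds, one per accumulator register (what gen_addq_lo did). */
static void umaal_via_i64(uint32_t *rdlo, uint32_t *rdhi,
                          uint32_t rn, uint32_t rm)
{
    uint64_t val = (uint64_t)rn * rm;
    val += *rdlo;   /* implicit zero-extension to 64 bits */
    val += *rdhi;
    *rdlo = (uint32_t)val;
    *rdhi = (uint32_t)(val >> 32);
}

/* Returns true if both sequences produce the same RdHi:RdLo. */
static bool umaal_models_agree(uint32_t rn, uint32_t rm,
                               uint32_t lo, uint32_t hi)
{
    uint32_t lo1 = lo, hi1 = hi, lo2 = lo, hi2 = hi;
    umaal_via_i32(&lo1, &hi1, rn, rm);
    umaal_via_i64(&lo2, &hi2, rn, rm);
    return lo1 == lo2 && hi1 == hi2;
}
```

The all-ones case (Rn = Rm = RdLo = RdHi = 0xffffffff) exercises the carry path. The first add2 carries into the high half, and the final result is exactly 0xffffffff:ffffffff.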


I didn't really think about the code generation difference; I chose it merely
because it seemed more obvious, given that all of the inputs are i32 and we
need i32 outputs.  I assumed it wasn't written like this in the first place
because tcg_gen_mulu2_i32 is relatively new.
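A quick worked bound shows why a pair of i32 outputs is always enough. Take every operand at its maximum value, 2^32 - 1. The UMAAL sum still fits in 64 bits, so the add2 chain never needs a third word and no carry out of RdHi is ever dropped:

```latex
\begin{aligned}
R_{dHi}{:}R_{dLo} &= R_n R_m + R_{dLo} + R_{dHi} \\
&\le (2^{32}-1)^2 + 2\,(2^{32}-1) \\
&= 2^{64} - 2^{33} + 1 + 2^{33} - 2 \\
&= 2^{64} - 1
\end{aligned}
```

The bound is attained when all four registers hold 0xffffffff.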


r~

Re: [Qemu-devel] [PATCH 14/67] target/arm: Convert multiply and multiply accumulate
