On 8/5/19 8:32 AM, Peter Maydell wrote:
>> -/* load a 32-bit value from a register and perform a 64-bit accumulate.  */
>> -static void gen_addq_lo(DisasContext *s, TCGv_i64 val, int rlow)
>> -{
>> -    TCGv_i64 tmp;
>> -    TCGv_i32 tmp2;
>> -
>> -    /* Load value and extend to 64 bits.  */
>> -    tmp = tcg_temp_new_i64();
>> -    tmp2 = load_reg(s, rlow);
>> -    tcg_gen_extu_i32_i64(tmp, tmp2);
>> -    tcg_temp_free_i32(tmp2);
>> -    tcg_gen_add_i64(val, val, tmp);
>> -    tcg_temp_free_i64(tmp);
>> -}
>> -
>
>> +static bool trans_UMAAL(DisasContext *s, arg_UMAAL *a)
>> +{
>> +    TCGv_i32 t0, t1, t2, zero;
>> +
>> +    if (s->thumb
>> +        ? !arm_dc_feature(s, ARM_FEATURE_THUMB_DSP)
>> +        : !ENABLE_ARCH_6) {
>> +        return false;
>> +    }
>> +
>> +    t0 = load_reg(s, a->rm);
>> +    t1 = load_reg(s, a->rn);
>> +    tcg_gen_mulu2_i32(t0, t1, t0, t1);
>> +    zero = tcg_const_i32(0);
>> +    t2 = load_reg(s, a->ra);
>> +    tcg_gen_add2_i32(t0, t1, t0, t1, t2, zero);
>> +    tcg_temp_free_i32(t2);
>> +    t2 = load_reg(s, a->rd);
>> +    tcg_gen_add2_i32(t0, t1, t0, t1, t2, zero);
>> +    tcg_temp_free_i32(t2);
>> +    tcg_temp_free_i32(zero);
>> +    store_reg(s, a->ra, t0);
>> +    store_reg(s, a->rd, t1);
>> +    return true;
>> +
>
> Is using mulu2/add2/add2 like this really generating better
> code than the mulu_i64_i32 and 2 64-bit adds that we had before?
> If we're going to change how we're generating code it would be
> nice to at least mention it in the commit message...

I didn't really think about the code generation difference, merely that
it seemed more obvious, given that all of the inputs are i32, and we
need i32 outputs.

I assumed it wasn't written like this in the first place because
tcg_gen_mulu2_i32 is relatively new.


r~
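
For illustration only, here is a plain-C sketch of the two formulations being compared. It is not QEMU code: model_mulu2 and model_add2 are hypothetical stand-ins that assume tcg_gen_mulu2_i32 produces a 64-bit product as (low, high) halves and tcg_gen_add2_i32 performs a double-word add with carry, and umaal_pairs / umaal_wide are names invented here. It only shows that both forms compute the same UMAAL result; it says nothing about which one yields better host code.

```c
#include <stdint.h>

/* Hypothetical model: 32x32 -> 64 unsigned multiply, result split in halves. */
static void model_mulu2(uint32_t *lo, uint32_t *hi, uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * b;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* Hypothetical model: (rl, rh) = (al, ah) + (bl, bh), carrying from low to high. */
static void model_add2(uint32_t *rl, uint32_t *rh,
                       uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    uint32_t lo = al + bl;
    uint32_t carry = lo < al;   /* 1 if the low-half add wrapped around */
    *rh = ah + bh + carry;
    *rl = lo;
}

/* New form: keep everything as pairs of 32-bit values (mulu2, add2, add2). */
static uint64_t umaal_pairs(uint32_t rm, uint32_t rn, uint32_t ra, uint32_t rd)
{
    uint32_t t0, t1;
    model_mulu2(&t0, &t1, rm, rn);
    model_add2(&t0, &t1, t0, t1, ra, 0);
    model_add2(&t0, &t1, t0, t1, rd, 0);
    return ((uint64_t)t1 << 32) | t0;
}

/* Old form: one 64-bit product plus two zero-extended 64-bit adds. */
static uint64_t umaal_wide(uint32_t rm, uint32_t rn, uint32_t ra, uint32_t rd)
{
    uint64_t val = (uint64_t)rm * rn;
    val += ra;   /* ra is zero-extended to 64 bits */
    val += rd;   /* rd is zero-extended to 64 bits */
    return val;
}
```

The result cannot overflow 64 bits, since (2^32 - 1)^2 + 2 * (2^32 - 1) = 2^64 - 1, so the two functions agree for every input; in the patch the low half goes to ra and the high half to rd.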