On 24/06/2015 12:04, Aurelien Jarno wrote:
>> +static void gen_align(DisasContext *ctx, int opc, int rd, int rs, int rt,
>> +                      int bp)
>> {
>> +    TCGv t0;
>> +    if (rd == 0) {
>> +        /* Treat as NOP. */
>> +        return;
>> +    }
>> +    t0 = tcg_temp_new();
>> +    gen_load_gpr(t0, rt);
>> +    if (bp == 0) {
>> +        tcg_gen_mov_tl(cpu_gpr[rd], t0);
>> +    } else {
>> +        TCGv t1 = tcg_temp_new();
>> +        gen_load_gpr(t1, rs);
>> +        switch (opc) {
>> +        case OPC_ALIGN:
>> +            {
>> +                TCGv_i64 t2 = tcg_temp_new_i64();
>> +                tcg_gen_concat_tl_i64(t2, t1, t0);
>> +                tcg_gen_shri_i64(t2, t2, 8 * (4 - bp));
>> +                gen_move_low32(cpu_gpr[rd], t2);
>> +                tcg_temp_free_i64(t2);
>> +            }
>> +            break;
>
> Not a problem in your patch (you basically just moved code), but I
> think this implementation is incorrect. It should be the same code as
> for DALIGN, but with the input operands zero extended to 32 bits, and
> the result sign extended to 32 bits. Something like that should work:
>
>     tcg_gen_ext32u_tl(t0, t0);
>     tcg_gen_shli_tl(t0, t0, 8 * bp);
>     tcg_gen_ext32u_tl(t1, t1);
>     tcg_gen_shri_tl(t1, t1, 8 * (4 - bp));
>     tcg_gen_or_tl(cpu_gpr[rd], t1, t0);
>     tcg_gen_ext32s_tl(cpu_gpr[rd], cpu_gpr[rd]);
>
> In practice we can drop the zero extension on t0 (rt) as the bits there
> will be dropped by the sign extension on the result. Note that on
> 32-bit, the zero and sign extension will be dropped, so there is no need
> for #ifdef TARGET_MIPS64.
I believe the existing implementation is correct and does the same thing,
but it operates on a single 64-bit temporary containing the merged rs and
rt rather than shifting the two 32-bit registers separately. We discussed
this last year; the potential benefit is that it can be slightly faster on
a 64-bit host.

Thanks,
Leon