On 05/04/2011 09:39 AM, Max Filippov wrote:
> To track immediate values written to SAR? You mean that there may be
> some performance difference of fixed size shift vs indirect shift and
> TCG is able to tell them apart?

Well, not really fixed vs indirect, but if you know that the value in
the SAR register is in the right range, you can avoid using a 64-bit
shift. For instance,

    SSL     ar2
    SLL     ar0, ar1

could be implemented with

    tcg_gen_shl_i32(ar0, ar1, ar2);

assuming we have enough context.

Let us track the SAR register in two parts, storing both the true
value and 32 minus the value.

    struct DisasContext {
        // Current Stuff
        // ...

        // When valid, holds 32-SAR.
        TCGv sar_m32;
        bool sar_m32_alloc;
        bool sar_m32_valid;
        bool sar_5bit;
    };

At the beginning of the TB:

    TCGV_UNUSED_I32(dc->sar_m32);
    dc->sar_m32_alloc = false;
    dc->sar_m32_valid = false;
    dc->sar_5bit = false;

    static void gen_set_sra_m32(DisasContext *dc, TCGv val)
    {
        if (!dc->sar_m32_alloc) {
            dc->sar_m32_alloc = true;
            dc->sar_m32 = tcg_temp_local_new_i32();
        }
        dc->sar_m32_valid = true;

        /* Clear 5 bit because the SAR value could be 32.  */
        dc->sar_5bit = false;

        /* SAR = 32 - val; keep val itself around as 32-SAR.  */
        tcg_gen_movi_i32(cpu_SR[SAR], 32);
        tcg_gen_sub_i32(cpu_SR[SAR], cpu_SR[SAR], val);
        tcg_gen_mov_i32(dc->sar_m32, val);
    }

    static void gen_set_sra(DisasContext *dc, TCGv val, bool is_5bit)
    {
        if (dc->sar_m32_alloc && dc->sar_m32_valid) {
            tcg_gen_discard_i32(dc->sar_m32);
        }
        dc->sar_m32_valid = false;
        dc->sar_5bit = is_5bit;

        tcg_gen_mov_i32(cpu_SR[SAR], val);
    }

    /* SSL */
        tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
        gen_set_sra_m32(dc, tmp);
        break;

    /* SSR */
        tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
        gen_set_sra(dc, tmp, true);
        break;

    /* WSR.SAR */
        tcg_gen_andi_i32(tmp, cpu_R[AS], 63);
        gen_set_sra(dc, tmp, false);
        break;

    /* SSAI */
        tcg_gen_movi_i32(tmp, constant);
        gen_set_sra(dc, tmp, true);
        break;

    /* SLL */
        if (dc->sar_m32_valid) {
            tcg_gen_shl_i32(cpu_R[AR], cpu_R[AS], dc->sar_m32);
        } else {
            /* your existing 64-bit shift emulation.  */
        }
        break;

    /* SRL */
        if (dc->sar_5bit) {
            tcg_gen_shr_i32(cpu_R[AR], cpu_R[AS], cpu_SR[SAR]);
        } else {
            /* your existing 64-bit shift emulation.  */
        }

A couple of points:

The use of the local temp avoids problems with intervening insns that
might generate branch opcodes, since a local temp keeps its value
across the end of a basic block whereas a normal temp does not.

For the simplest cases, as with the case at the start of the message,
we ought to be able to propagate the values into the TCG shift insn
directly.

Does that make sense?


r~
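
P.S. As a sanity check on the range conditions above, here is a minimal host-side C model (not QEMU code) comparing the 32-bit fast paths against a 64-bit reference. It assumes that SLL yields the low 32 bits of (as:0) >> SAR and SRL the low 32 bits of (0:at) >> SAR, which is my reading of the ISA and worth verifying against the manual. All function names here are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference SLL (assumed semantics): low 32 bits of (as:0) >> SAR. */
static uint32_t sll_ref(uint32_t as, uint32_t sar)
{
    return (uint32_t)(((uint64_t)as << 32) >> (sar & 63));
}

/* Reference SRL (assumed semantics): low 32 bits of (0:at) >> SAR. */
static uint32_t srl_ref(uint32_t at, uint32_t sar)
{
    return (uint32_t)((uint64_t)at >> (sar & 63));
}

/* Fast SLL: only usable when 32-SAR is known to be in 0..31. */
static uint32_t sll_fast(uint32_t as, uint32_t sar_m32)
{
    assert(sar_m32 < 32);
    return as << sar_m32;
}

/* Fast SRL: only usable when SAR is known to be in 0..31. */
static uint32_t srl_fast(uint32_t at, uint32_t sar)
{
    assert(sar < 32);
    return at >> sar;
}

/* Count mismatches between the fast and reference paths. */
static int check_shift_paths(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    int bad = 0;

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        uint32_t v = samples[i];
        for (uint32_t n = 0; n < 32; n++) {
            /* SSL case: SAR = 32 - n, so SAR lies in 1..32. */
            if (sll_fast(v, n) != sll_ref(v, 32 - n)) {
                bad++;
            }
            /* SSR/SSAI case: SAR = n, so SAR lies in 0..31. */
            if (srl_fast(v, n) != srl_ref(v, n)) {
                bad++;
            }
        }
    }
    return bad;
}
```

Calling check_shift_paths() from a small driver should return 0 if the assumed semantics hold. The model also shows why sar_5bit has to be cleared after SSL: with SAR equal to 32 the reference SRL gives 0, but a 32-bit shift by 32 is out of range for the fast path.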