>> To track immediate values written to SAR? You mean that there may be
>> some performance difference of fixed size shift vs indirect shift and
>> TCG is able to tell them apart?
>
> Well, not really fixed vs indirect, but if you know that the value
> in the SAR register is in the right range, you can avoid using a
> 64-bit shift.
>
> For instance,
>
>     SSL ar2
>     SLL ar0, ar1
>
> could be implemented with
>
>     tcg_gen_shl_i32(ar0, ar1, ar2);
>
> assuming we have enough context.
>
> Let us decompose the SAR register into two parts, storing both the
> true value and 32-SAR.
>
>     struct DisasContext {
>         // Current Stuff
>         // ...
>
>         // When valid, holds 32-SAR.
>         TCGv sar_m32;
>         bool sar_m32_alloc;
>         bool sar_m32_valid;
>         bool sar_5bit;
>     };
>
> At the beginning of the TB:
>
>     TCGV_UNUSED_I32(dc->sar_m32);
>     dc->sar_m32_alloc = false;
>     dc->sar_m32_valid = false;
>     dc->sar_5bit = false;
>
>
>
>     static void gen_set_sra_m32(DisasContext *dc, TCGv val)
>     {
>         if (!dc->sar_m32_alloc) {
>             dc->sar_m32_alloc = true;
>             dc->sar_m32 = tcg_temp_local_new_i32();
>         }
>         dc->sar_m32_valid = true;
>
>         /* Clear 5 bit because the SAR value could be 32.  */
>         dc->sar_5bit = false;
>
>         tcg_gen_movi_i32(cpu_SR[SAR], 32);
>         tcg_gen_sub_i32(cpu_SR[SAR], cpu_SR[SAR], val);
>         tcg_gen_mov_i32(dc->sar_m32, val);
>     }
>
>     static void gen_set_sra(DisasContext *dc, TCGv val, bool is_5bit)
>     {
>         if (dc->sar_m32_alloc && dc->sar_m32_valid) {
>             tcg_gen_discard_i32(dc->sar_m32);
>         }
>         dc->sar_m32_valid = false;
>         dc->sar_5bit = is_5bit;
>
>         tcg_gen_mov_i32(cpu_SR[SAR], val);
>     }
>
>     /* SSL */
>         tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
>         gen_set_sra_m32(dc, tmp);
>         break;
>
>     /* SSR */
>         tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
>         gen_set_sra(dc, tmp, true);
>         break;
>
>     /* WSR.SAR */
>         tcg_gen_andi_i32(tmp, cpu_R[AS], 63);
>         gen_set_sra(dc, tmp, false);
>         break;
>
>     /* SSAI */
>         tcg_gen_movi_i32(tmp, constant);
>         gen_set_sra(dc, tmp, true);
>         break;
>
>     /* SLL */
>         if (dc->sar_m32_valid) {
>             tcg_gen_shl_i32(cpu_R[AR], cpu_R[AS], dc->sar_m32);
>         } else {
>             /* your existing 64-bit shift emulation.  */
>         }
>         break;
>
>     /* SRL */
>         if (dc->sar_5bit) {
>             tcg_gen_shr_i32(cpu_R[AR], cpu_R[AS], cpu_SR[SAR]);
>         } else {
>             /* your existing 64-bit shift emulation.  */
>         }
>
>
> A couple of points: The use of the local temp avoids problems with
> intervening insns that might generate branch opcodes.  For the
> simplest cases, as with the case at the start of the message, we
> ought to be able to propagate the values into the TCG shift insn
> directly.
>
> Does that make sense?

Yes it does. Thanks for the good explanation.
I tried to keep it all as simple as possible to have a working prototype
quickly. Now that it works, optimizations should be no problem.

Thanks.
-- Max
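
The arithmetic behind the two fast paths discussed above can be checked in plain C. This is only a sketch, not QEMU code: it assumes the usual Xtensa semantics (SSL leaves SAR = 32 - (as & 31), SSR leaves SAR = as & 31, SLL shifts the 64-bit value as:0 right by SAR, SRL shifts 0:at right by SAR), and every helper name below is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed SSL semantics: SAR = 32 - (as & 31), so SAR ends up in 1..32. */
static uint32_t sar_after_ssl(uint32_t as) { return 32u - (as & 31u); }

/* Assumed SSR semantics: SAR = as & 31, so SAR ends up in 0..31. */
static uint32_t sar_after_ssr(uint32_t as) { return as & 31u; }

/* General SLL: low word of (as:0) >> SAR, usable for SAR in 0..32. */
static uint32_t sll_64(uint32_t as, uint32_t sar)
{
    return (uint32_t)(((uint64_t)as << 32) >> sar);
}

/* General SRL: low word of (0:at) >> SAR, usable for SAR in 0..63. */
static uint32_t srl_64(uint32_t at, uint32_t sar)
{
    return (uint32_t)((uint64_t)at >> sar);
}

/* Fast SLL: plain 32-bit left shift by the tracked 32-SAR value. */
static uint32_t sll_32(uint32_t as, uint32_t sar_m32)
{
    assert(sar_m32 < 32);   /* a 32-bit shift by 32 or more is undefined */
    return as << sar_m32;
}

/* Fast SRL: plain 32-bit right shift, only when SAR is known to be 5-bit. */
static uint32_t srl_32(uint32_t at, uint32_t sar)
{
    assert(sar < 32);
    return at >> sar;
}

/* Compare fast and general paths; returns the number of mismatches. */
static int count_mismatches(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    int bad = 0;

    for (uint32_t amount = 0; amount < 64; amount++) {
        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
            uint32_t v = samples[i];

            /* SSL followed by SLL: the tracked 32-SAR value is amount & 31. */
            if (sll_64(v, sar_after_ssl(amount)) != sll_32(v, amount & 31u)) {
                bad++;
            }
            /* SSR followed by SRL: SAR itself is already in 0..31. */
            if (srl_64(v, sar_after_ssr(amount)) != srl_32(v, sar_after_ssr(amount))) {
                bad++;
            }
        }
    }
    return bad;
}
```

Calling count_mismatches() should report no differences under these assumptions. The SAR = 32 case is the reason SSL cannot set the 5-bit flag: the 64-bit forms handle it (sll_64(x, 32) gives x back, srl_64(x, 32) gives 0), while a 32-bit shift by 32 is not well defined.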