On 04/06/2015 07:03, Richard Henderson wrote:
>> +        tcg_gen_add2_i32(t1, t2, REG(B11_8), t0, REG(B7_4), t0);
>> +        tcg_gen_add2_i32(REG(B11_8), cpu_sr_t, t1, t2, cpu_sr_t, t0);
>
> Swap these two adds and you don't need t2.  You can consume sr_t
> immediately and start producing it in the same go.
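For reference, the equivalence of the two orderings can be checked with a small plain-C model. This is only a sketch: add2() is a hypothetical stand-in for the semantics of tcg_gen_add2_i32 (rh:rl = ah:al + bh:bl), the helper names are made up for illustration, and the exact operand order of the swapped second call is my assumption since it is not spelled out in the review.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-C model of tcg_gen_add2_i32:
 * (rh:rl) = (ah:al) + (bh:bl), computed as a 64-bit sum. */
static void add2(uint32_t *rl, uint32_t *rh,
                 uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    uint64_t a = ((uint64_t)ah << 32) | al;
    uint64_t b = ((uint64_t)bh << 32) | bl;
    uint64_t r = a + b;

    *rl = (uint32_t)r;
    *rh = (uint32_t)(r >> 32);
}

/* Ordering from the quoted patch: Rn + Rm first, then add T.
 * Needs two temporaries, t1 and t2. */
static void addc_patch(uint32_t *rn, uint32_t rm, uint32_t *t)
{
    uint32_t t1, t2;

    add2(&t1, &t2, *rn, 0, rm, 0);   /* t2:t1 = Rn + Rm   */
    add2(rn, t, t1, t2, *t, 0);      /* T:Rn  = t2:t1 + T */
}

/* Swapped ordering (assumed operand order): T + Rm first, then add Rn.
 * T is consumed and produced in the first step, so t2 disappears. */
static void addc_swapped(uint32_t *rn, uint32_t rm, uint32_t *t)
{
    uint32_t t1;

    add2(&t1, t, *t, 0, rm, 0);      /* T:t1 = T + Rm     */
    add2(rn, t, *rn, 0, t1, *t);     /* T:Rn = Rn + T:t1  */
}

/* Assert that both orderings give the same Rn and T for one input. */
static void check_orderings_agree(uint32_t rn, uint32_t rm, uint32_t t)
{
    uint32_t rn_a = rn, t_a = t;
    uint32_t rn_b = rn, t_b = t;

    addc_patch(&rn_a, rm, &t_a);
    addc_swapped(&rn_b, rm, &t_b);
    assert(rn_a == rn_b && t_a == t_b);
}
```

Calling check_orderings_agree() on corner cases such as Rn = Rm = 0xffffffff with T = 1 exercises both carry paths. The model assumes T is always 0 or 1 on entry.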
Could TCG do some kind of intra-basic-block live range splitting?  In
this case, the new sr_t could be allocated to a different register than
the old one, saving one instruction on 2-address targets.  The
pseudocode below uses "dest, src" operand order:

    // add2(t1, cpu_sr_t, cpu_sr_t, t0, REG(B7_4), t0)
    add sr_t_in, B7_4       // instead of mov t1, sr_t; add t1, B7_4
    mov sr_t_out, 0
    adc sr_t_out, 0         // cout(B7_4 + sr_t_in)

    // add2(REG(B11_8), cpu_sr_t, t1, cpu_sr_t, REG(B11_8), t0)
    add B11_8, sr_t_in      // B11_8 + B7_4 + sr_t_in
    adc sr_t_out, 0         // cout(B11_8 + B7_4 + sr_t_in)

Paolo