On 04/06/2015 07:03, Richard Henderson wrote:
>> +            tcg_gen_add2_i32(t1, t2, REG(B11_8), t0, REG(B7_4), t0);
>> +            tcg_gen_add2_i32(REG(B11_8), cpu_sr_t, t1, t2, cpu_sr_t, t0);
> 
> Swap these two adds and you don't need t2.  You can consume sr_t
> immediately and start producing it in the same go.

Could TCG do some kind of intra-basic-block live range splitting?  In
this case, the new sr_t could be allocated to a different register than
the old one, saving one instruction on 2-address targets.

The pseudocode below uses "dest, src" operand order:

   // add2(t1, cpu_sr_t, cpu_sr_t, t0, REG(B7_4), t0)
   add sr_t_in, B7_4    // instead of mov t1, sr_t; add t1, B7_4
   mov sr_t_out, 0
   adc sr_t_out, 0      // cout(B7_4 + sr_t_in)

   // add2(REG(B11_8), cpu_sr_t, t1, cpu_sr_t, REG(B11_8), t0)
   add B11_8, sr_t_in   // B11_8 + B7_4 + sr_t_in
   adc sr_t_out, 0      // cout(B11_8 + B7_4 + sr_t_in)
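
For reference, here is a minimal C model of the two swapped adds. It is only a sketch of the arithmetic, not TCG code: add2_u32 and addc_model are hypothetical names, and add2_u32 assumes that add2 computes (rh:rl) = (ah:al) + (bh:bl). It shows that the first add can consume the old T while already producing the new one, so no second temporary is needed.

```c
#include <stdint.h>

/* Hypothetical model of a double-word add: (*rh:*rl) = (ah:al) + (bh:bl). */
static void add2_u32(uint32_t *rl, uint32_t *rh,
                     uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    uint64_t a = ((uint64_t)ah << 32) | al;
    uint64_t b = ((uint64_t)bh << 32) | bl;
    uint64_t r = a + b;

    *rl = (uint32_t)r;
    *rh = (uint32_t)(r >> 32);
}

/* Hypothetical model of addc Rm,Rn: Rn + Rm + T -> Rn, carry out -> T. */
static void addc_model(uint32_t *rn, uint32_t rm, uint32_t *sr_t)
{
    uint32_t t1;

    /* First add consumes the old T: (T:t1) = (0:T) + (0:Rm). */
    add2_u32(&t1, sr_t, *sr_t, 0, rm, 0);
    /* Second add folds in Rn: (T:Rn) = (T:t1) + (0:Rn). */
    add2_u32(rn, sr_t, t1, *sr_t, *rn, 0);
}
```

The high word after the first add is the carry of Rm + T, and the second add accumulates the carry of the final sum on top of it. At most one of the two carries can be set, so T stays 0 or 1.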

Paolo
