Hi Richard, On Wed, Aug 10, 2016 at 6:44 AM, Richard Henderson <r...@twiddle.net> wrote: > On 08/10/2016 04:42 AM, Pranith Kumar wrote: >> >> +/* Eliminate duplicate and unnecessary fence instructions */ >> +void tcg_optimize_mb(TCGContext *s) >> +{ >> + int oi, oi_next; >> + TCGArg prev_op_mb = -1; >> + TCGOp *prev_op = NULL; >> + >> + for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) { >> + TCGOp *op = &s->gen_op_buf[oi]; >> + TCGArg *args = &s->gen_opparam_buf[op->args]; >> + TCGOpcode opc = op->opc; >> + >> + switch (opc) { >> + case INDEX_op_mb: >> + { >> + TCGBar curr_mb_type = args[0] & 0xF0; >> + TCGBar prev_mb_type = prev_op_mb & 0xF0; >> + >> + if (curr_mb_type == prev_mb_type || >> + (curr_mb_type == TCG_BAR_STRL && prev_mb_type == >> TCG_BAR_SC)) { > > > As I said before, I don't think you should be doing any of this if > prev_op_mb == -1. I think that's the first thing you should test within > INDEX_op_mb.
If prev_op_mb is -1, none of the conditions evaluate to true and we will fall through. Did you want me to check this as an optimization or am I missing your point? > >> + /* Remove the current weaker barrier op. The previous >> + * barrier is stronger and sufficient. >> + * mb; strl => mb; st >> + */ >> + tcg_op_remove(s, op); >> + op = prev_op; > > > As I said before, surely you have to merge the low 4 bits as well. Since we are removing the weaker barrier and leaving the stronger barrier, I think just updating args[0] to have the stronger barrier semantics should be enough. tcg_op_remove(s, op); op = prev_op; args[0] = prev_op_mb; break; Is there something I am missing when you say merge? > >> + case INDEX_op_ld8u_i32: >> + case INDEX_op_ld8u_i64: >> + case INDEX_op_ld8s_i32: >> + case INDEX_op_ld8s_i64: >> + case INDEX_op_ld16u_i32: >> + case INDEX_op_ld16u_i64: >> + case INDEX_op_ld16s_i32: >> + case INDEX_op_ld16s_i64: >> + case INDEX_op_ld_i32: >> + case INDEX_op_ld32u_i64: >> + case INDEX_op_ld32s_i64: >> + case INDEX_op_ld_i64: >> + case INDEX_op_st8_i32: >> + case INDEX_op_st8_i64: >> + case INDEX_op_st16_i32: >> + case INDEX_op_st16_i64: >> + case INDEX_op_st_i32: >> + case INDEX_op_st32_i64: >> + case INDEX_op_st_i64: > > > None of these are guest memory operations. These are host access to the ENV > structure, primarily for cpu registers that aren't common enough to warrant > a TCG temporary. > > You wanted to be checking for > > INDEX_op_qemu_ld_i32 > INDEX_op_qemu_ld_i64 > INDEX_op_qemu_st_i32 > INDEX_op_qemu_st_i64 > > which are the guest memory operations. > Yes, I got confused. I will fix this. >> + case INDEX_op_call: >> + prev_op_mb = -1; >> + prev_op = NULL; >> + break; > > > Probably you could skip the calls to "pure" helpers. I.e. those with > TCG_CALL_NO_RWG_SE. But it's probably not that important and we can leave > it for another day. > > > r~ -- Pranith