This started as an "easy" fix for RF handling in string instructions. I then realized that repz_opt is broken (patch 5), in that it was optimizing for the wrong case, and that redoing the optimization would make the RF handling basically free.

On a microbenchmark running x86-on-x86 user-mode emulation, stos and movs execute about 40% fewer instructions and about 60% fewer branches. Performance is very variable, because it is limited by memory bandwidth and because the out-of-order processor does a great job of scheduling all the useless instructions executed by the older code; but the microbenchmark results seem to improve by 10-15%.

Paolo

Paolo Bonzini (13):
  target/i386: inline gen_jcc into sole caller
  target/i386: remove trailing 1 from gen_{j,cmov,set}cc1
  target/i386: unify REP and REPZ/REPNZ generation
  target/i386: unify choice between single and repeated string instructions
  target/i386: reorganize ops emitted by do_gen_rep, drop repz_opt
  target/i386: tcg: move gen_set/reset_* earlier in the file
  target/i386: fix RF handling for string instructions
  target/i386: make cc_op handling more explicit for repeated string instructions.
  target/i386: do not use gen_op_jz_ecx for repeated string operations
  target/i386: optimize CX handling in repeated string operations
  target/i386: execute multiple REP/REPZ iterations without leaving TB
  target/i386: pull computation of string update value out of loop
  target/i386: avoid using s->tmp0 for add to implicit registers

 target/i386/tcg/translate.c | 342 +++++++++++++++++++++----------------
 target/i386/tcg/emit.c.inc  |  56 ++----
 2 files changed, 219 insertions(+), 179 deletions(-)

-- 
2.47.1