On 5/3/10, Igor Kovalenko <igor.v.kovale...@gmail.com> wrote:
> Hi!
>
> There is an issue with lazy condition code evaluation where
> we return from a trap handler with mismatching condition codes.
>
> I can occasionally reproduce it here by dragging the qemu window while
> the machine is working through silo initialization. I use the gentoo
> minimal cd install-sparc64-minimal-20100322.iso, but I think anything
> booting with silo would experience the same. Once in a while it reports
> a crc error, is unable to open the cd partition, or fails to decompress
> the image.
I think I've also seen this.

> The failing pattern appears to require a compare insn, possibly followed
> by a few instructions which do not touch the condition codes, and then a
> conditional branch insn. If we happen to trap while processing the
> conditional branch insn, so that it is restarted after returning from the
> trap, then occasionally the condition codes are calculated incorrectly.
>
> I cannot point to the exact cause, but it appears that after trap return
> we may have a CC_OP and CC_SRC* mismatch somewhere, since adding more
> condition code evaluation flushes over the code helps.
>
> We already tried doing the flush more frequently and it is still not
> complete, so the question is how to finally do this once and right :)
>
> Obviously I do not get the design of lazy evaluation right, but the
> following list appears to be a good start. The plan is to prepare a
> change to qemu and find a way to test it.
>
> 1. Since SPARC* is a RISC CPU it seems unprofitable to use
> DisasContext->cc_op to predict whether flags need not be evaluated
> because a later insn overrides them. Instead we can drop cc_op from the
> disassembler context and simplify the code to only use cc_op from env.

Not currently, but in the future we may use that to do even lazier flags
computation. For example, the sequence 'cmp x, y; bne target' could be made
much more optimal by changing the branch to do the comparison. Here's an old
unfinished patch to do some of this.

> Another point is that we always write to env->cc_op when translating
> *cc insns.
> This should solve any issue with dc->cc_op prediction going out of sync
> with env->cc_op and cpu_cc_src*.

I think this is what is happening now.

> 2. We must flush the lazy evaluation back to CC_OP_FLAGS in a few cases:
> a. a condition code is required by an insn (like addc, a conditional
>    branch etc.)
>    - here we can optimize by evaluating only the specific bits (carry?)
>    - not sure if that works in case we have two condition-consuming
>      insns, where the first needs carry and the other needs the rest of
>      the flags

Here's another patch to optimize C flag handling. It doesn't pass my tests
though.

> b. CCR is read by rdccr (helper_rdccr)
>    - have to compute all flags
> c. a trap occurs and we prepare the trap level context (saving pstate)
>    - have to compute all flags
> d. control goes out of the tcg runtime (so gdbstub reads the correct
>    value from env)
>    - have to compute all flags

Fully agree.
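To make cases 2a-2d concrete, here is a small standalone model of the lazy
scheme (plain C, compiles on its own). It is only an illustration: the
ToyCPUState struct, the toy_* names and the single CC_OP_SUB case are
simplifications for this mail, not the actual target-sparc code.

/* toy_lazy_cc.c - standalone model of lazy condition code evaluation.
 * Not QEMU code: names and the single CC_OP_SUB case are illustrative. */
#include <stdint.h>
#include <stdio.h>

/* icc bit positions as in the real PSR */
enum { PSR_NEG = 1 << 23, PSR_ZERO = 1 << 22,
       PSR_OVF = 1 << 21, PSR_CARRY = 1 << 20 };
enum { CC_OP_FLAGS, CC_OP_SUB };  /* FLAGS: psr valid, SUB: lazy subcc */

typedef struct {
    uint32_t cc_src, cc_src2, cc_dst; /* operands/result of last *cc insn */
    int cc_op;                        /* which lazy computation is pending */
    uint32_t psr;                     /* architectural flags (icc only here) */
} ToyCPUState;

/* Model of subcc: record the operands, do not touch psr yet. */
static void toy_subcc(ToyCPUState *env, uint32_t a, uint32_t b)
{
    env->cc_src = a;
    env->cc_src2 = b;
    env->cc_dst = a - b;
    env->cc_op = CC_OP_SUB;
}

/* The flush required by cases 2b-2d: fold the lazy state into psr and
 * switch back to CC_OP_FLAGS. */
static void toy_compute_psr(ToyCPUState *env)
{
    if (env->cc_op == CC_OP_SUB) {
        uint32_t a = env->cc_src, b = env->cc_src2, r = env->cc_dst;
        env->psr = 0;
        if (r == 0) env->psr |= PSR_ZERO;
        if ((int32_t)r < 0) env->psr |= PSR_NEG;
        if (((a ^ b) & (a ^ r)) & 0x80000000u) env->psr |= PSR_OVF; /* sub ovf */
        if (b > a) env->psr |= PSR_CARRY;                           /* borrow */
        env->cc_op = CC_OP_FLAGS;
    }
}

int main(void)
{
    ToyCPUState env = { 0 };
    toy_subcc(&env, 1, 2);   /* cmp 1, 2: flags left lazy */
    toy_compute_psr(&env);   /* what trap entry / rdccr / gdbstub must do first */
    printf("N=%d Z=%d V=%d C=%d\n",
           !!(env.psr & PSR_NEG), !!(env.psr & PSR_ZERO),
           !!(env.psr & PSR_OVF), !!(env.psr & PSR_CARRY));
    return 0;
}

The point is only that any consumer of the architectural flags (rdccr, trap
entry, gdbstub) must run the flush first; if a path misses it, we get exactly
the stale-flags symptom described above.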
From b0863c213ce487e9c1034674668d1b64a43b7266 Mon Sep 17 00:00:00 2001
From: Blue Swirl <blauwir...@gmail.com>
Date: Mon, 3 May 2010 19:11:37 +0000
Subject: [PATCH] Convert C flag input BROKEN

Signed-off-by: Blue Swirl <blauwir...@gmail.com>
---
 target-sparc/translate.c |   24 ++++++++----------------
 1 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index be2a116..94c343d 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -336,7 +336,7 @@ static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2)
 {
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_movi_tl(cpu_cc_src2, src2);
-    gen_mov_reg_C(cpu_tmp0, cpu_psr);
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_addi_tl(cpu_cc_dst, cpu_cc_dst, src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -346,7 +346,7 @@ static inline void gen_op_addx_cc(TCGv dst, TCGv src1, TCGv src2)
 {
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_mov_tl(cpu_cc_src2, src2);
-    gen_mov_reg_C(cpu_tmp0, cpu_psr);
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_add_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -419,7 +419,7 @@ static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2)
 {
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_movi_tl(cpu_cc_src2, src2);
-    gen_mov_reg_C(cpu_tmp0, cpu_psr);
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_subi_tl(cpu_cc_dst, cpu_cc_dst, src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -429,7 +429,7 @@ static inline void gen_op_subx_cc(TCGv dst, TCGv src1, TCGv src2)
 {
     tcg_gen_mov_tl(cpu_cc_src, src1);
     tcg_gen_mov_tl(cpu_cc_src2, src2);
-    gen_mov_reg_C(cpu_tmp0, cpu_psr);
+    gen_helper_compute_C_icc(cpu_tmp0);
     tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
     tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
     tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -2953,25 +2953,21 @@ static void disas_sparc_insn(DisasContext * dc)
                 if (IS_IMM) {
                     simm = GET_FIELDs(insn, 19, 31);
                     if (xop & 0x10) {
-                        gen_helper_compute_psr();
                         gen_op_addxi_cc(cpu_dst, cpu_src1, simm);
                         tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
                         dc->cc_op = CC_OP_ADDX;
                     } else {
-                        gen_helper_compute_psr();
-                        gen_mov_reg_C(cpu_tmp0, cpu_psr);
+                        gen_helper_compute_C_icc(cpu_tmp0);
                         tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
                         tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
                     }
                 } else {
                     if (xop & 0x10) {
-                        gen_helper_compute_psr();
                         gen_op_addx_cc(cpu_dst, cpu_src1, cpu_src2);
                         tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
                         dc->cc_op = CC_OP_ADDX;
                     } else {
-                        gen_helper_compute_psr();
-                        gen_mov_reg_C(cpu_tmp0, cpu_psr);
+                        gen_helper_compute_C_icc(cpu_tmp0);
                         tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
                         tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
                     }
@@ -3009,25 +3005,21 @@ static void disas_sparc_insn(DisasContext * dc)
                 if (IS_IMM) {
                     simm = GET_FIELDs(insn, 19, 31);
                     if (xop & 0x10) {
-                        gen_helper_compute_psr();
                         gen_op_subxi_cc(cpu_dst, cpu_src1, simm);
                         tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
                         dc->cc_op = CC_OP_SUBX;
                     } else {
-                        gen_helper_compute_psr();
-                        gen_mov_reg_C(cpu_tmp0, cpu_psr);
+                        gen_helper_compute_C_icc(cpu_tmp0);
                         tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
                         tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
                     }
                 } else {
                     if (xop & 0x10) {
-                        gen_helper_compute_psr();
                         gen_op_subx_cc(cpu_dst, cpu_src1, cpu_src2);
                         tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
                         dc->cc_op = CC_OP_SUBX;
                     } else {
-                        gen_helper_compute_psr();
-                        gen_mov_reg_C(cpu_tmp0, cpu_psr);
+                        gen_helper_compute_C_icc(cpu_tmp0);
                         tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
                         tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
                     }
-- 
1.5.6.5
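Note that the patch above calls gen_helper_compute_C_icc, whose helper side
is not part of this diff (the diff only touches translate.c). As a sketch of
what such a helper would need to do - evaluate just the carry bit from the
lazy state instead of flushing everything to CC_OP_FLAGS - it could look
roughly like this. Illustrative only: the toy struct and enum are stand-ins,
not the real target-sparc code, and a real version would also need the
ADDX/SUBX, logic and xcc cases.

/* toy_compute_C_icc.c - sketch of a carry-only lazy evaluation helper. */
#include <stdint.h>

enum { CC_OP_FLAGS, CC_OP_ADD, CC_OP_SUB };   /* stand-ins for the real enum */
enum { PSR_CARRY = 1 << 20 };                 /* icc C bit position in PSR */

struct toy_cc_state {
    int cc_op;
    uint32_t cc_src, cc_src2, cc_dst, psr;
};

/* Return the icc carry (0 or 1) without computing N/Z/V. */
uint32_t toy_compute_C_icc(const struct toy_cc_state *env)
{
    switch (env->cc_op) {
    case CC_OP_ADD:                     /* carry out of cc_src + cc_src2 */
        return env->cc_dst < env->cc_src;
    case CC_OP_SUB:                     /* borrow from cc_src - cc_src2 */
        return env->cc_src2 > env->cc_src;
    case CC_OP_FLAGS:
    default:                            /* flags are already architectural */
        return (env->psr & PSR_CARRY) != 0;
    }
}

This is also where point 2a's open question shows up: a second flag-consuming
insn after addx/subx still needs the remaining flags, so the carry-only path
must not destroy the lazy state it reads from.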
From 93cce43be043ca25770165b8c06546eafc320716 Mon Sep 17 00:00:00 2001
From: Blue Swirl <blauwir...@gmail.com>
Date: Mon, 3 May 2010 19:21:59 +0000
Subject: [PATCH] Branch optimization BROKEN

Signed-off-by: Blue Swirl <blauwir...@gmail.com>
---
 target-sparc/translate.c |  108 +++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 106 insertions(+), 2 deletions(-)

diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 94c343d..57bda12 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -1115,6 +1115,104 @@ static inline void gen_cond_reg(TCGv r_dst, int cond, TCGv r_src)
 }
 #endif
 
+// Inverted logic
+static const int gen_tcg_cond[16] = {
+    -1,
+    TCG_COND_NE,
+    TCG_COND_GT,
+    TCG_COND_GE,
+    TCG_COND_GTU,
+    TCG_COND_GEU,
+    -1,
+    -1,
+    -1,
+    TCG_COND_EQ,
+    TCG_COND_LE,
+    TCG_COND_LT,
+    TCG_COND_LEU,
+    TCG_COND_LTU,
+    -1,
+    -1,
+};
+
+/* generate a conditional jump to label 'l1' according to jump opcode
+   value 'b'. In the fast case, T0 is guaranted not to be used. */
+static inline void gen_brcond(DisasContext *dc, int cond, int l1, int cc, TCGv r_cond)
+{
+    //printf("gen_brcond: cc_op %d\n", dc->cc_op);
+    switch (dc->cc_op) {
+        /* we optimize the cmp/br case */
+    case CC_OP_SUB:
+        // Inverted logic
+        switch (cond) {
+        case 0x0: // n
+            tcg_gen_br(l1);
+            break;
+        case 0x1: // e
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_NE, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_NE, cpu_cc_dst, 0, l1);
+            }
+            break;
+        case 0x2: // le
+        case 0x3: // l
+        case 0x4: // leu
+        case 0x5: // cs/lu
+        case 0xa: // g
+        case 0xb: // ge
+        case 0xc: // gu
+        case 0xd: // cc/geu
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(gen_tcg_cond[cond], cpu_cc_src, cpu_cc_src2, l1);
+            } else {
+                tcg_gen_brcondi_i32(gen_tcg_cond[cond], cpu_cc_src, cpu_cc_src2, l1);
+            }
+            break;
+        case 0x6: // neg
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_GE, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_GE, cpu_cc_dst, 0, l1);
+            }
+            break;
+        case 0x7: // vs
+            gen_helper_compute_psr();
+            dc->cc_op = CC_OP_FLAGS;
+            gen_op_eval_bvs(cpu_cc_dst, cpu_cc_src);
+            break;
+        case 0x8: // a
+            // nop
+            break;
+        case 0x9: // ne
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_EQ, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_cc_dst, 0, l1);
+            }
+            break;
+        case 0xe: // pos
+            if (cc == 1) {
+                tcg_gen_brcondi_i64(TCG_COND_LT, cpu_cc_dst, 0, l1);
+            } else {
+                tcg_gen_brcondi_i32(TCG_COND_LT, cpu_cc_dst, 0, l1);
+            }
+            break;
+        case 0xf: // vc
+            gen_helper_compute_psr();
+            dc->cc_op = CC_OP_FLAGS;
+            gen_op_eval_bvc(cpu_cc_dst, cpu_cc_src);
+            break;
+        }
+        break;
+    case CC_OP_FLAGS:
+    default:
+        gen_cond(r_cond, cc, cond, dc);
+        tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
+        break;
+    }
+}
+
 /* XXX: potentially incorrect if dynamic npc */
 static void do_branch(DisasContext *dc, int32_t offset, uint32_t insn, int cc,
                       TCGv r_cond)
@@ -1143,11 +1241,17 @@ static void do_branch(DisasContext *dc, int32_t offset, uint32_t insn, int cc,
         }
     } else {
         flush_cond(dc, r_cond);
-        gen_cond(r_cond, cc, cond, dc);
         if (a) {
-            gen_branch_a(dc, target, dc->npc, r_cond);
+            int l1 = gen_new_label();
+
+            gen_brcond(dc, cond, l1, cc, r_cond);
+            gen_goto_tb(dc, 0, dc->npc, target);
+
+            gen_set_label(l1);
+            gen_goto_tb(dc, 1, dc->npc + 4, dc->npc + 8);
             dc->is_br = 1;
         } else {
+            gen_cond(r_cond, cc, cond, dc);
             dc->pc = dc->npc;
             dc->jump_pc[0] = target;
             dc->jump_pc[1] = dc->npc + 4;
-- 
1.5.6.5