On Mon, Apr 25, 2011 at 10:29 PM, Aurelien Jarno <aurel...@aurel32.net> wrote: > On Fri, Apr 22, 2011 at 06:14:06PM +0400, Igor Kovalenko wrote: >> On Fri, Apr 22, 2011 at 2:39 AM, Laurent Desnogues >> <laurent.desnog...@gmail.com> wrote: >> > On Thu, Apr 21, 2011 at 9:45 PM, Igor Kovalenko >> > <igor.v.kovale...@gmail.com> wrote: >> >> On Thu, Apr 21, 2011 at 7:44 PM, Laurent Desnogues >> >> <laurent.desnog...@gmail.com> wrote: >> >>> On Thu, Apr 21, 2011 at 4:57 PM, Artyom Tarasenko <atar4q...@gmail.com> >> >>> wrote: >> >>>> On Tue, Apr 12, 2011 at 4:14 AM, Igor Kovalenko >> >>>> <igor.v.kovale...@gmail.com> wrote: >> >>>>>>> Do you have public test case? >> >>>>>>> It is possible to code this delay slot write test but real issue may >> >>>>>>> be corruption elsewhere. >> >>>> >> >>>> The test case is trivial: it's just the two instructions, branch and >> >>>> wrpr. >> >>>> >> >>>>> In theory there could be multiple issues including compiler induced >> >>>>> ones. >> >>>>> I'd prefer to see some kind of reproducible testcase. >> >>>> >> >>>> Ok, attached a 40 byte long test (the first 32 bytes are not used and >> >>>> needed only because the bios entry point is 0x20). >> >>>> >> >>>> $ git pull && make && sparc64-softmmu/qemu-system-sparc64 -bios >> >>>> test-wrpr.bin -nographic >> >>>> Already up-to-date. >> >>>> make[1]: Nothing to be done for `all'. >> >>>> /mnt/terra/projects/vanilla/qemu/tcg/tcg.c:1892: tcg fatal error >> >>>> Aborted >> >>> >> >>> The problem seems to be that wrpr is using a non-local >> >>> TCG tmp (cpu_tmp0). >> >> >> >> Just tried the test case with write to %pil - seems like write itself is >> >> OK. >> >> The issue appears to be with save_state() call since adding save_state >> >> to %pil case provokes the same tcg abort. >> > >> > The problem is that cpu_tmp0, not being a local tmp, doesn't >> > need to be saved across helper calls. This results in the >> > TCG "optimizer" getting rid of it even though it's later used. >> > Look at the log and you'll see what I mean :-) >> >> I'm not very comfortable with tcg yet. Would it be possible to teach >> optimizer working with delay slots? Or do I look in the wrong place. >> > > The problem is not on the TCG side, but on the target-sparc/translate.c > side: > > | case 0x32: /* wrwim, V9 wrpr */ > | { > | if (!supervisor(dc)) > | goto priv_insn; > | tcg_gen_xor_tl(cpu_tmp0, cpu_src1, cpu_src2); > | #ifdef TARGET_SPARC64 > > Here cpu_tmp0 is loaded. cpu_tmp0 is a TCG temp, which means it is not > saved across TCG branches. > > [...] > > | case 6: // pstate > | save_state(dc, cpu_cond); > | gen_helper_wrpstate(cpu_tmp0); > | dc->npc = DYNAMIC_PC; > | break; > > save_state() calls save_npc(), which in turns might call > gen_generic_branch():
Hmm. This is not the only instruction using save_state() and cpu_tmp0. At least ldd is another example. > | static inline void gen_generic_branch(target_ulong npc1, target_ulong npc2, > | TCGv r_cond) > | { > | int l1, l2; > | > | l1 = gen_new_label(); > | l2 = gen_new_label(); > | > | tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1); > | > | tcg_gen_movi_tl(cpu_npc, npc1); > | tcg_gen_br(l2); > | > | gen_set_label(l1); > | tcg_gen_movi_tl(cpu_npc, npc2); > | gen_set_label(l2); > | } > > And here is the TCG branch, which drop the TCG temp cpu_temp0. > > The solution is either to rewrite gen_generic_branch() without TCG > branches, or to use a TCG temp local instead of a TCG temp. You mean something like case 6: // pstate { TCGv r_temp; r_temp = tcg_temp_new(); tcg_gen_mov_tl(r_temp, cpu_tmp0); save_state(dc, cpu_cond); gen_helper_wrpstate(r_temp); tcg_temp_free(r_temp); dc->npc = DYNAMIC_PC; } break; ? This fails with the same error message. Artyom