Richard Henderson <r...@twiddle.net> writes:

> I've been looking at this problem off and on for the last week or so,
> prompted by the sparc performance work.  Although I havn't been able
> to get a proper sparc64 guest install working, I see the exact same
> problem with a mips guest.
>
<snip>
> In the past we've talked about getting rid of retranslation entirely.
> It's clever, but it certainly has its share of problems.  I gave it
> a go this weekend.
>
<snip>
> Thoughts on the approach?

I've only had a quick glance so far but I'm fairly familiar with the
concept from a previous life. I'll aim to do a full review later once
I've gotten through my MTTCG review backlog. Anyway, some quick points:

  * You can save data by only marking faulting instructions

  Assuming that all asynchronous exceptions are taken at the end or
  prologue of basic blocks, you only actually need to record the
  address of potentially faulting instructions. In fact only a few
  backend instructions will actually fault synchronously. Of course
  this does have the downside of having to mark all those instructions
  in the front end.

  * This method can also be used for additional rectification data

  AIUI we currently ensure all loads/stores are barriers and ensure the
  CPU register file is updated before they occur. However, if you
  wanted to, you could drop that requirement, record the target-host
  register pair and only fish it out when required on a fault.

  * Test suites are essential if you're going to get clever

  Last time I went through this I built a SPARC test suite to cover all
  faulting instructions in all the various addressing modes. It flushed
  out a lot of bugs. I appreciate QEMU's aims may be a bit less
  demanding: it may not need to be fully complete, and we can fix up
  problems as we hit them in the field. However, consider at least a
  framework of a testcase for checking PC rectification, as it will
  help in validating those fixes.

  * Delay slots/nPCs are a pain

  Faults in delay slots are a pain to get right, although maybe QEMU's
  architecture makes it a little easier to do. Fortunately for me I no
  longer have to worry too hard about these architectures, good luck
  ;-)

Anyway, I'm broadly supportive of anything that gets rid of the
re-translation cost. I shall review the code later!
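
As a rough illustration of the first point (recording only potentially
faulting instructions), a minimal sketch in C might look like the
following. All names here (FaultPoint, FaultTable, fault_table_record,
fault_table_lookup) are hypothetical and are not part of QEMU or of the
posted series. It assumes the translator records, for each marked guest
instruction, the host code offset at which that instruction's generated
code ends, in increasing order, together with the guest PC and nPC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not QEMU's actual API. */
#define MAX_FAULT_POINTS 64

/* One entry per potentially faulting guest instruction in a block. */
typedef struct {
    uint32_t host_end_off; /* host code offset just past this insn's code */
    uint64_t guest_pc;     /* guest PC of the marked instruction */
    uint64_t guest_npc;    /* guest nPC, for delay-slot targets (assumed) */
} FaultPoint;

typedef struct {
    FaultPoint pts[MAX_FAULT_POINTS];
    size_t n;
} FaultTable;

/* Called at translation time; entries must arrive in host-offset order. */
static void fault_table_record(FaultTable *t, uint32_t host_end_off,
                               uint64_t pc, uint64_t npc)
{
    assert(t->n < MAX_FAULT_POINTS);
    assert(t->n == 0 || t->pts[t->n - 1].host_end_off < host_end_off);
    t->pts[t->n].host_end_off = host_end_off;
    t->pts[t->n].guest_pc = pc;
    t->pts[t->n].guest_npc = npc;
    t->n++;
}

/*
 * Called on a synchronous fault.  Returns the first entry whose host
 * code ends after fault_off, or NULL if fault_off is past every entry.
 * Because only marked instructions can fault, that entry is the one
 * whose guest state has to be restored.
 */
static const FaultPoint *fault_table_lookup(const FaultTable *t,
                                            uint32_t fault_off)
{
    size_t lo = 0, hi = t->n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t->pts[mid].host_end_off > fault_off) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    return lo < t->n ? &t->pts[lo] : NULL;
}
```

The lookup is a plain binary search, so the fault path costs O(log n)
in the number of marked instructions rather than a retranslation of the
block. The same entry could carry further rectification data, such as a
target-host register pair, if the register file were no longer kept in
sync before every load/store.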

>
>
> r~
>
>
> Richard Henderson (20):
>   tcg: Rename debug_insn_start to insn_start
>   target-*: Unconditionally emit tcg_gen_insn_start
>   tcg: Allow extra data to be attached to insn_start
>   target-arm: Add condexec state to insn_start
>   target-i386: Add cc_op state to insn_start
>   target-mips: Add delayed branch state to insn_start
>   target-s390x: Add cc_op state to insn_start
>   target-sh4: Add flags state to insn_start
>   target-cris: Mirror gen_opc_pc into insn_start
>   target-sparc: Tidy gen_branch_a interface
>   target-sparc: Split out gen_branch_n
>   target-sparc: Remove gen_opc_jump_pc
>   target-sparc: Add npc state to insn_start
>   tcg: Merge cpu_gen_code into tb_gen_code
>   target-*: Drop cpu_gen_code define
>   tcg: Add TCG_MAX_INSNS
>   tcg: Pass data argument to restore_state_to_opc
>   tcg: Save insn data and use it in cpu_restore_state_from_tb
>   tcg: Remove gen_intermediate_code_pc
>   tcg: Remove tcg_gen_code_search_pc
>
>  include/exec/exec-all.h       |   6 +-
>  target-alpha/cpu.h            |   1 -
>  target-alpha/translate.c      |  55 +++-------
>  target-arm/cpu.h              |   2 +-
>  target-arm/translate-a64.c    |  39 ++-----
>  target-arm/translate.c        |  75 ++++---------
>  target-arm/translate.h        |   8 +-
>  target-cris/cpu.h             |   1 -
>  target-cris/translate.c       |  64 +++---------
>  target-cris/translate_v10.c   |   3 -
>  target-i386/cpu.h             |   2 +-
>  target-i386/translate.c       |  86 ++++-----------
>  target-lm32/cpu.h             |   1 -
>  target-lm32/translate.c       |  55 ++--------
>  target-m68k/cpu.h             |   1 -
>  target-m68k/translate.c       |  64 +++---------
>  target-microblaze/cpu.h       |   1 -
>  target-microblaze/translate.c |  56 +++-------
>  target-mips/cpu.h             |   2 +-
>  target-mips/translate.c       |  73 ++++---------
>  target-moxie/cpu.h            |   1 -
>  target-moxie/translate.c      |  65 ++++--------
>  target-openrisc/cpu.h         |   1 -
>  target-openrisc/translate.c   |  54 ++--------
>  target-ppc/cpu.h              |   1 -
>  target-ppc/translate.c        |  56 +++-------
>  target-s390x/cpu.h            |   2 +-
>  target-s390x/translate.c      |  61 +++--------
>  target-sh4/cpu.h              |   2 +-
>  target-sh4/translate.c        |  71 ++++---------
>  target-sparc/cpu.h            |   2 +-
>  target-sparc/translate.c      | 189 ++++++++++++++-------------------
>  target-tricore/translate.c    |  53 ++++------
>  target-unicore32/translate.c  |  57 +++-------
>  target-xtensa/cpu.h           |   1 -
>  target-xtensa/translate.c     |  52 ++-------
>  tcg/tcg-op.h                  |  52 +++++++--
>  tcg/tcg-opc.h                 |   4 +-
>  tcg/tcg.c                     |  96 ++++++++---------
>  tcg/tcg.h                     |  14 ++-
>  tci.c                         |   9 --
>  translate-all.c               | 237 ++++++++++++++++++++++++------------------
>  42 files changed, 578 insertions(+), 1097 deletions(-)

-- 
Alex Bennée