On 28/08/2014 19:30, Peter Maydell wrote:
> On 28 August 2014 18:14, Paolo Bonzini <pbonz...@redhat.com> wrote:
>> PowerPC TCG flushes the TLB on every IR/DR change, which basically
>> means on every user<->kernel context switch.  Use the 6-element
>> TLB array as a cache, where each MMU index is mapped to a different
>> state of the IR/DR/PR/HV bits.
>>
>> This brings the number of TLB flushes down from ~900000 to ~50000
>> for starting up the Debian installer, which is in line with x86
>> and gives a ~10% performance improvement.
>>
>> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
>> ---
>>  cputlb.c                    | 19 +++++++++++++++++
>>  hw/ppc/spapr_hcall.c        |  6 +++++-
>>  include/exec/exec-all.h     |  5 +++++
>>  target-ppc/cpu.h            |  4 +++-
>>  target-ppc/excp_helper.c    |  6 +-----
>>  target-ppc/helper_regs.h    | 52 +++++++++++++++++++++++++++++++--------------
>>  target-ppc/translate_init.c |  5 +++++
>>  7 files changed, 74 insertions(+), 23 deletions(-)
>>
>> diff --git a/cputlb.c b/cputlb.c
>> index afd3705..17e1b03 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -67,6 +67,25 @@ void tlb_flush(CPUState *cpu, int flush_global)
>>      tlb_flush_count++;
>>  }
>>
>> +void tlb_flush_idx(CPUState *cpu, int mmu_idx)
>> +{
>> +    CPUArchState *env = cpu->env_ptr;
>> +
>> +#if defined(DEBUG_TLB)
>> +    printf("tlb_flush_idx %d:\n", mmu_idx);
>> +#endif
>> +    /* must reset current TB so that interrupts cannot modify the
>> +       links while we are modifying them */
>> +    cpu->current_tb = NULL;
>> +
>> +    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
>> +    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
>> +
>> +    env->tlb_flush_addr = -1;
>> +    env->tlb_flush_mask = 0;
>
> Isn't this going to break huge page support? Consider
> the case:
>  * set up huge pages in one TLB index (causing tlb_flush_addr
>    and tlb_flush_mask to be set to cover that range)
>  * switch to a different TLB index
>  * tlb_flush_idx() for that index (causing flush_addr/mask to
>    be reset)
>  * switch back to first TLB index
>  * do tlb_flush_page for an address inside the huge-page
>    region
>
> I think you need the flush addr/mask to be per-TLB-index
> if you want this to work.

Yes, you're right.
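Something along these lines should work, I think.  Completely untested
sketch, with the surrounding code abbreviated; the existing
tlb_flush_addr/tlb_flush_mask pair just becomes a per-index array:

/* the single large-page flush range becomes one range per MMU index */
target_ulong tlb_flush_addr[NB_MMU_MODES];
target_ulong tlb_flush_mask[NB_MMU_MODES];

void tlb_flush_idx(CPUState *cpu, int mmu_idx)
{
    CPUArchState *env = cpu->env_ptr;

    /* must reset current TB so that interrupts cannot modify the
       links while we are modifying them */
    cpu->current_tb = NULL;

    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[mmu_idx]));
    memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));

    /* forget only the large-page range of the index being flushed */
    env->tlb_flush_addr[mmu_idx] = -1;
    env->tlb_flush_mask[mmu_idx] = 0;
}

void tlb_flush_page(CPUState *cpu, target_ulong addr)
{
    CPUArchState *env = cpu->env_ptr;
    int mmu_idx;

    /* check every index's range instead of a single global one */
    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
        if ((addr & env->tlb_flush_mask[mmu_idx]) ==
            env->tlb_flush_addr[mmu_idx]) {
            /* inside a large-page range: flush the whole TLB, as the
               current code does */
            tlb_flush(cpu, 1);
            return;
        }
    }
    /* ... then flush the individual entries as today ... */
}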
> Personally I would put the "implement new feature in core
> code" in a separate patch from "use new feature in PPC code".

This too, of course.  The patches aren't quite ready; I wanted to post
early because the speedups are very appealing to me.

> Does PPC hardware do lots of TLB flushes on user-kernel
> transitions, or does it have some sort of info in the TLB
> entry about whether it should match or not?

The IR and DR bits simply disable paging for instruction and data
accesses, respectively.  I suppose real hardware simply does not use
the TLB when paging is disabled.

IIRC each user->kernel transition disables paging, and then the kernel
can re-enable it (optionally only for data).  So the transition is
user -> kernel unpaged -> kernel paged, and the kernel unpaged ->
kernel paged part is what triggers the TLB flush.

(Something like this, anyway; Alex explained it to me a year ago when I
asked why tlb_flush was always the top function in the profile of
qemu-system-ppc*.)
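To give an idea of why the caching helps so much: the six TLB indexes
act as a cache of recently used IR/DR/PR/HV combinations, so a flush is
only needed when an index has to be recycled for a new combination.
Roughly like this (just a sketch of the idea, not the code in the
patch; mmu_idx_state is a made-up field):

static int mmu_idx_for_state(CPUState *cpu, int msr_bits)
{
    CPUPPCState *env = cpu->env_ptr;
    static int next_victim;    /* would be per-CPU state in a real patch */
    int i;

    for (i = 0; i < NB_MMU_MODES; i++) {
        if (env->mmu_idx_state[i] == msr_bits) {
            return i;          /* hit: no TLB flush at all */
        }
    }

    /* miss: recycle an index and flush only its own entries */
    i = next_victim;
    next_victim = (next_victim + 1) % NB_MMU_MODES;
    env->mmu_idx_state[i] = msr_bits;
    tlb_flush_idx(cpu, i);
    return i;
}

Since a guest only uses a handful of IR/DR/PR/HV combinations, almost
every user<->kernel transition hits the cache, which is where the drop
from ~900000 to ~50000 flushes comes from.

Paolo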