Paolo Bonzini <pbonz...@redhat.com> writes:

> On 09/11/2016 15:57, Alex Bennée wrote:
>> The one outstanding question is how to deal with the TLB flush
>> semantics of the various guest architectures. Currently flushes to
>> other vCPUs will happen at the end of their currently executing
>> Translation Block which could mean the originating vCPU makes
>> assumptions about flushes having been completed when they haven't. In
>> practice this hasn't been a problem and I haven't been able to
>> construct a test case so far that would fail in such a case. This is
>> probably because most tear downs of the other vCPU TLBs tend to be
>> done while the other vCPUs are not doing much. If anyone can come up
>> with a test case that would fail if this assumption isn't met then
>> please let me know.
>
> Have you tried implementing ARM's DMB semantics correctly?
I've implemented a stricter semantics with the proof-of-concept patch
below. I'm not sure how to do it on the DMB instruction itself, as the
concept of a pending flush is a run-time rather than a translation-time
concept. Is this the sort of state that could be pushed into the
translation flags?

I suspect forcing generation of safe-work synchronisation points for
all DMBs would slow things down a lot. Usually the DMBs will be right
after the flushes, but not always, so I doubt you can guarantee they
will be in the same basic block.

Thoughts?
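For illustration only (this is not part of the patch below, and the
helper name plus its helper.h/translate-a64.c wiring are made up), a
per-DMB synchronisation point would presumably be a helper along these
lines: queue a no-op piece of *safe* work on the issuing vCPU and
leave the execution loop, so that by the time execution restarts every
vCPU has serviced its queued flush work.

/* Sketch only: "barrier_sync" is a hypothetical helper; it would need
 * a DEF_HELPER_1 entry and a gen_helper_barrier_sync() call from the
 * DMB/DSB path in target-arm/translate-a64.c.
 */

static void empty_safe_work(CPUState *cpu, run_on_cpu_data data)
{
    /* By the time this runs every vCPU is outside the execution loop,
     * so flush work queued before the barrier has had a chance to run.
     */
}

void HELPER(barrier_sync)(CPUARMState *env)
{
    CPUState *cs = CPU(arm_env_get_cpu(env));

    /* Schedule a no-op as safe work on ourselves and leave the loop;
     * execution only restarts once the safe work has executed.
     */
    async_safe_run_on_cpu(cs, empty_safe_work, RUN_ON_CPU_NULL);
    cpu_loop_exit(cs);
}

As with the set_cp_reg64 case in the patch, the translator would also
have to write back the PC (gen_a64_set_pc_im) before calling such a
helper so the cpu_loop_exit lands in the right place, and paying that
cost on every DMB is exactly what I suspect would hurt.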
--8<---------------cut here---------------start------------->8---
target-arm: ensure tlbi_aa64_vae1is_write completes (POC)

Previously flushes on other vCPUs would only get serviced when they
exited their TranslationBlocks. While this isn't overly problematic it
violates the semantics of the TLB flush from the point of view of the
source vCPU. This proof of concept solves this by introducing a new
tlb_flush_all_page_by_mmuidx which ensures all TLB flushes are
completed by the time execution continues on the source vCPU. It does
this by creating a synchronisation point: the source vCPU schedules
its own flush via the async safe work mechanism and exits the
execution loop. Once the safe work has executed, all TLBs will have
been updated.

 4 files changed, 53 insertions(+), 9 deletions(-)
 cputlb.c                   | 33 +++++++++++++++++++++++++++++++++
 include/exec/exec-all.h    | 11 +++++++++++
 target-arm/helper.c        | 17 ++++++++---------
 target-arm/translate-a64.c |  1 +

modified   cputlb.c
@@ -354,6 +354,39 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
     }
 }

+/* This function affects all vCPUs and will ensure all work is
+ * complete by the time the loop restarts
+ */
+void tlb_flush_all_page_by_mmuidx(CPUState *src_cpu, target_ulong addr, ...)
+{
+    unsigned long mmu_idx_bitmap;
+    target_ulong addr_and_mmu_idx;
+    va_list argp;
+    CPUState *other_cs;
+
+    va_start(argp, addr);
+    mmu_idx_bitmap = make_mmu_index_bitmap(argp);
+    va_end(argp);
+
+    tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%lx\n", addr, mmu_idx_bitmap);
+
+    /* This should already be page aligned */
+    addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
+    addr_and_mmu_idx |= mmu_idx_bitmap;
+
+    CPU_FOREACH(other_cs) {
+        if (other_cs != src_cpu) {
+            async_run_on_cpu(other_cs, tlb_check_page_and_flush_by_mmuidx_async_work,
+                             RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
+        } else {
+            async_safe_run_on_cpu(other_cs, tlb_check_page_and_flush_by_mmuidx_async_work,
+                                  RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
+        }
+    }
+
+    cpu_loop_exit(src_cpu);
+}
+
 void tlb_flush_page_all(target_ulong addr)
 {
     CPUState *cpu;

modified   include/exec/exec-all.h
@@ -115,6 +115,17 @@ void tlb_flush(CPUState *cpu, int flush_global);
  */
 void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...);
 /**
+ * tlb_flush_all_page_by_mmuidx:
+ * @cpu: Originating CPU of the flush
+ * @addr: virtual address of page to be flushed
+ * @...: list of MMU indexes to flush, terminated by a negative value
+ *
+ * Flush one page from the TLB of all CPUs, for the specified
+ * MMU indexes. This function does not return; the run loop will exit
+ * and restart once the flush is completed.
+ */
+void tlb_flush_all_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...);
+/**
  * tlb_flush_by_mmuidx:
  * @cpu: CPU whose TLB should be flushed
  * @...: list of MMU indexes to flush, terminated by a negative value

modified   target-arm/helper.c
@@ -3047,20 +3047,19 @@ static void tlbi_aa64_vae3_write(CPUARMState *env, const ARMCPRegInfo *ri,
 static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                    uint64_t value)
 {
+    ARMCPU *cpu = arm_env_get_cpu(env);
+    CPUState *cs = CPU(cpu);
     bool sec = arm_is_secure_below_el3(env);
-    CPUState *other_cs;
     uint64_t pageaddr = sextract64(value << 12, 0, 56);

     fprintf(stderr,"%s: dbg\n", __func__);

-    CPU_FOREACH(other_cs) {
-        if (sec) {
-            tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S1SE1,
-                                     ARMMMUIdx_S1SE0, -1);
-        } else {
-            tlb_flush_page_by_mmuidx(other_cs, pageaddr, ARMMMUIdx_S12NSE1,
-                                     ARMMMUIdx_S12NSE0, -1);
-        }
+    if (sec) {
+        tlb_flush_all_page_by_mmuidx(cs, pageaddr, ARMMMUIdx_S1SE1,
+                                     ARMMMUIdx_S1SE0, -1);
+    } else {
+        tlb_flush_all_page_by_mmuidx(cs, pageaddr, ARMMMUIdx_S12NSE1,
+                                     ARMMMUIdx_S12NSE0, -1);
     }
 }

modified   target-arm/translate-a64.c
@@ -1588,6 +1588,7 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
         } else if (ri->writefn) {
             TCGv_ptr tmpptr;
             tmpptr = tcg_const_ptr(ri);
+            gen_a64_set_pc_im(s->pc);
             gen_helper_set_cp_reg64(cpu_env, tmpptr, tcg_rt);
             tcg_temp_free_ptr(tmpptr);
         } else {
--8<---------------cut here---------------end--------------->8---

--
Alex Bennée