ppc's broadcast tlb flushe must ensure all vCPUs have invalidated their TLBs when the tlbie sequence completes. This is not true in the current TCG implementation, due to async_run_on_cpu() returning before the remote CPU runs the job.
Fixing ppc requires moving to async_safe_run_on_cpu(), however this API does not guarantee that either, and actually changing to the tlb_flush_*_all_cpus_synced() variants introduces another race which is that the flush is not even guaraneed to complete on the local CPU. To ensure that it is, the tlbie has to be made the last instruction in the TB. Fixing ppc removes the last caller of the non-synced TLB flush variants, we can remove some dead code too. For others - at least arm, riscv, and s390x all use the tlb_flush_*_all_cpus_synced() calls AFAIKS without ending the TB. But it's possible I've missed where they do, or they have other issues that mean it's not required for the flush to take effect until some later operation which does end the TB. Maybe there is no problem, but it might be worth looking at. To reproduce, I have a kvm-unit-tests case for ppc but should be quite easy to port to other archs. You just need to be careful for the local CPU test case that your tlbi instruction is in the same TB as the subsequent load/store that is to incorrectly use a stale translation. https://gitlab.com/npiggin/kvm-unit-tests/-/tree/powerpc The test 'powerpc/mmu.elf tlbie:tlbi-other-cpu' breaks with upstream QEMU. If we use tlb_flush_*_all_cpus_synced() like other arch, then 'powerpc/mmu.elf tlbie:tlbi-this-cpu' also breaks. Since v1 I understood the full problem and fix, and fixed the fix. Thanks, Nick Nicholas Piggin (3): target/ppc: Fix broadcast tlbie synchronisation tcg/cputlb: Remove non-synced variants of global TLB flushes tcg/cputlb: remove other-cpu capability from TLB flushing docs/devel/multi-thread-tcg.rst | 13 +- include/exec/exec-all.h | 97 ++----------- accel/tcg/cputlb.c | 145 ++----------------- target/ppc/helper_regs.c | 2 +- target/ppc/mmu_helper.c | 2 +- target/ppc/translate.c | 7 + target/ppc/translate/storage-ctrl-impl.c.inc | 7 + 7 files changed, 44 insertions(+), 229 deletions(-) -- 2.43.0