The ordering in should_flush_tlb() is entirely non-obvious and is only correct because x86 is TSO. Clarify the situation by replacing the two WRITE_ONCE()s with smp_store_release(), which on x86 is purely cosmetic: TSO already gives every store release semantics, so the generated code is unchanged.
Additionally, clarify the comment on should_flush_tlb().

Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
---
 arch/x86/mm/tlb.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -910,8 +910,10 @@ void switch_mm_irqs_off(struct mm_struct
 	 * Indicate that CR3 is about to change. nmi_uaccess_okay()
 	 * and others are sensitive to the window where mm_cpumask(),
 	 * CR3 and cpu_tlbstate.loaded_mm are not all in sync.
+	 *
+	 * Also, see should_flush_tlb().
 	 */
-	WRITE_ONCE(this_tlbstate->loaded_mm, LOADED_MM_SWITCHING);
+	smp_store_release(&this_tlbstate->loaded_mm, LOADED_MM_SWITCHING);
 	barrier();
 
 	/* Start receiving IPIs and then read tlb_gen (and LAM below) */
@@ -938,10 +940,11 @@ void switch_mm_irqs_off(struct mm_struct
 			trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 0);
 	}
 
-	/* Make sure we write CR3 before loaded_mm. */
-	barrier();
-
-	WRITE_ONCE(this_tlbstate->loaded_mm, next);
+	/*
+	 * Make sure we write CR3 before loaded_mm.
+	 * See nmi_uaccess_okay() and should_flush_tlb().
+	 */
+	smp_store_release(&this_tlbstate->loaded_mm, next);
 	WRITE_ONCE(this_tlbstate->loaded_mm_asid, ns.asid);
 
 	cpu_tlbstate_update_lam(new_lam, mm_untag_mask(next));
@@ -1280,9 +1283,20 @@ static bool should_flush_tlb(int cpu, vo
 	struct flush_tlb_info *info = data;
 
 	/*
-	 * Order the 'loaded_mm' and 'is_lazy' against their
-	 * write ordering in switch_mm_irqs_off(). Ensure
-	 * 'is_lazy' is at least as new as 'loaded_mm'.
+	 * switch_mm_irqs_off()                         should_flush_tlb()
+	 *   WRITE_ONCE(is_lazy, false);                  loaded_mm = READ_ONCE(loaded_mm);
+	 *   smp_store_release(loaded_mm, SWITCHING);     smp_rmb();
+	 *   mov-cr3
+	 *   smp_store_release(loaded_mm, next)
+	 *                                                if (READ_ONCE(is_lazy))
+	 *                                                        return false;
+	 *
+	 * Where smp_rmb() matches against either smp_store_release() to
+	 * ensure that if we observe loaded_mm to be either SWITCHING or next
+	 * we must also observe is_lazy == false.
+	 *
+	 * If this were not so, it would be possible to falsely return false
+	 * and miss sending an invalidation IPI.
 	 */
 	smp_rmb();
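
For illustration only (not kernel code), the pairing documented above can be modelled in user space with C11 atomics. The names loaded_mm, is_lazy, MM_SWITCHING and MM_NEXT below are local to this sketch, and smp_rmb() is approximated by an acquire fence, which is the usual C11 way to pair a fence on the reader side with a release store on the writer side:

/*
 * Stand-alone sketch of the store-release / read-fence pairing from
 * the patch.  Writer models switch_mm_irqs_off(), reader models
 * should_flush_tlb().  Build with: cc -pthread sketch.c
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <pthread.h>

#define MM_IDLE		0
#define MM_SWITCHING	1
#define MM_NEXT		2

static atomic_int  loaded_mm = MM_IDLE;
static atomic_bool is_lazy   = true;

static void *switcher(void *arg)		/* switch_mm_irqs_off() */
{
	(void)arg;

	/* WRITE_ONCE(is_lazy, false) */
	atomic_store_explicit(&is_lazy, false, memory_order_relaxed);

	/* smp_store_release(loaded_mm, LOADED_MM_SWITCHING) */
	atomic_store_explicit(&loaded_mm, MM_SWITCHING, memory_order_release);

	/* ... the CR3 write would go here in the kernel ... */

	/* smp_store_release(loaded_mm, next) */
	atomic_store_explicit(&loaded_mm, MM_NEXT, memory_order_release);
	return NULL;
}

static void *flusher(void *arg)			/* should_flush_tlb() */
{
	(void)arg;

	/* loaded_mm = READ_ONCE(loaded_mm) */
	int mm = atomic_load_explicit(&loaded_mm, memory_order_relaxed);

	/* smp_rmb(): the acquire fence pairs with either release store */
	atomic_thread_fence(memory_order_acquire);

	/* READ_ONCE(is_lazy) */
	bool lazy = atomic_load_explicit(&is_lazy, memory_order_relaxed);

	/*
	 * Once loaded_mm is observed as SWITCHING or NEXT, is_lazy must
	 * already be observed as false; seeing lazy == true here would be
	 * the "falsely return false" case the comment warns about.
	 */
	if (mm != MM_IDLE && lazy)
		printf("ordering violated (should never happen)\n");
	else
		printf("mm=%d lazy=%d: consistent\n", mm, (int)lazy);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, switcher, NULL);
	pthread_create(&b, NULL, flusher, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

On x86, smp_store_release() is just barrier() followed by WRITE_ONCE(), so the replacement generates the same code, which is why the change is called cosmetic; the gain is that the intended release/rmb pairing is now explicit in the source rather than implied by TSO.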