From: David Daney <david.da...@cavium.com>

When CONFIG_SMP is enabled, we end up calling flush_context() on each
CPU (indirectly) from __new_context().  Because of this, doing a
broadcast TLB invalidate is overkill, as all CPUs will be doing a
local invalidation.

Change the scope of the TLB invalidation operation to be local,
resulting in nr_cpus invalidations, rather than nr_cpus^2.

On CPUs with a large ASID space this operation is not often done.
But, when it is, this reduces the overhead.

Benchmarked "time make -j48" kernel build with and without the patch
on Cavium ThunderX system, one run to warm up the caches, and then
five runs measured:

    original        with-patch
    139.299s        139.0766s
    S.D. 0.321      S.D. 0.159

Probably a little faster, but could be measurement noise.

Signed-off-by: David Daney <david.da...@cavium.com>
---
 arch/arm64/mm/context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 76c1e6c..ab5b8d3 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -48,7 +48,7 @@ static void flush_context(void)
 {
 	/* set the reserved TTBR0 before flushing the TLB */
 	cpu_set_reserved_ttbr0();
-	flush_tlb_all();
+	flush_tlb_all_local();
 	if (icache_is_aivivt())
 		__flush_icache_all();
 }
-- 
1.9.1
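
The nr_cpus versus nr_cpus^2 count in the commit message can be
checked with a small user-space model.  This is only a sketch and not
kernel code: rollover_invalidations() is a hypothetical helper, and it
assumes that every CPU runs flush_context() exactly once per rollover
and that a broadcast invalidate costs one invalidation on every CPU.

```c
/*
 * Hypothetical user-space model (not kernel code): count the TLB
 * invalidations performed across the system during one ASID rollover,
 * assuming each CPU runs flush_context() exactly once.
 */
static unsigned long rollover_invalidations(unsigned long nr_cpus,
					    int broadcast)
{
	unsigned long total = 0;
	unsigned long cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		/*
		 * A broadcast flush invalidates the TLB of every CPU;
		 * a local flush invalidates only the calling CPU's TLB.
		 */
		total += broadcast ? nr_cpus : 1;
	}

	return total;
}
```

With nr_cpus = 48 (picked only to match the -j48 in the benchmark, not
as a statement about the test machine), the model gives 2304
invalidations for the broadcast case and 48 for the local case.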