From: David Daney <david.da...@cavium.com>

When CONFIG_SMP is enabled, we end up calling flush_context() on each
CPU (indirectly) from __new_context().  Because of this, a broadcast
TLB invalidation is overkill, as every CPU will be doing its own local
invalidation anyway.

Change the scope of the TLB invalidation operation to be local,
resulting in nr_cpus invalidations, rather than nr_cpus^2.
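
As a rough illustration of the nr_cpus versus nr_cpus^2 count, here is a
toy C model.  It is not kernel code: count_invalidations() is a
hypothetical helper, and the model assumes that every CPU calls
flush_context() exactly once per rollover and that a broadcast flush
costs one invalidation on every CPU.

```c
#include <stdbool.h>

/*
 * Toy model, not kernel code: count the per-CPU TLB invalidations done
 * when each of nr_cpus CPUs runs flush_context() once.  A broadcast
 * flush invalidates the TLB of every CPU; a local flush invalidates
 * only the TLB of the CPU that issues it.
 */
static unsigned long count_invalidations(unsigned int nr_cpus, bool broadcast)
{
	unsigned long total = 0;

	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
		total += broadcast ? nr_cpus : 1;	/* this CPU's flush_context() */

	return total;
}
```

For example, with nr_cpus = 48 the broadcast variant yields
48 * 48 = 2304 invalidations, while the local variant yields 48.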

On CPUs with a large ASID space this operation is performed
infrequently, but when it is, this change reduces its overhead.
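
To put "not often done" in numbers: arm64 CPUs implement either 8-bit or
16-bit ASIDs.  The sketch below is a simplified model, not the kernel's
allocator; the helper allocations_per_rollover() is hypothetical, and it
assumes ASID 0 is reserved and ignores any other reserved ASIDs.  Under
those assumptions it gives an upper bound on how many new contexts can
be handed out before a rollover forces flush_context().

```c
/*
 * Simplified model: with asid_bits of ASID space and ASID 0 assumed
 * reserved, at most 2^asid_bits - 1 new contexts can be allocated
 * before the allocator must roll over and flush every CPU's TLB.
 */
static unsigned long allocations_per_rollover(unsigned int asid_bits)
{
	return (1UL << asid_bits) - 1;
}
```

This gives 255 allocations per rollover with 8-bit ASIDs and 65535 with
16-bit ASIDs, so a 16-bit ASID space rolls over roughly 257 times less
often.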

Benchmarked a "time make -j48" kernel build with and without the patch
on a Cavium ThunderX system: one run to warm up the caches, then five
measured runs:

        original      with-patch
time    139.299s      139.0766s
S.D.    0.321         0.159

The patched kernel is probably a little faster, but the difference
could be measurement noise.

Signed-off-by: David Daney <david.da...@cavium.com>
---
 arch/arm64/mm/context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 76c1e6c..ab5b8d3 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -48,7 +48,7 @@ static void flush_context(void)
 {
        /* set the reserved TTBR0 before flushing the TLB */
        cpu_set_reserved_ttbr0();
-       flush_tlb_all();
+       flush_tlb_all_local();
        if (icache_is_aivivt())
                __flush_icache_all();
 }
-- 
1.9.1
