On 10/31/2017 12:14 PM, Nicholas Piggin wrote:
> Here's a random mix of performance improvements for radix TLB flushing
> code. The main aims are to reduce the amount of translation that gets
> invalidated, and to reduce global flushes where we can do local.
>
> To that end, a parallel kernel compile benchmark using powerpc:tlbie
> tracepoint shows a reduction in tlbie instructions from about 290,000
> to 80,000, and a reduction in tlbiel instructions from 49,500,000 to
> 15,000,000. Looks great, but unfortunately does not translate to a
> statistically significant performance improvement! The needle on TLB
> misses does not move much, I suspect because a lot of the flushing is
> done at startup and shutdown, and because a significant cost of TLB
> flushing itself is in the barriers.

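For scale, the quoted instruction counts can be turned into relative reductions with a quick back-of-the-envelope check. This is only arithmetic on the approximate figures above, not part of the original benchmark, and `reduction_pct` is a hypothetical helper introduced for illustration.

```python
def reduction_pct(before: int, after: int) -> float:
    """Percentage by which `after` is lower than `before` (hypothetical helper)."""
    return 100.0 * (before - after) / before


# Approximate instruction counts quoted from the benchmark above.
counts = {
    "tlbie": (290_000, 80_000),
    "tlbiel": (49_500_000, 15_000_000),
}

for insn, (before, after) in counts.items():
    # Print the percentage reduction and the before/after ratio.
    print(f"{insn}: {reduction_pct(before, after):.1f}% fewer ({before / after:.1f}x)")
```

Both counts drop by roughly 70% (about 72% for tlbie and 70% for tlbiel), which makes the lack of a measurable speedup all the more notable.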
Does a memory barrier initiate a single global invalidation with tlbie?