On Wed, 1 Nov 2017 17:35:51 +0530 Anshuman Khandual <khand...@linux.vnet.ibm.com> wrote:
> On 10/31/2017 12:14 PM, Nicholas Piggin wrote:
> > Here's a random mix of performance improvements for radix TLB flushing
> > code. The main aims are to reduce the amount of translation that gets
> > invalidated, and to reduce global flushes where we can do local.
> >
> > To that end, a parallel kernel compile benchmark using powerpc:tlbie
> > tracepoint shows a reduction in tlbie instructions from about 290,000
> > to 80,000, and a reduction in tlbiel instructions from 49,500,000 to
> > 15,000,000. Looks great, but unfortunately does not translate to a
> > statistically significant performance improvement! The needle on TLB
> > misses does not move much, I suspect because a lot of the flushing is
> > done at startup and shutdown, and because a significant cost of TLB
> > flushing itself is in the barriers.
>
> Does memory barrier initiate a single global invalidation with tlbie ?

I'm not quite sure what you're asking, and I don't know the details of
how the hardware handles it, but from the measurements in patch 1 of
the series we can see there is a benefit for both tlbie and tlbiel of
batching them up between barriers.

Thanks,
Nick
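
For readers following the batching point, here is a minimal sketch in plain C of why grouping invalidations between barriers can help. It is not the code from the patch series: `barrier_stub()` and `invalidate_page_stub()` are hypothetical stand-ins that only bump counters, since the real tlbie/tlbiel and barrier instructions are privileged and cannot run in a user program. The assumption is simply that every flush sequence needs one barrier before and one after the invalidations it covers.

```c
#include <stddef.h>

/* Hypothetical counters standing in for real hardware events. */
struct flush_stats {
    size_t invalidations;   /* number of per-page invalidate operations */
    size_t barriers;        /* number of barrier operations issued */
};

/* Stand-in for a barrier instruction; only counts, does nothing else. */
static void barrier_stub(struct flush_stats *s)
{
    s->barriers++;
}

/* Stand-in for a single-page invalidate; only counts. */
static void invalidate_page_stub(struct flush_stats *s, size_t page)
{
    (void)page;             /* page number unused in this model */
    s->invalidations++;
}

/* Each page gets its own barrier / invalidate / barrier sequence. */
struct flush_stats flush_range_unbatched(size_t npages)
{
    struct flush_stats s = {0, 0};

    for (size_t i = 0; i < npages; i++) {
        barrier_stub(&s);
        invalidate_page_stub(&s, i);
        barrier_stub(&s);
    }
    return s;
}

/* All invalidations share one leading and one trailing barrier. */
struct flush_stats flush_range_batched(size_t npages)
{
    struct flush_stats s = {0, 0};

    barrier_stub(&s);
    for (size_t i = 0; i < npages; i++)
        invalidate_page_stub(&s, i);
    barrier_stub(&s);
    return s;
}
```

In this model, flushing 8 pages costs 16 barriers unbatched and 2 batched, with the same 8 invalidations either way. It says nothing about how expensive each barrier really is on hardware; only measurements such as those in patch 1 can show that.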