radix TLB flush performance improvements

Nicholas Piggin Wed, 01 Nov 2017 06:40:59 -0700

On Wed, 1 Nov 2017 17:35:51 +0530
Anshuman Khandual <[email protected]> wrote:


> On 10/31/2017 12:14 PM, Nicholas Piggin wrote:
> > Here's a random mix of performance improvements for radix TLB flushing
> > code. The main aims are to reduce the amount of translation that gets
> > invalidated, and to reduce global flushes where we can do local.
> > 
> > To that end, a parallel kernel compile benchmark using the powerpc:tlbie
> > tracepoint shows a reduction in tlbie instructions from about 290,000
> > to 80,000, and a reduction in tlbiel instructions from 49,500,000 to
> > 15,000,000. Looks great, but unfortunately does not translate to a
> > statistically significant performance improvement! The needle on TLB
> > misses does not move much, I suspect because a lot of the flushing is
> > done at startup and shutdown, and because a significant cost of TLB
> > flushing itself is in the barriers.
> 
> Does a memory barrier initiate a single global invalidation with tlbie?
> 

I'm not quite sure what you're asking, and I don't know the details
of how the hardware handles it, but from the measurements in patch
1 of the series we can see there is a benefit for both tlbie and
tlbiel of batching them up between barriers.
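
To make "batching them up between barriers" concrete, here is a minimal
sketch. It is not the code from patch 1 and it does not model hardware
cost. The stubs, the struct and the function names are all hypothetical:
they only count how many barriers (on POWER, instructions such as
ptesync) and how many invalidations each strategy would issue for the
same range.

```c
#include <assert.h>

/* Illustrative counters only; nothing here touches a real TLB. */
struct flush_stats {
	unsigned long invalidations;	/* stand-in for tlbie/tlbiel */
	unsigned long barriers;		/* stand-in for the surrounding barriers */
};

/* Hypothetical stubs: real code would use inline asm here. */
static void barrier_stub(struct flush_stats *s)
{
	s->barriers++;
}

static void invalidate_stub(struct flush_stats *s, unsigned long va)
{
	(void)va;	/* the address is irrelevant to the count */
	s->invalidations++;
}

/* Unbatched: every invalidation gets its own pair of barriers. */
static void flush_range_unbatched(struct flush_stats *s, unsigned long start,
				  unsigned long end, unsigned long page_size)
{
	unsigned long va;

	assert(page_size != 0);
	for (va = start; va < end; va += page_size) {
		barrier_stub(s);
		invalidate_stub(s, va);
		barrier_stub(s);
	}
}

/* Batched: one pair of barriers around the whole run of invalidations. */
static void flush_range_batched(struct flush_stats *s, unsigned long start,
				unsigned long end, unsigned long page_size)
{
	unsigned long va;

	assert(page_size != 0);
	barrier_stub(s);
	for (va = start; va < end; va += page_size)
		invalidate_stub(s, va);
	barrier_stub(s);
}
```

For a range of N pages both variants issue N invalidations, but the
unbatched one issues 2N barriers and the batched one issues 2. How much
that saves in practice depends on the hardware, which this sketch makes
no attempt to represent.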

Thanks,
Nick

Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements
