> With shared mapping, even though we are unmapping a large range, the kernel
> will force a TLB flush with ptl lock held to avoid the race mentioned in
> commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts")
> This results in the kernel issuing a high number of TLB flushes even for a large
> range. This can be improved by making sure the kernel switch to pid based flush if the
> kernel is unmapping a 2M range.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com>
> ---
>  arch/powerpc/mm/book3s64/radix_tlb.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
> index aefc100d79a7..21d0f098e43b 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
>   * invalidating a full PID, so it has a far lower threshold to change from
>   * individual page flushes to full-pid flushes.
>   */
> -static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
> +static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
>  static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
>
>  static inline void __radix__flush_tlb_range(struct mm_struct *mm,
> @@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>  	if (fullmm)
>  		flush_pid = true;
>  	else if (type == FLUSH_TYPE_GLOBAL)
> -		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
> +		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
>  	else
>  		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
I evaluated the patches from Aneesh with a micro benchmark that does
shmat and shmdt of a 256 MB segment. The higher the rate of work, the
better the performance. With a value of 32, we match the performance of
GTSE=off. This was evaluated on the SLES15 SP3 kernel.

# cat /sys/kernel/debug/powerpc/tlb_single_page_flush_ceiling
32

# perf stat -I 1000 -a -e powerpc:tlbie,r30058 ./tlbie -i 5 -c 1 t 1
Rate of work: = 311
#           time             counts unit events
     1.013131404              50939      powerpc:tlbie
     1.013131404              50658      r30058
Rate of work: = 318
     2.026957019              51520      powerpc:tlbie
     2.026957019              51481      r30058
Rate of work: = 318
     3.038884431              51485      powerpc:tlbie
     3.038884431              51461      r30058
Rate of work: = 318
     4.051483926              51485      powerpc:tlbie
     4.051483926              51520      r30058
Rate of work: = 318
     5.063635713              48577      powerpc:tlbie
     5.063635713              48347      r30058

# echo 34 > /sys/kernel/debug/powerpc/tlb_single_page_flush_ceiling

# perf stat -I 1000 -a -e powerpc:tlbie,r30058 ./tlbie -i 5 -c 1 t 1
Rate of work: = 174
#           time             counts unit events
     1.012672696             721471      powerpc:tlbie
     1.012672696             726491      r30058
Rate of work: = 177
     2.026348661             737460      powerpc:tlbie
     2.026348661             736138      r30058
Rate of work: = 178
     3.037932122             737460      powerpc:tlbie
     3.037932122             737460      r30058
Rate of work: = 178
     4.050198819             737044      powerpc:tlbie
     4.050198819             737460      r30058
Rate of work: = 177
     5.062400776             692832      powerpc:tlbie
     5.062400776             688319      r30058

Regards,
Puvichakravarthy Ramachandran