Hi Eric,

On Fri, Jun 14, 2019 at 04:22:20PM -0700, Eric Dumazet wrote:
> Feng Tang reported a performance regression after introduction
> of per TCP socket tx/rx caches, for TCP over loopback (netperf)
>
> There is high chance the regression is caused by a change on
> how well the 32 KB per-thread page (current->task_frag) can
> be recycled, and lack of pcp caches for order-3 pages.

Exactly! When I checked the regression, I ran several experiments and
came up with a similar idea: adding per-CPU order-X pcp lists. The other
idea was to add an order-3 list to the per-CPU softnet_data as a local
cache.

Thanks,
Feng

>
> I could not reproduce the regression myself, cpus all being
> spinning on the mm spinlocks for page allocs/freeing, regardless
> of enabling or disabling the per tcp socket caches.
>
> It seems best to disable the feature by default, and let
> admins enabling it.
>
> MM layer either needs to provide scalable order-3 pages
> allocations, or could attempt a trylock on zone->lock if
> the caller only attempts to get a high-order page and is
> able to fallback to order-0 ones in case of pressure.
>
> Tests run on a 56 cores host (112 hyper threads)
>
> -   35.49%  netperf    [kernel.vmlinux]  [k] queued_spin_lock_slowpath
>    - 35.49% queued_spin_lock_slowpath
>       - 18.18% get_page_from_freelist
>          - __alloc_pages_nodemask
>             - 18.18% alloc_pages_current
>                  skb_page_frag_refill
>                  sk_page_frag_refill
>                  tcp_sendmsg_locked
>                  tcp_sendmsg
>                  inet_sendmsg
>                  sock_sendmsg
>                  __sys_sendto
>                  __x64_sys_sendto
>                  do_syscall_64
>                  entry_SYSCALL_64_after_hwframe
>                  __libc_send
>       + 17.31% __free_pages_ok
> +   31.43%  swapper    [kernel.vmlinux]  [k] intel_idle
> +    9.12%  netperf    [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
> +    6.53%  netserver  [kernel.vmlinux]  [k] copy_user_enhanced_fast_string
> +    0.69%  netserver  [kernel.vmlinux]  [k] queued_spin_lock_slowpath
> +    0.68%  netperf    [kernel.vmlinux]  [k] skb_release_data
> +    0.52%  netperf    [kernel.vmlinux]  [k] tcp_sendmsg_locked
>      0.46%  netperf    [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
>
> Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Reported-by: Feng Tang <feng.t...@intel.com>
> ---
>  include/net/sock.h         | 4 +++-
>  net/ipv4/sysctl_net_ipv4.c | 8 ++++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
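
For illustration only, the "trylock on zone->lock, fall back to order-0" idea from the quoted commit message might be modelled in userspace roughly as below. This is a sketch under stated assumptions, not kernel code: `zone_lock`, `struct frag` and `frag_refill()` are hypothetical names, a pthread mutex stands in for the zone spinlock, and `malloc()` stands in for the page allocator.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_BYTES 4096u
#define HIGH_ORDER 3u	/* 8 pages = 32 KB, the task_frag size discussed above */

/* Hypothetical stand-in for zone->lock protecting the shared free lists. */
static pthread_mutex_t zone_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a page fragment: a buffer plus its order. */
struct frag {
	void *addr;
	unsigned int order;
};

/*
 * Refill @f. An order-3 buffer is attempted only when the shared lock is
 * free right now; under contention the caller does not spin and takes an
 * order-0 buffer instead (in this model, the path that avoids the lock).
 */
static bool frag_refill(struct frag *f)
{
	assert(f != NULL);

	if (pthread_mutex_trylock(&zone_lock) == 0) {
		f->addr = malloc((size_t)PAGE_BYTES << HIGH_ORDER);
		pthread_mutex_unlock(&zone_lock);
		if (f->addr) {
			f->order = HIGH_ORDER;
			return true;
		}
	}

	/* Lock was busy, or the large allocation failed: fall back. */
	f->addr = malloc(PAGE_BYTES);
	f->order = 0;
	return f->addr != NULL;
}
```

A caller would invoke `frag_refill(&f)` and then check `f.order` to see which size it got. The model only shows the control flow; it says nothing about whether the real allocator could skip zone->lock on the order-0 path in every case.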