>From linux-3.7, (commit 5640f7685831 "net: use a per task frag allocator") TCP sendmsg() has preferred using order-3 allocations.
While it gives good results for most cases, we had reports that heavy uses of TCP over loopback were hitting a spinlock contention in page allocations/freeing. This commits adds a sysctl so that admins can opt-in for order-0 allocations. Hopefully mm layer might optimize order-3 allocations in the future since it could give us a nice boost (see 8 lines of following benchmark) The following benchmark shows a win when more than 8 TCP_STREAM threads are running (56 x86 cores server in my tests) for thr in {1..30} do sysctl -wq net.core.high_order_alloc_disable=0 T0=`./super_netperf $thr -H 127.0.0.1 -l 15` sysctl -wq net.core.high_order_alloc_disable=1 T1=`./super_netperf $thr -H 127.0.0.1 -l 15` echo $thr:$T0:$T1 done 1: 49979: 37267 2: 98745: 76286 3: 141088: 110051 4: 177414: 144772 5: 197587: 173563 6: 215377: 208448 7: 241061: 234087 8: 267155: 263373 9: 295069: 297402 10: 312393: 335213 11: 340462: 368778 12: 371366: 403954 13: 412344: 443713 14: 426617: 473580 15: 474418: 507861 16: 503261: 538539 17: 522331: 563096 18: 532409: 567084 19: 550824: 605240 20: 525493: 641988 21: 564574: 665843 22: 567349: 690868 23: 583846: 710917 24: 588715: 736306 25: 603212: 763494 26: 604083: 792654 27: 602241: 796450 28: 604291: 797993 29: 611610: 833249 30: 577356: 841062 Signed-off-by: Eric Dumazet <eduma...@google.com> --- include/net/sock.h | 2 ++ net/core/sock.c | 4 +++- net/core/sysctl_net_core.c | 7 +++++++ 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/include/net/sock.h b/include/net/sock.h index 7d7f4ce63bb2aae7c87a9445d11339b6e6b19724..6cbc16136357d158cf1e84b98ecb7e06898269a6 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2534,6 +2534,8 @@ extern int sysctl_optmem_max; extern __u32 sysctl_wmem_default; extern __u32 sysctl_rmem_default; +DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); + static inline int sk_get_wmem0(const struct sock *sk, const struct proto *proto) { /* Does this proto have per netns sysctl_wmem ? */ diff --git a/net/core/sock.c b/net/core/sock.c index 2b3701958486219a26385c8fca1498c4e294dc1d..7f49392579a5826b0a8446bb88428396422cc0b6 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2320,6 +2320,7 @@ static void sk_leave_memory_pressure(struct sock *sk) /* On 32bit arches, an skb frag is limited to 2^15 */ #define SKB_FRAG_PAGE_ORDER get_order(32768) +DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); /** * skb_page_frag_refill - check that a page_frag contains enough room @@ -2344,7 +2345,8 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) } pfrag->offset = 0; - if (SKB_FRAG_PAGE_ORDER) { + if (SKB_FRAG_PAGE_ORDER && + !static_branch_unlikely(&net_high_order_alloc_disable_key)) { /* Avoid direct reclaim but allow kswapd to wake */ pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP | __GFP_NOWARN | diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 1a2685694abd537d7ae304754b84b237928fd298..f9204719aeeeb4700582d03fda244f2f8961f8a7 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -562,6 +562,13 @@ static struct ctl_table net_core_table[] = { .extra1 = &zero, .extra2 = &two, }, + { + .procname = "high_order_alloc_disable", + .data = &net_high_order_alloc_disable_key.key, + .maxlen = sizeof(net_high_order_alloc_disable_key), + .mode = 0644, + .proc_handler = proc_do_static_key, + }, { } }; -- 2.22.0.410.gd8fdbe21b5-goog