[mm, net-next v2] mm: net: memcg accounting for TCP rx zerocopy

2021-03-15 Thread Arjun Roy
From: Arjun Roy TCP zerocopy receive is used by high performance network applications to further scale. For RX zerocopy, the memory containing the network data filled by the network driver is directly mapped into the address space of high performance applications. To keep the TLB cost low, these

[net] tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE)

2021-02-25 Thread Arjun Roy
From: Arjun Roy getsockopt(TCP_ZEROCOPY_RECEIVE) has a bug where we read a user-provided "len" field of type signed int, and then compare the value to the result of an "offsetofend" operation, which is unsigned. Negative values provided by the user will be promoted to la
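
The signed/unsigned comparison described in this snippet can be reproduced outside the kernel. The following is a minimal sketch, not the kernel code: `struct demo_args`, `DEMO_MIN_LEN`, and both helper functions are hypothetical names made up for illustration, and the "fixed" variant shows only one plausible way to reject negative lengths.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an args struct; NOT the kernel's layout. */
struct demo_args {
    unsigned long long address;
    unsigned int length;
};

/* Same idea as an "offsetofend" result: an unsigned size_t value. */
#define DEMO_MIN_LEN (offsetof(struct demo_args, length) + sizeof(unsigned int))

/*
 * Buggy check: the explicit cast mirrors the conversion the compiler
 * applies implicitly when an int is compared with a size_t. A negative
 * len becomes a very large unsigned value, so it is never "too small".
 */
static bool len_ok_buggy(int len)
{
    return !((size_t)len < DEMO_MIN_LEN);
}

/* One possible fix (an assumption): reject negatives before comparing. */
static bool len_ok_fixed(int len)
{
    if (len < 0)
        return false;
    return (size_t)len >= DEMO_MIN_LEN;
}
```

With this sketch, `len_ok_buggy(-1)` accepts the negative length while `len_ok_fixed(-1)` rejects it.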

[net-next] tcp: Sanitize CMSG flags and reserved args in tcp_zerocopy_receive.

2021-02-11 Thread Arjun Roy
From: Arjun Roy Explicitly define reserved field and require it and any subsequent fields to be zero-valued for now. Additionally, limit the valid CMSG flags that tcp_zerocopy_receive accepts. Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.") Signed-off
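
The validation this snippet describes (a reserved field that must be zero, plus a limited set of accepted flags) can be sketched as follows. This is an illustration under stated assumptions: `struct demo_zc_args`, `DEMO_FLAG_TSTAMP`, `DEMO_VALID_FLAGS`, and `demo_validate()` are hypothetical and do not reproduce the kernel's struct layout or flag values.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag set; the real accepted flags are not shown here. */
#define DEMO_FLAG_TSTAMP 0x1u
#define DEMO_VALID_FLAGS (DEMO_FLAG_TSTAMP)

/* Hypothetical args struct with an explicit reserved field. */
struct demo_zc_args {
    uint64_t address;
    uint32_t length;
    uint32_t flags;
    uint32_t reserved; /* must be zero for now */
};

/* Returns 0 if the args are acceptable, -EINVAL otherwise. */
static int demo_validate(const struct demo_zc_args *a)
{
    /* Reserved space must be zero so it can gain a meaning later. */
    if (a->reserved != 0)
        return -EINVAL;
    /* Reject any flag bit outside the known set. */
    if (a->flags & ~DEMO_VALID_FLAGS)
        return -EINVAL;
    return 0;
}
```

Rejecting nonzero reserved bytes now means a later kernel can assign them a meaning without misreading garbage passed by older callers.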

[net-next v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.

2021-02-06 Thread Arjun Roy
From: Arjun Roy Explicitly define reserved field and require it to be 0-valued. Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.") Signed-off-by: Arjun Roy Signed-off-by: Eric Dumazet Signed-off-by: Soheil Hassas Yeganeh Suggested-by: David Ahern Su

[net v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.

2021-02-05 Thread Arjun Roy
From: Arjun Roy Explicitly define reserved field and require it to be 0-valued. Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.") Signed-off-by: Arjun Roy Signed-off-by: Eric Dumazet Signed-off-by: Soheil Hassas Yeganeh Suggested-by: David Ahern Su

[net] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.

2021-02-05 Thread Arjun Roy
From: Arjun Roy Explicitly define reserved field and require it to be 0-valued. Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.") Signed-off-by: Arjun Roy Signed-off-by: Eric Dumazet Signed-off-by: Soheil Hassas Yeganeh Suggested-by: David Ahern Su

[net-next v2 1/2] tcp: Remove CMSG magic numbers for tcp_recvmsg().

2021-01-20 Thread Arjun Roy
From: Arjun Roy At present, tcp_recvmsg() uses flags to track if any CMSGs are pending and what those CMSGs are. These flags are currently magic numbers, used only within tcp_recvmsg(). To prepare for receive timestamp support in tcp receive zerocopy, gently refactor these magic numbers into

[net-next v2 2/2] tcp: Add receive timestamp support for receive zerocopy.

2021-01-20 Thread Arjun Roy
From: Arjun Roy tcp_recvmsg() uses the CMSG mechanism to receive control information like packet receive timestamps. This patch adds CMSG fields to struct tcp_zerocopy_receive, and provides receive timestamps if available to the user. Signed-off-by: Arjun Roy --- include/uapi/linux/tcp.h

[net-next v2 0/2] tcp: add CMSG+rx timestamps to rx. zerocopy

2021-01-20 Thread Arjun Roy
From: Arjun Roy Provide CMSG and receive timestamp support to TCP receive zerocopy. Patch 1 refactors CMSG pending state for tcp_recvmsg() to avoid the use of magic numbers; patch 2 implements receive timestamp via CMSG support for receive zerocopy, and uses the constants added in patch 1. v2

[net-next 1/2] tcp: Remove CMSG magic numbers for tcp_recvmsg().

2020-12-11 Thread Arjun Roy
From: Arjun Roy At present, tcp_recvmsg() uses flags to track if any CMSGs are pending and what those CMSGs are. These flags are currently magic numbers, used only within tcp_recvmsg(). To prepare for receive timestamp support in tcp receive zerocopy, gently refactor these magic numbers into

[net-next 0/2] Adds CMSG+rx timestamps to TCP rx. zerocopy

2020-12-11 Thread Arjun Roy
From: Arjun Roy This patch series provides CMSG and receive timestamp support to TCP receive zerocopy. Patch 1 refactors CMSG pending state for tcp_recvmsg() to avoid the use of magic numbers; patch 2 implements receive timestamp via CMSG support for receive zerocopy, and uses the constants

[net-next 2/2] tcp: Add receive timestamp support for receive zerocopy.

2020-12-11 Thread Arjun Roy
From: Arjun Roy tcp_recvmsg() uses the CMSG mechanism to receive control information like packet receive timestamps. This patch adds CMSG fields to struct tcp_zerocopy_receive, and provides receive timestamps if available to the user. Signed-off-by: Arjun Roy Signed-off-by: Eric Dumazet

[net-next] tcp: correctly handle increased zerocopy args struct size

2020-12-10 Thread Arjun Roy
From: Arjun Roy A prior patch increased the size of struct tcp_zerocopy_receive but did not update do_tcp_getsockopt() handling to properly account for this. This patch simply reintroduces content erroneously cut from the referenced prior patch that handles the new struct size. Fixes

[net-next v3 3/8] net-zerocopy: Refactor skb frag fast-forward op.

2020-12-02 Thread Arjun Roy
From: Arjun Roy Refactor skb frag fast-forwarding for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. skb_advance_to_frag(), given a skb and an

[net-next v3 4/8] net-zerocopy: Refactor frag-is-remappable test.

2020-12-02 Thread Arjun Roy
From: Arjun Roy Refactor frag-is-remappable test for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. Signed-off-by: Arjun Roy Signed-off-by: Eric

[net-next v3 8/8] net-zerocopy: Defer vm zap unless actually needed.

2020-12-02 Thread Arjun Roy
From: Arjun Roy Zapping pages is required only if we are calling vm_insert_page into a region where pages had previously been mapped. Receive zerocopy allows reusing such regions, and hitherto called zap_page_range() before calling vm_insert_page() in that range. zap_page_range() can also be

[net-next v3 2/8] net-tcp: Introduce tcp_recvmsg_locked().

2020-12-02 Thread Arjun Roy
From: Arjun Roy Refactor tcp_recvmsg() by splitting it into locked and unlocked portions. Callers already holding the socket lock and not using ERRQUEUE/cmsg/busy polling can simply call tcp_recvmsg_locked(). This is in preparation for a short-circuit copy performed by TCP receive zerocopy for

[net-next v3 6/8] net-zerocopy: Introduce short-circuit small reads.

2020-12-02 Thread Arjun Roy
From: Arjun Roy Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, or inq is generally small enough that it is cheaper to copy rather than remap pages. In these cases, we may want to either return early (inq=0) or attempt to use the provided copy buffer to sim
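
The decision outlined in this snippet can be sketched as a small pure function. This is a rough illustration only: `demo_choose()`, the `demo_action` values, and the `copy_threshold` parameter are hypothetical, and the actual cutoff the patch uses for "cheaper to copy" is not visible in the truncated text.

```c
#include <stddef.h>

/* Hypothetical outcomes of a receive-zerocopy call. */
enum demo_action {
    DEMO_RETURN_EARLY, /* nothing queued: return immediately */
    DEMO_COPY,         /* small read: copy into the user buffer */
    DEMO_REMAP         /* large read: remap pages */
};

/*
 * inq: bytes currently queued for reading.
 * page_size: system page size.
 * copy_threshold: assumed cutoff below which copying is preferred.
 */
static enum demo_action demo_choose(size_t inq, size_t page_size,
                                    size_t copy_threshold)
{
    if (inq == 0)
        return DEMO_RETURN_EARLY;
    /* Less than one page cannot be remapped at all. */
    if (inq < page_size || inq <= copy_threshold)
        return DEMO_COPY;
    return DEMO_REMAP;
}
```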

[net-next v3 7/8] net-zerocopy: Set zerocopy hint when data is copied

2020-12-02 Thread Arjun Roy
From: Arjun Roy Set zerocopy hint, even when falling back to copy, so that the pending data can be efficiently received using zerocopy when possible. Signed-off-by: Arjun Roy Signed-off-by: Eric Dumazet Signed-off-by: Soheil Hassas Yeganeh --- net/ipv4/tcp.c | 45

[net-next v3 5/8] net-zerocopy: Fast return if inq < PAGE_SIZE

2020-12-02 Thread Arjun Roy
From: Arjun Roy Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, in which case we cannot remap pages. In this case, simply return the appropriate hint for regular copying without taking mmap_sem. Signed-off-by: Arjun Roy Signed-off-by: Eric Dumazet Signed-off

[net-next v3 1/8] net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.

2020-12-02 Thread Arjun Roy
From: Arjun Roy When TCP receive zerocopy does not successfully map the entire requested space, it outputs a 'hint' that the caller should recvmsg(). Augment zerocopy to accept a user buffer that it tries to copy this hint into - if it is possible to copy the entire hint, it will d

[net-next v3 0/8] Perf. optimizations for TCP Recv. Zerocopy

2020-12-02 Thread Arjun Roy
From: Arjun Roy This patchset contains several optimizations for TCP Recv. Zerocopy. v3: Fixes 32-bit compilation, stylistic issues and re-adds signoffs. Summarized: 1. It is possible that a read payload is not exactly page aligned - that there may exist "straggler" bytes that we
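
The "straggler" bytes mentioned in this cover letter are the remainder left after whole pages are accounted for. A minimal arithmetic sketch, assuming the payload starts on a page boundary (`struct demo_split` and `demo_split_payload()` are hypothetical names, not kernel symbols):

```c
#include <stddef.h>

/* Hypothetical result type: bytes that fit whole pages vs. the remainder. */
struct demo_split {
    size_t mappable;  /* multiple of page_size, candidate for remapping */
    size_t straggler; /* leftover bytes that would have to be copied */
};

/* Split a payload length into a page-aligned part and a remainder. */
static struct demo_split demo_split_payload(size_t len, size_t page_size)
{
    struct demo_split s;

    s.straggler = len % page_size;
    s.mappable = len - s.straggler;
    return s;
}
```

For a 10000-byte payload with 4096-byte pages, this yields 8192 mappable bytes and 1808 straggler bytes.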

[net-next v2 6/8] net-zerocopy: Introduce short-circuit small reads.

2020-12-02 Thread Arjun Roy
From: Arjun Roy Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, or inq is generally small enough that it is cheaper to copy rather than remap pages. In these cases, we may want to either return early (inq=0) or attempt to use the provided copy buffer to sim

[net-next v2 8/8] net-zerocopy: Defer vm zap unless actually needed.

2020-12-02 Thread Arjun Roy
From: Arjun Roy Zapping pages is required only if we are calling vm_insert_page into a region where pages had previously been mapped. Receive zerocopy allows reusing such regions, and hitherto called zap_page_range() before calling vm_insert_page() in that range. zap_page_range() can also be

[net-next v2 7/8] net-zerocopy: Set zerocopy hint when data is copied

2020-12-02 Thread Arjun Roy
From: Arjun Roy Set zerocopy hint, even when falling back to copy, so that the pending data can be efficiently received using zerocopy when possible. --- net/ipv4/tcp.c | 45 + 1 file changed, 45 insertions(+) diff --git a/net/ipv4/tcp.c b/net/ipv4

[net-next v2 5/8] net-zerocopy: Fast return if inq < PAGE_SIZE

2020-12-02 Thread Arjun Roy
From: Arjun Roy Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, in which case we cannot remap pages. In this case, simply return the appropriate hint for regular copying without taking mmap_sem. --- net/ipv4/tcp.c | 8 1 file changed, 8 inserti

[net-next v2 4/8] net-zerocopy: Refactor frag-is-remappable test.

2020-12-02 Thread Arjun Roy
From: Arjun Roy Refactor frag-is-remappable test for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. --- net/ipv4/tcp.c | 34

[net-next v2 2/8] net-tcp: Introduce tcp_recvmsg_locked().

2020-12-02 Thread Arjun Roy
From: Arjun Roy Refactor tcp_recvmsg() by splitting it into locked and unlocked portions. Callers already holding the socket lock and not using ERRQUEUE/cmsg/busy polling can simply call tcp_recvmsg_locked(). This is in preparation for a short-circuit copy performed by TCP receive zerocopy for

[net-next v2 0/8] Perf. optimizations for TCP Recv. Zerocopy

2020-12-02 Thread Arjun Roy
From: Arjun Roy This patchset contains several optimizations for TCP Recv. Zerocopy. Note this is v2 of the patchset, fixing two 32-bit compilation errors and a stylistic error. Summarized: 1. It is possible that a read payload is not exactly page aligned - that there may exist "stra

[net-next v2 1/8] net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.

2020-12-02 Thread Arjun Roy
From: Arjun Roy When TCP receive zerocopy does not successfully map the entire requested space, it outputs a 'hint' that the caller should recvmsg(). Augment zerocopy to accept a user buffer that it tries to copy this hint into - if it is possible to copy the entire hint, it will d

[net-next v2 3/8] net-zerocopy: Refactor skb frag fast-forward op.

2020-12-02 Thread Arjun Roy
From: Arjun Roy Refactor skb frag fast-forwarding for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. skb_advance_to_frag(), given a skb and an

[net-next 8/8] tcp: Defer vm zap unless actually needed for recv zerocopy.

2020-11-12 Thread Arjun Roy
From: Arjun Roy Zapping pages is required only if we are calling vm_insert_page into a region where pages had previously been mapped. Receive zerocopy allows reusing such regions, and hitherto called zap_page_range() before calling vm_insert_page() in that range. zap_page_range() can also be

[net-next 0/8] Perf. optimizations for TCP Recv. Zerocopy

2020-11-12 Thread Arjun Roy
From: Arjun Roy This patchset contains several optimizations for TCP Recv. Zerocopy. Summarized: 1. It is possible that a read payload is not exactly page aligned - that there may exist "straggler" bytes that we cannot map into the caller's address space cleanly. For this, we a

[net-next 6/8] tcp: Introduce short-circuit small reads for recv zerocopy.

2020-11-12 Thread Arjun Roy
From: Arjun Roy Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, or inq is generally small enough that it is cheaper to copy rather than remap pages. In these cases, we may want to either return early (inq=0) or attempt to use the provided copy buffer to sim

[net-next 5/8] tcp: Fast return if inq < PAGE_SIZE for recv zerocopy.

2020-11-12 Thread Arjun Roy
From: Arjun Roy Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, in which case we cannot remap pages. In this case, simply return the appropriate hint for regular copying without taking mmap_sem. Signed-off-by: Arjun Roy Signed-off-by: Eric Dumazet Signed-off

[net-next 7/8] tcp: Set zerocopy hint when data is copied

2020-11-12 Thread Arjun Roy
From: Arjun Roy Set zerocopy hint, even when falling back to copy, so that the pending data can be efficiently received using zerocopy when possible. Signed-off-by: Arjun Roy Signed-off-by: Eric Dumazet Signed-off-by: Soheil Hassas Yeganeh --- net/ipv4/tcp.c | 45

[net-next 4/8] tcp: Refactor frag-is-remappable test for recv zerocopy.

2020-11-12 Thread Arjun Roy
From: Arjun Roy Refactor frag-is-remappable test for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. Signed-off-by: Arjun Roy Signed-off-by: Eric

[net-next 1/8] tcp: Copy straggler unaligned data for TCP Rx. zerocopy.

2020-11-12 Thread Arjun Roy
From: Arjun Roy When TCP receive zerocopy does not successfully map the entire requested space, it outputs a 'hint' that the caller should recvmsg(). Augment zerocopy to accept a user buffer that it tries to copy this hint into - if it is possible to copy the entire hint, it will d

[net-next 3/8] tcp: Refactor skb frag fast-forward op for recv zerocopy.

2020-11-12 Thread Arjun Roy
From: Arjun Roy Refactor skb frag fast-forwarding for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. skb_advance_to_frag(), given a skb and an

[net-next 2/8] tcp: Introduce tcp_recvmsg_locked().

2020-11-12 Thread Arjun Roy
From: Arjun Roy Refactor tcp_recvmsg() by splitting it into locked and unlocked portions. Callers already holding the socket lock and not using ERRQUEUE/cmsg/busy polling can simply call tcp_recvmsg_locked(). This is in preparation for a short-circuit copy performed by TCP receive zerocopy for

[net v2] tcp: Prevent low rmem stalls with SO_RCVLOWAT.

2020-10-23 Thread Arjun Roy
From: Arjun Roy With SO_RCVLOWAT, under memory pressure, it is possible to enter a state where: 1. We have not received enough bytes to satisfy SO_RCVLOWAT. 2. We have not entered buffer pressure (see tcp_rmem_pressure()). 3. But, we do not have enough buffer space to accept more packets. In

[net] tcp: Prevent low rmem stalls with SO_RCVLOWAT.

2020-10-23 Thread Arjun Roy
From: Arjun Roy With SO_RCVLOWAT, under memory pressure, it is possible to enter a state where: 1. We have not received enough bytes to satisfy SO_RCVLOWAT. 2. We have not entered buffer pressure (see tcp_rmem_pressure()). 3. But, we do not have enough buffer space to accept more packets. In