Re: [PATCH net 4/4] virtio/vsock: Put vsock_connected_sockets_vsk() to use
On 11/7/24 11:22, Stefano Garzarella wrote:
> On Wed, Nov 06, 2024 at 06:51:21PM +0100, Michal Luczaj wrote:
>> Macro vsock_connected_sockets_vsk() has been unused since its
>> introduction. Instead of removing it, utilise it in
>> vsock_insert_connected() where it's been open-coded.
>>
>> No functional change intended.
>>
>> Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
>
> This is not a fix, so please remove the Fixes tag, we don't need to
> backport this patch in stable branches.
>
> Also in this case this is not related at all with virtio transport, so
> please remove `virtio` from the commit title.
>
> In addition maybe you can remove this patch from this series, and send
> it to net-next.
> ...

Right, I get it. Just to be clear: are such small (and non-functional)
cleanups welcome when sent on their own?

Thanks,
Michal
[PATCH net v2 3/3] virtio/vsock: Improve MSG_ZEROCOPY error handling
Add a missing kfree_skb() to prevent memory leaks.

Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
Reviewed-by: Stefano Garzarella
Signed-off-by: Michal Luczaj
---
 net/vmw_vsock/virtio_transport_common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index cd075f608d4f6f48f894543e5e9c966d3e5f22df..e2e6a30b759bdc6371bb0d63ee2e77c0ba148fd2 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -400,6 +400,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 		if (virtio_transport_init_zcopy_skb(vsk, skb,
 						    info->msg,
 						    can_zcopy)) {
+			kfree_skb(skb);
 			ret = -ENOMEM;
 			break;
 		}

-- 
2.46.2
[PATCH net v2 2/3] vsock: Fix sk_error_queue memory leak
Kernel queues MSG_ZEROCOPY completion notifications on the error queue,
where they remain until explicitly recv()ed. To prevent memory leaks,
clean up the queue when the socket is destroyed.

unreferenced object 0x8881028beb00 (size 224):
  comm "vsock_test", pid 1218, jiffies 4294694897
  hex dump (first 32 bytes):
    90 b0 21 17 81 88 ff ff 90 b0 21 17 81 88 ff ff  ..!...!.
    00 00 00 00 00 00 00 00 00 b0 21 17 81 88 ff ff  ..!.
  backtrace (crc 6c7031ca):
    [] kmem_cache_alloc_node_noprof+0x2f7/0x370
    [] __alloc_skb+0x132/0x180
    [] sock_omalloc+0x4b/0x80
    [] msg_zerocopy_realloc+0x9e/0x240
    [] virtio_transport_send_pkt_info+0x412/0x4c0
    [] virtio_transport_stream_enqueue+0x43/0x50
    [] vsock_connectible_sendmsg+0x373/0x450
    [] ____sys_sendmsg+0x365/0x3a0
    [] ___sys_sendmsg+0x84/0xd0
    [] __sys_sendmsg+0x47/0x80
    [] do_syscall_64+0x93/0x180
    [] entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
Signed-off-by: Michal Luczaj
---
 net/vmw_vsock/af_vsock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 35681adedd9aaec3565495158f5342b8aa76c9bc..dfd29160fe11c4675f872c1ee123d65b2da0dae6 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -836,6 +836,9 @@ static void vsock_sk_destruct(struct sock *sk)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
 
+	/* Flush MSG_ZEROCOPY leftovers. */
+	__skb_queue_purge(&sk->sk_error_queue);
+
 	vsock_deassign_transport(vsk);
 
 	/* When clearing these addresses, there's no need to set the family and

-- 
2.46.2
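[Editor's note] As background on why these notifications pile up: MSG_ZEROCOPY
completions are reported through the socket error queue and must be pulled off
explicitly by user space. A minimal user-space sketch of that drain loop,
following the generic pattern from Documentation/networking/msg_zerocopy.rst
(the vsock-specific cmsg level/type checks a real test would do are omitted):

	/* Drain MSG_ZEROCOPY completion notifications from the error queue.
	 * If user space never does this, the notifications sit on
	 * sk_error_queue until the socket is destroyed -- the leak this
	 * patch plugs by purging the queue in vsock_sk_destruct().
	 */
	#include <stdio.h>
	#include <sys/socket.h>
	#include <linux/errqueue.h>

	static void drain_zerocopy_completions(int fd)
	{
		char control[128] = {};
		struct msghdr msg = {
			.msg_control = control,
			.msg_controllen = sizeof(control),
		};
		struct cmsghdr *cm;

		/* MSG_ERRQUEUE reads notifications instead of data. */
		while (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) != -1) {
			for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
				struct sock_extended_err *serr =
					(struct sock_extended_err *)CMSG_DATA(cm);

				if (serr->ee_errno == 0 &&
				    serr->ee_origin == SO_EE_ORIGIN_ZEROCOPY)
					printf("zerocopy done: ids %u..%u\n",
					       serr->ee_info, serr->ee_data);
			}
			msg.msg_controllen = sizeof(control);
		}
	}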
[PATCH net v2 1/3] virtio/vsock: Fix accept_queue memory leak
As the final stages of socket destruction may be delayed, it is possible
that virtio_transport_recv_listen() will be called after the accept_queue
has been flushed, but before the SOCK_DONE flag has been set. As a result,
sockets enqueued after the flush would remain unremoved, leading to a
memory leak.

vsock_release
  __vsock_release
    lock
    virtio_transport_release
      virtio_transport_close
        schedule_delayed_work(close_work)
    sk_shutdown = SHUTDOWN_MASK
(!) flush accept_queue
    release

                        virtio_transport_recv_pkt
                          vsock_find_bound_socket
                          lock
                          if flag(SOCK_DONE) return
                          virtio_transport_recv_listen
                            child = vsock_create_connected
                        (!) vsock_enqueue_accept(child)
                          release

                        close_work
                          lock
                          virtio_transport_do_close
                            set_flag(SOCK_DONE)
                            virtio_transport_remove_sock
                              vsock_remove_sock
                                vsock_remove_bound
                          release

Introduce a sk_shutdown check to disallow vsock_enqueue_accept() during
socket destruction.

unreferenced object 0x888109e3f800 (size 2040):
  comm "kworker/5:2", pid 371, jiffies 4294940105
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    28 00 0b 40 00 00 00 00 00 00 00 00 00 00 00 00  (..@
  backtrace (crc 9e5f4e84):
    [] kmem_cache_alloc_noprof+0x2c1/0x360
    [] sk_prot_alloc+0x30/0x120
    [] sk_alloc+0x2c/0x4b0
    [] __vsock_create.constprop.0+0x2a/0x310
    [] virtio_transport_recv_pkt+0x4dc/0x9a0
    [] vsock_loopback_work+0xfd/0x140
    [] process_one_work+0x20c/0x570
    [] worker_thread+0x1bf/0x3a0
    [] kthread+0xdd/0x110
    [] ret_from_fork+0x2d/0x50
    [] ret_from_fork_asm+0x1a/0x30

Fixes: 3fe356d58efa ("vsock/virtio: discard packets only when socket is really closed")
Reviewed-by: Stefano Garzarella
Signed-off-by: Michal Luczaj
---
 net/vmw_vsock/virtio_transport_common.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index ccbd2bc0d2109aea4f19e79a0438f85893e1d89c..cd075f608d4f6f48f894543e5e9c966d3e5f22df 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1512,6 +1512,14 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 		return -ENOMEM;
 	}
 
+	/* __vsock_release() might have already flushed accept_queue.
+	 * Subsequent enqueues would lead to a memory leak.
+	 */
+	if (sk->sk_shutdown == SHUTDOWN_MASK) {
+		virtio_transport_reset_no_sock(t, skb);
+		return -ESHUTDOWN;
+	}
+
 	child = vsock_create_connected(sk);
 	if (!child) {
 		virtio_transport_reset_no_sock(t, skb);

-- 
2.46.2
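[Editor's note] For reference, the sk_shutdown constants the new check compares
against, as defined in include/net/sock.h; sk->sk_shutdown == SHUTDOWN_MASK
means both directions were shut down, which __vsock_release() does
unconditionally before flushing the accept queue:

	/* From include/net/sock.h: */
	#define RCV_SHUTDOWN	1
	#define SEND_SHUTDOWN	2
	#define SHUTDOWN_MASK	3	/* RCV_SHUTDOWN | SEND_SHUTDOWN */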
Re: [PATCH net 2/4] virtio/vsock: Fix sk_error_queue memory leak
On 11/7/24 11:17, Stefano Garzarella wrote:
> On Wed, Nov 06, 2024 at 06:51:19PM +0100, Michal Luczaj wrote:
>> Kernel queues MSG_ZEROCOPY completion notifications on the error queue.
>> Where they remain, until explicitly recv()ed. To prevent memory leaks,
>> clean up the queue when the socket is destroyed.
>>
>> unreferenced object 0x8881028beb00 (size 224):
>>   comm "vsock_test", pid 1218, jiffies 4294694897
>>   hex dump (first 32 bytes):
>>     90 b0 21 17 81 88 ff ff 90 b0 21 17 81 88 ff ff  ..!...!.
>>     00 00 00 00 00 00 00 00 00 b0 21 17 81 88 ff ff  ..!.
>>   backtrace (crc 6c7031ca):
>>     [] kmem_cache_alloc_node_noprof+0x2f7/0x370
>>     [] __alloc_skb+0x132/0x180
>>     [] sock_omalloc+0x4b/0x80
>>     [] msg_zerocopy_realloc+0x9e/0x240
>>     [] virtio_transport_send_pkt_info+0x412/0x4c0
>>     [] virtio_transport_stream_enqueue+0x43/0x50
>>     [] vsock_connectible_sendmsg+0x373/0x450
>>     [] ____sys_sendmsg+0x365/0x3a0
>>     [] ___sys_sendmsg+0x84/0xd0
>>     [] __sys_sendmsg+0x47/0x80
>>     [] do_syscall_64+0x93/0x180
>>     [] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>> Signed-off-by: Michal Luczaj
>> ---
>>  net/vmw_vsock/af_vsock.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index 35681adedd9aaec3565495158f5342b8aa76c9bc..dfd29160fe11c4675f872c1ee123d65b2da0dae6 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -836,6 +836,9 @@ static void vsock_sk_destruct(struct sock *sk)
>>  {
>>  	struct vsock_sock *vsk = vsock_sk(sk);
>>
>> +	/* Flush MSG_ZEROCOPY leftovers. */
>> +	__skb_queue_purge(&sk->sk_error_queue);
>> +
>
> It is true that for now this is supported only in the virtio transport,
> but it's more related to the core, so please remove `virtio` from the
> commit title.
>
> The rest LGTM.
> ...

OK, done. Here's v2 of the series:
https://lore.kernel.org/netdev/20241107-vsock-mem-leaks-v2-0-4e21bfcfc...@rbox.co/

Thanks for the reviews,
Michal
[PATCH net v2 0/3] virtio/vsock: Fix memory leaks
Short series fixing some memory leaks that I've stumbled upon while
toying with the selftests.

Signed-off-by: Michal Luczaj
---
Changes in v2:
- Remove the refactoring patch from the series [Stefano]
- PATCH 2: Drop "virtio" from the commit title [Stefano]
- Collect Reviewed-by [Stefano]
- Link to v1: https://lore.kernel.org/r/20241106-vsock-mem-leaks-v1-0-8f4ffc309...@rbox.co

---
Michal Luczaj (3):
      virtio/vsock: Fix accept_queue memory leak
      vsock: Fix sk_error_queue memory leak
      virtio/vsock: Improve MSG_ZEROCOPY error handling

 net/vmw_vsock/af_vsock.c                | 3 +++
 net/vmw_vsock/virtio_transport_common.c | 9 +++++++++
 2 files changed, 12 insertions(+)
---
base-commit: 71712cf519faeed529549a79559c06c7fc250a15
change-id: 20241106-vsock-mem-leaks-9b63e912560a

Best regards,
-- 
Michal Luczaj
Re: [PATCH net 4/4] virtio/vsock: Put vsock_connected_sockets_vsk() to use
On Wed, Nov 06, 2024 at 06:51:21PM +0100, Michal Luczaj wrote:
>Macro vsock_connected_sockets_vsk() has been unused since its introduction.
>Instead of removing it, utilise it in vsock_insert_connected() where it's
>been open-coded.
>
>No functional change intended.
>
>Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")

This is not a fix, so please remove the Fixes tag, we don't need to
backport this patch in stable branches.

Also in this case this is not related at all with virtio transport, so
please remove `virtio` from the commit title.

In addition maybe you can remove this patch from this series, and send
it to net-next.

Thanks,
Stefano

>Signed-off-by: Michal Luczaj
>---
> net/vmw_vsock/af_vsock.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index dfd29160fe11c4675f872c1ee123d65b2da0dae6..c3a37c3d4bf3c8117fbc8bd020da8dc1c9212732 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -275,8 +275,7 @@ static void vsock_insert_unbound(struct vsock_sock *vsk)
> 
> void vsock_insert_connected(struct vsock_sock *vsk)
> {
>-	struct list_head *list = vsock_connected_sockets(
>-			&vsk->remote_addr, &vsk->local_addr);
>+	struct list_head *list = vsock_connected_sockets_vsk(vsk);
> 
> 	spin_lock_bh(&vsock_table_lock);
> 	__vsock_insert_connected(list, vsk);
>-- 
>2.46.2
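[Editor's note] For readers without the tree at hand: the macro being put to
use expands to exactly the open-coded lookup it replaces. Its shape is implied
by the diff above; the definition lives in net/vmw_vsock/af_vsock.c and looks
roughly like this:

	/* Expands to the same connected-table bucket lookup the patch
	 * removes from vsock_insert_connected().
	 */
	#define vsock_connected_sockets_vsk(vsk)			\
		vsock_connected_sockets(&(vsk)->remote_addr, &(vsk)->local_addr)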
[PATCH net-next v3 04/13] virtio_ring: perform premapped operations based on per-buffer
The current configuration sets the virtqueue (vq) to premapped mode,
implying that all buffers submitted to this queue must be mapped ahead
of time. This presents a challenge for the virtnet send queue (sq): the
virtnet driver would be required to keep track of dma information for
vq size * 17, which can be substantial. However, if the premapped mode
were applied on a per-buffer basis, the complexity would be greatly
reduced. With AF_XDP enabled, AF_XDP buffers would become premapped,
while kernel skb buffers could remain unmapped.

Also consider that some sgs are not generated by the virtio driver but
may be passed in from the block stack, so we cannot change the sgs; new
APIs are the better way.

So we pass the new argument 'premapped' to indicate that the buffers
submitted to virtio are premapped in advance. Additionally, DMA unmap
operations for these buffers will be bypassed.

Suggested-by: Jason Wang
Signed-off-by: Xuan Zhuo
Acked-by: Jason Wang
---
 drivers/virtio/virtio_ring.c | 101 ++-
 1 file changed, 53 insertions(+), 48 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cfe70c40f630..fefa85a5e6b6 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -300,9 +300,10 @@ static bool vring_use_dma_api(const struct virtio_device *vdev)
 	return false;
 }
 
-static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring)
+static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring,
+				    const struct vring_desc_extra *extra)
 {
-	return vring->use_dma_api && !vring->premapped;
+	return vring->use_dma_api && (extra->addr != DMA_MAPPING_ERROR);
 }
 
 size_t virtio_max_dma_size(const struct virtio_device *vdev)
@@ -372,13 +373,17 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
 
 /* Map one sg entry. */
 static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
-			    enum dma_data_direction direction, dma_addr_t *addr)
+			    enum dma_data_direction direction, dma_addr_t *addr,
+			    u32 *len, bool premapped)
 {
-	if (vq->premapped) {
+	if (premapped) {
 		*addr = sg_dma_address(sg);
+		*len = sg_dma_len(sg);
 		return 0;
 	}
 
+	*len = sg->length;
+
 	if (!vq->use_dma_api) {
 		/*
 		 * If DMA is not used, KMSAN doesn't know that the scatterlist
@@ -465,7 +470,7 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 			 (flags & VRING_DESC_F_WRITE) ?
 			 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
-		if (!vring_need_unmap_buffer(vq))
+		if (!vring_need_unmap_buffer(vq, extra))
 			goto out;
 
 		dma_unmap_page(vring_dma_dev(vq),
@@ -514,7 +519,7 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
 						    unsigned int i,
 						    dma_addr_t addr,
 						    unsigned int len,
-						    u16 flags)
+						    u16 flags, bool premapped)
 {
 	u16 next;
 
@@ -522,7 +527,7 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
 	desc[i].addr = cpu_to_virtio64(vq->vdev, addr);
 	desc[i].len = cpu_to_virtio32(vq->vdev, len);
 
-	extra[i].addr = addr;
+	extra[i].addr = premapped ? DMA_MAPPING_ERROR : addr;
 	extra[i].len = len;
 	extra[i].flags = flags;
 
@@ -540,6 +545,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 				      unsigned int in_sgs,
 				      void *data,
 				      void *ctx,
+				      bool premapped,
 				      gfp_t gfp)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
@@ -605,38 +611,41 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	for (n = 0; n < out_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
 			dma_addr_t addr;
+			u32 len;
 
-			if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
+			if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr, &len, premapped))
 				goto unmap_release;
 
 			prev = i;
 			/* Note that we trust indirect descriptor
 			 * table since it use stream DMA mapping.
 			 */
-			i = virtqueue_add_desc_split(_vq, desc, extra, i, addr, sg->length,
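[Editor's note] The per-buffer decision above boils down to a sentinel: the
core records DMA_MAPPING_ERROR as the address of a premapped buffer, so the
unmap path can tell the two cases apart without an extra flag. A standalone
sketch of that pattern, with stand-in types rather than the vring structs:

	#include <stdbool.h>
	#include <stdint.h>

	#define DMA_MAPPING_ERROR	(~(uint64_t)0)	/* sentinel, as in the DMA API */

	struct desc_extra {
		uint64_t addr;	/* DMA_MAPPING_ERROR if the driver premapped it */
		uint32_t len;
	};

	static void record(struct desc_extra *e, uint64_t addr, uint32_t len,
			   bool premapped)
	{
		e->addr = premapped ? DMA_MAPPING_ERROR : addr;
		e->len = len;
	}

	static bool need_unmap(bool use_dma_api, const struct desc_extra *e)
	{
		/* Premapped buffers are the driver's to unmap, not the core's. */
		return use_dma_api && e->addr != DMA_MAPPING_ERROR;
	}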
Re: [PATCH net 1/4] virtio/vsock: Fix accept_queue memory leak
On Wed, Nov 06, 2024 at 06:51:18PM +0100, Michal Luczaj wrote:
>As the final stages of socket destruction may be delayed, it is possible
>that virtio_transport_recv_listen() will be called after the accept_queue
>has been flushed, but before the SOCK_DONE flag has been set. As a result,
>sockets enqueued after the flush would remain unremoved, leading to a
>memory leak.
>
>vsock_release
>  __vsock_release
>    lock
>    virtio_transport_release
>      virtio_transport_close
>        schedule_delayed_work(close_work)
>    sk_shutdown = SHUTDOWN_MASK
>(!) flush accept_queue
>    release
>
>                        virtio_transport_recv_pkt
>                          vsock_find_bound_socket
>                          lock
>                          if flag(SOCK_DONE) return
>                          virtio_transport_recv_listen
>                            child = vsock_create_connected
>                        (!) vsock_enqueue_accept(child)
>                          release
>
>                        close_work
>                          lock
>                          virtio_transport_do_close
>                            set_flag(SOCK_DONE)
>                            virtio_transport_remove_sock
>                              vsock_remove_sock
>                                vsock_remove_bound
>                          release
>
>Introduce a sk_shutdown check to disallow vsock_enqueue_accept() during
>socket destruction.
>
>unreferenced object 0x888109e3f800 (size 2040):
>  comm "kworker/5:2", pid 371, jiffies 4294940105
>  hex dump (first 32 bytes):
>    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>    28 00 0b 40 00 00 00 00 00 00 00 00 00 00 00 00  (..@
>  backtrace (crc 9e5f4e84):
>    [] kmem_cache_alloc_noprof+0x2c1/0x360
>    [] sk_prot_alloc+0x30/0x120
>    [] sk_alloc+0x2c/0x4b0
>    [] __vsock_create.constprop.0+0x2a/0x310
>    [] virtio_transport_recv_pkt+0x4dc/0x9a0
>    [] vsock_loopback_work+0xfd/0x140
>    [] process_one_work+0x20c/0x570
>    [] worker_thread+0x1bf/0x3a0
>    [] kthread+0xdd/0x110
>    [] ret_from_fork+0x2d/0x50
>    [] ret_from_fork_asm+0x1a/0x30
>
>Fixes: 3fe356d58efa ("vsock/virtio: discard packets only when socket is really closed")
>Signed-off-by: Michal Luczaj
>---
> net/vmw_vsock/virtio_transport_common.c | 8 ++++++++
> 1 file changed, 8 insertions(+)

Reviewed-by: Stefano Garzarella

>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index ccbd2bc0d2109aea4f19e79a0438f85893e1d89c..cd075f608d4f6f48f894543e5e9c966d3e5f22df 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -1512,6 +1512,14 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
> 		return -ENOMEM;
> 	}
> 
>+	/* __vsock_release() might have already flushed accept_queue.
>+	 * Subsequent enqueues would lead to a memory leak.
>+	 */
>+	if (sk->sk_shutdown == SHUTDOWN_MASK) {
>+		virtio_transport_reset_no_sock(t, skb);
>+		return -ESHUTDOWN;
>+	}
>+
> 	child = vsock_create_connected(sk);
> 	if (!child) {
> 		virtio_transport_reset_no_sock(t, skb);
>-- 
>2.46.2
Re: [PATCH net 2/4] virtio/vsock: Fix sk_error_queue memory leak
On Wed, Nov 06, 2024 at 06:51:19PM +0100, Michal Luczaj wrote:
>Kernel queues MSG_ZEROCOPY completion notifications on the error queue.
>Where they remain, until explicitly recv()ed. To prevent memory leaks,
>clean up the queue when the socket is destroyed.
>
>unreferenced object 0x8881028beb00 (size 224):
>  comm "vsock_test", pid 1218, jiffies 4294694897
>  hex dump (first 32 bytes):
>    90 b0 21 17 81 88 ff ff 90 b0 21 17 81 88 ff ff  ..!...!.
>    00 00 00 00 00 00 00 00 00 b0 21 17 81 88 ff ff  ..!.
>  backtrace (crc 6c7031ca):
>    [] kmem_cache_alloc_node_noprof+0x2f7/0x370
>    [] __alloc_skb+0x132/0x180
>    [] sock_omalloc+0x4b/0x80
>    [] msg_zerocopy_realloc+0x9e/0x240
>    [] virtio_transport_send_pkt_info+0x412/0x4c0
>    [] virtio_transport_stream_enqueue+0x43/0x50
>    [] vsock_connectible_sendmsg+0x373/0x450
>    [] ____sys_sendmsg+0x365/0x3a0
>    [] ___sys_sendmsg+0x84/0xd0
>    [] __sys_sendmsg+0x47/0x80
>    [] do_syscall_64+0x93/0x180
>    [] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
>Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>Signed-off-by: Michal Luczaj
>---
> net/vmw_vsock/af_vsock.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 35681adedd9aaec3565495158f5342b8aa76c9bc..dfd29160fe11c4675f872c1ee123d65b2da0dae6 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -836,6 +836,9 @@ static void vsock_sk_destruct(struct sock *sk)
> {
> 	struct vsock_sock *vsk = vsock_sk(sk);
> 
>+	/* Flush MSG_ZEROCOPY leftovers. */
>+	__skb_queue_purge(&sk->sk_error_queue);
>+

It is true that for now this is supported only in the virtio transport,
but it's more related to the core, so please remove `virtio` from the
commit title.

The rest LGTM.

Thanks,
Stefano

> 	vsock_deassign_transport(vsk);
> 
> 	/* When clearing these addresses, there's no need to set the family and
>-- 
>2.46.2
Re: [PATCH net 3/4] virtio/vsock: Improve MSG_ZEROCOPY error handling
On Wed, Nov 06, 2024 at 06:51:20PM +0100, Michal Luczaj wrote:
>Add a missing kfree_skb() to prevent memory leaks.
>
>Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>Signed-off-by: Michal Luczaj
>---
> net/vmw_vsock/virtio_transport_common.c | 1 +
> 1 file changed, 1 insertion(+)

Reviewed-by: Stefano Garzarella

>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index cd075f608d4f6f48f894543e5e9c966d3e5f22df..e2e6a30b759bdc6371bb0d63ee2e77c0ba148fd2 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -400,6 +400,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> 		if (virtio_transport_init_zcopy_skb(vsk, skb,
> 						    info->msg,
> 						    can_zcopy)) {
>+			kfree_skb(skb);
> 			ret = -ENOMEM;
> 			break;
> 		}
>-- 
>2.46.2
[PATCH net-next v3 02/13] virtio_ring: split: record extras for indirect buffers
The subsequent commit needs to know whether every indirect buffer is
premapped or not. So we need to introduce an extra struct for every
indirect buffer to record this info.

Signed-off-by: Xuan Zhuo
---
 drivers/virtio/virtio_ring.c | 112 ---
 1 file changed, 52 insertions(+), 60 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 97590c201aa2..405d5a348795 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -69,7 +69,11 @@ struct vring_desc_state_split {
 	void *data;			/* Data for callback. */
-	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
+
+	/* Indirect desc table and extra table, if any. These two will be
+	 * allocated together. So we won't stress more to the memory allocator.
+	 */
+	struct vring_desc *indir_desc;
 };
 
 struct vring_desc_state_packed {
@@ -440,38 +444,20 @@ static void virtqueue_init(struct vring_virtqueue *vq, u32 num)
  * Split ring specific functions - *_split().
  */
 
-static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
-					   const struct vring_desc *desc)
-{
-	u16 flags;
-
-	if (!vring_need_unmap_buffer(vq))
-		return;
-
-	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
-
-	dma_unmap_page(vring_dma_dev(vq),
-		       virtio64_to_cpu(vq->vq.vdev, desc->addr),
-		       virtio32_to_cpu(vq->vq.vdev, desc->len),
-		       (flags & VRING_DESC_F_WRITE) ?
-		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
-}
-
 static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
-					  unsigned int i)
+					  struct vring_desc_extra *extra)
 {
-	struct vring_desc_extra *extra = vq->split.desc_extra;
 	u16 flags;
 
-	flags = extra[i].flags;
+	flags = extra->flags;
 
 	if (flags & VRING_DESC_F_INDIRECT) {
 		if (!vq->use_dma_api)
 			goto out;
 
 		dma_unmap_single(vring_dma_dev(vq),
-				 extra[i].addr,
-				 extra[i].len,
+				 extra->addr,
+				 extra->len,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
@@ -479,22 +465,23 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 			goto out;
 
 		dma_unmap_page(vring_dma_dev(vq),
-			       extra[i].addr,
-			       extra[i].len,
+			       extra->addr,
+			       extra->len,
 			       (flags & VRING_DESC_F_WRITE) ?
 			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	}
 
 out:
-	return extra[i].next;
+	return extra->next;
 }
 
 static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
 					       unsigned int total_sg,
 					       gfp_t gfp)
 {
+	struct vring_desc_extra *extra;
 	struct vring_desc *desc;
-	unsigned int i;
+	unsigned int i, size;
 
 	/*
 	 * We require lowmem mappings for the descriptors because
@@ -503,40 +490,41 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
 	 */
 	gfp &= ~__GFP_HIGHMEM;
 
-	desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
+	size = sizeof(*desc) * total_sg + sizeof(*extra) * total_sg;
+
+	desc = kmalloc(size, gfp);
 	if (!desc)
 		return NULL;
 
+	extra = (struct vring_desc_extra *)&desc[total_sg];
+
 	for (i = 0; i < total_sg; i++)
-		desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
+		extra[i].next = i + 1;
+
 	return desc;
 }
 
 static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
 						    struct vring_desc *desc,
+						    struct vring_desc_extra *extra,
 						    unsigned int i,
 						    dma_addr_t addr,
 						    unsigned int len,
-						    u16 flags,
-						    bool indirect)
+						    u16 flags)
 {
-	struct vring_virtqueue *vring = to_vvq(vq);
-	struct vring_desc_extra *extra = vring->split.desc_extra;
 	u16 next;
 
 	desc[i].flags = cpu_to_virtio16(vq->vdev, flags);
 	desc[i].addr = cpu_to_vir
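[Editor's note] The single-allocation trick above (descriptor table and extra
table carved out of one kmalloc, with the extra array starting right after the
last descriptor) is a general pattern. A userspace sketch of the layout, with
stand-in types rather than the vring structs:

	#include <stdlib.h>

	struct desc  { unsigned long addr; unsigned int len; };
	struct extra { unsigned long addr; unsigned int len, flags, next; };

	static struct desc *alloc_desc_with_extra(unsigned int total_sg)
	{
		/* One allocation holds both arrays, so one free() releases
		 * everything and the allocator is hit only once.
		 */
		struct desc *desc = malloc(total_sg * (sizeof(struct desc) +
						       sizeof(struct extra)));
		struct extra *extra;
		unsigned int i;

		if (!desc)
			return NULL;

		/* The extra table starts right after the last descriptor.
		 * This is safe here because struct extra's alignment does
		 * not exceed struct desc's.
		 */
		extra = (struct extra *)&desc[total_sg];

		for (i = 0; i < total_sg; i++)
			extra[i].next = i + 1;	/* free-list style chaining */

		return desc;
	}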
[PATCH net-next v3 08/13] virtio_net: refactor the xmit type
Because AF_XDP will introduce a new xmit type, refactor the xmit type
mechanism first.

We know both xdp_frame and sk_buff are at least 4-byte aligned. For the
xdp tx, we do not pass any pointer to the virtio core as data; we just
need to pass the len of the packet, so we push the len into the void
pointer and can keep that pointer 4-byte aligned as well. And the data
structures of AF_XDP are also at least 4-byte aligned.

So the last two bits of the pointers are free, and we can use them to
distinguish the xmit types:

	00 for skb
	01 for SKB_ORPHAN
	10 for XDP
	11 for AF-XDP tx

Signed-off-by: Xuan Zhuo
Acked-by: Jason Wang
---
 drivers/net/virtio_net.c | 90 +++-
 1 file changed, 51 insertions(+), 39 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e4ebf3ffbf02..079d4a213fda 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -45,9 +45,6 @@ module_param(napi_tx, bool, 0644);
 #define VIRTIO_XDP_TX		BIT(0)
 #define VIRTIO_XDP_REDIR	BIT(1)
 
-#define VIRTIO_XDP_FLAG		BIT(0)
-#define VIRTIO_ORPHAN_FLAG	BIT(1)
-
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
@@ -509,34 +506,35 @@ static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
 					       struct page *page, void *buf,
 					       int len, int truesize);
 
-static bool is_xdp_frame(void *ptr)
-{
-	return (unsigned long)ptr & VIRTIO_XDP_FLAG;
-}
+enum virtnet_xmit_type {
+	VIRTNET_XMIT_TYPE_SKB,
+	VIRTNET_XMIT_TYPE_SKB_ORPHAN,
+	VIRTNET_XMIT_TYPE_XDP,
+};
 
-static void *xdp_to_ptr(struct xdp_frame *ptr)
-{
-	return (void *)((unsigned long)ptr | VIRTIO_XDP_FLAG);
-}
+/* We use the last two bits of the pointer to distinguish the xmit type. */
+#define VIRTNET_XMIT_TYPE_MASK	(BIT(0) | BIT(1))
 
-static struct xdp_frame *ptr_to_xdp(void *ptr)
+static enum virtnet_xmit_type virtnet_xmit_ptr_unpack(void **ptr)
 {
-	return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
-}
+	unsigned long p = (unsigned long)*ptr;
 
-static bool is_orphan_skb(void *ptr)
-{
-	return (unsigned long)ptr & VIRTIO_ORPHAN_FLAG;
+	*ptr = (void *)(p & ~VIRTNET_XMIT_TYPE_MASK);
+
+	return p & VIRTNET_XMIT_TYPE_MASK;
 }
 
-static void *skb_to_ptr(struct sk_buff *skb, bool orphan)
+static void *virtnet_xmit_ptr_pack(void *ptr, enum virtnet_xmit_type type)
 {
-	return (void *)((unsigned long)skb | (orphan ? VIRTIO_ORPHAN_FLAG : 0));
+	return (void *)((unsigned long)ptr | type);
 }
 
-static struct sk_buff *ptr_to_skb(void *ptr)
+static int virtnet_add_outbuf(struct send_queue *sq, int num, void *data,
+			      enum virtnet_xmit_type type)
 {
-	return (struct sk_buff *)((unsigned long)ptr & ~VIRTIO_ORPHAN_FLAG);
+	return virtqueue_add_outbuf(sq->vq, sq->sg, num,
+				    virtnet_xmit_ptr_pack(data, type),
+				    GFP_ATOMIC);
 }
 
 static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
@@ -548,29 +546,37 @@ static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
 static void __free_old_xmit(struct send_queue *sq, struct netdev_queue *txq,
 			    bool in_napi, struct virtnet_sq_free_stats *stats)
 {
+	struct xdp_frame *frame;
+	struct sk_buff *skb;
 	unsigned int len;
 	void *ptr;
 
 	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		if (!is_xdp_frame(ptr)) {
-			struct sk_buff *skb = ptr_to_skb(ptr);
+		switch (virtnet_xmit_ptr_unpack(&ptr)) {
+		case VIRTNET_XMIT_TYPE_SKB:
+			skb = ptr;
 
 			pr_debug("Sent skb %p\n", skb);
 
+			stats->napi_packets++;
+			stats->napi_bytes += skb->len;
+			napi_consume_skb(skb, in_napi);
+			break;
 
-			if (is_orphan_skb(ptr)) {
-				stats->packets++;
-				stats->bytes += skb->len;
-			} else {
-				stats->napi_packets++;
-				stats->napi_bytes += skb->len;
-			}
+		case VIRTNET_XMIT_TYPE_SKB_ORPHAN:
+			skb = ptr;
+
+			stats->packets++;
+			stats->bytes += skb->len;
 			napi_consume_skb(skb, in_napi);
-		} else {
-			struct xdp_frame *frame = ptr_to_xdp(ptr);
+			break;
+
+		case V
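[Editor's note] The low-bit pointer-tagging technique this patch adopts is
easy to demonstrate standalone. A small, runnable C sketch mirroring the
driver's pack/unpack scheme (names are illustrative, not the driver's):

	#include <assert.h>
	#include <stdint.h>
	#include <stdlib.h>

	enum xmit_type { TYPE_SKB, TYPE_SKB_ORPHAN, TYPE_XDP, TYPE_XSK };
	#define TYPE_MASK 0x3UL

	static void *pack(void *ptr, enum xmit_type t)
	{
		/* Requires ptr to be at least 4-byte aligned so bits 0-1
		 * are guaranteed to be zero and thus free for the tag.
		 */
		assert(((uintptr_t)ptr & TYPE_MASK) == 0);
		return (void *)((uintptr_t)ptr | t);
	}

	static enum xmit_type unpack(void **ptr)
	{
		uintptr_t p = (uintptr_t)*ptr;

		*ptr = (void *)(p & ~TYPE_MASK);	/* strip the tag */
		return p & TYPE_MASK;
	}

	int main(void)
	{
		void *obj = aligned_alloc(8, 64);
		void *tagged = pack(obj, TYPE_XDP);

		assert(unpack(&tagged) == TYPE_XDP && tagged == obj);
		free(obj);
		return 0;
	}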
[PATCH net-next v3 06/13] virtio-net: rq submits premapped per-buffer
The virtio-net rq submits premapped buffers per-buffer by setting the
sg page to NULL.

Signed-off-by: Xuan Zhuo
---
 drivers/net/virtio_net.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4b27ded8fc16..862beacef5d7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -539,6 +539,12 @@ static struct sk_buff *ptr_to_skb(void *ptr)
 	return (struct sk_buff *)((unsigned long)ptr & ~VIRTIO_ORPHAN_FLAG);
 }
 
+static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
+{
+	sg_dma_address(sg) = addr;
+	sg_dma_len(sg) = len;
+}
+
 static void __free_old_xmit(struct send_queue *sq, struct netdev_queue *txq,
 			    bool in_napi, struct virtnet_sq_free_stats *stats)
 {
@@ -916,8 +922,7 @@ static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
 	addr = dma->addr - sizeof(*dma) + offset;
 
 	sg_init_table(rq->sg, 1);
-	rq->sg[0].dma_address = addr;
-	rq->sg[0].length = len;
+	sg_fill_dma(rq->sg, addr, len);
 }
 
 static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
@@ -1067,12 +1072,6 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
 	}
 }
 
-static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
-{
-	sg->dma_address = addr;
-	sg->length = len;
-}
-
 static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi,
 				   struct receive_queue *rq, void *buf, u32 len)
 {
@@ -1353,7 +1352,8 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
 		sg_init_table(rq->sg, 1);
 		sg_fill_dma(rq->sg, addr, len);
 
-		err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, xsk_buffs[i], gfp);
+		err = virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1,
+						    xsk_buffs[i], NULL, gfp);
 		if (err)
 			goto err;
 	}
@@ -2433,7 +2433,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 
 	virtnet_rq_init_one_sg(rq, buf, vi->hdr_len + GOOD_PACKET_LEN);
 
-	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
+	err = virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
 		virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
@@ -2553,7 +2553,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	virtnet_rq_init_one_sg(rq, buf, len);
 
 	ctx = mergeable_len_to_ctx(len + room, headroom);
-	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
+	err = virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
 		virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));

-- 
2.32.0.3.g01195cf9f
[PATCH net-next v3 01/13] virtio_ring: introduce vring_need_unmap_buffer
To make the code readable, introduce vring_need_unmap_buffer() to
replace do_unmap.

	use_dma_api  premapped  ->  vring_need_unmap_buffer()
	1. false     false          false
	2. true      false          true
	3. true      true           false

Signed-off-by: Xuan Zhuo
Acked-by: Jason Wang
---
 drivers/virtio/virtio_ring.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 98374ed7c577..97590c201aa2 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -175,11 +175,6 @@ struct vring_virtqueue {
 	/* Do DMA mapping by driver */
 	bool premapped;
 
-	/* Do unmap or not for desc. Just when premapped is False and
-	 * use_dma_api is true, this is true.
-	 */
-	bool do_unmap;
-
 	/* Head of free buffer list. */
 	unsigned int free_head;
 	/* Number we've added since last sync. */
@@ -297,6 +292,11 @@ static bool vring_use_dma_api(const struct virtio_device *vdev)
 	return false;
 }
 
+static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring)
+{
+	return vring->use_dma_api && !vring->premapped;
+}
+
 size_t virtio_max_dma_size(const struct virtio_device *vdev)
 {
 	size_t max_segment_size = SIZE_MAX;
@@ -445,7 +445,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
 {
 	u16 flags;
 
-	if (!vq->do_unmap)
+	if (!vring_need_unmap_buffer(vq))
 		return;
 
 	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
@@ -475,7 +475,7 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 			 (flags & VRING_DESC_F_WRITE) ?
 			 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
-		if (!vq->do_unmap)
+		if (!vring_need_unmap_buffer(vq))
 			goto out;
 
 		dma_unmap_page(vring_dma_dev(vq),
@@ -643,7 +643,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	}
 	/* Last one doesn't continue. */
 	desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
-	if (!indirect && vq->do_unmap)
+	if (!indirect && vring_need_unmap_buffer(vq))
 		vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
 			~VRING_DESC_F_NEXT;
 
@@ -802,7 +802,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 				VRING_DESC_F_INDIRECT));
 		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-		if (vq->do_unmap) {
+		if (vring_need_unmap_buffer(vq)) {
 			for (j = 0; j < len / sizeof(struct vring_desc); j++)
 				vring_unmap_one_split_indirect(vq, &indir_desc[j]);
 		}
@@ -1236,7 +1236,7 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
 			 (flags & VRING_DESC_F_WRITE) ?
 			 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
-		if (!vq->do_unmap)
+		if (!vring_need_unmap_buffer(vq))
 			return;
 
 		dma_unmap_page(vring_dma_dev(vq),
@@ -1251,7 +1251,7 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
 {
 	u16 flags;
 
-	if (!vq->do_unmap)
+	if (!vring_need_unmap_buffer(vq))
 		return;
 
 	flags = le16_to_cpu(desc->flags);
@@ -1632,7 +1632,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 		if (!desc)
 			return;
 
-		if (vq->do_unmap) {
+		if (vring_need_unmap_buffer(vq)) {
 			len = vq->packed.desc_extra[id].len;
 			for (i = 0; i < len / sizeof(struct vring_packed_desc);
 					i++)
@@ -2091,7 +2091,6 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
 	vq->premapped = false;
-	vq->do_unmap = vq->use_dma_api;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
@@ -2636,7 +2635,6 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
 	vq->premapped = false;
-	vq->do_unmap = vq->use_dma_api;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
@@ -2799,7 +2797,6 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 	}
 
 	vq->premapped = true;
-	vq->do_unmap = false;
 
 	END_USE(vq);

-- 
2.32.0.3.g01195cf9f
[PATCH net-next v3 05/13] virtio_ring: introduce add api for premapped
Two APIs are introduced to submit premapped buffers per-buffer:

	int virtqueue_add_inbuf_premapped(struct virtqueue *vq,
					  struct scatterlist *sg, unsigned int num,
					  void *data,
					  void *ctx,
					  gfp_t gfp);

	int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
					   struct scatterlist *sg, unsigned int num,
					   void *data,
					   gfp_t gfp);

Signed-off-by: Xuan Zhuo
---
 drivers/virtio/virtio_ring.c | 46 
 include/linux/virtio.h       | 11 +
 2 files changed, 57 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index fefa85a5e6b6..26d218f6235d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2276,6 +2276,28 @@ int virtqueue_add_outbuf(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
 
+/**
+ * virtqueue_add_outbuf_premapped - expose output buffers to other end
+ * @vq: the struct virtqueue we're talking about.
+ * @sg: scatterlist (must be well-formed and terminated!)
+ * @num: the number of entries in @sg readable by other side
+ * @data: the token identifying the buffer.
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
+				   struct scatterlist *sg, unsigned int num,
+				   void *data,
+				   gfp_t gfp)
+{
+	return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, true, gfp);
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_outbuf_premapped);
+
 /**
  * virtqueue_add_inbuf - expose input buffers to other end
  * @vq: the struct virtqueue we're talking about.
@@ -2322,6 +2344,30 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
+/**
+ * virtqueue_add_inbuf_premapped - expose input buffers to other end
+ * @vq: the struct virtqueue we're talking about.
+ * @sg: scatterlist (must be well-formed and terminated!)
+ * @num: the number of entries in @sg writable by other side
+ * @data: the token identifying the buffer.
+ * @ctx: extra context for the token
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_inbuf_premapped(struct virtqueue *vq,
+				  struct scatterlist *sg, unsigned int num,
+				  void *data,
+				  void *ctx,
+				  gfp_t gfp)
+{
+	return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, true, gfp);
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_premapped);
+
 /**
  * virtqueue_dma_dev - get the dma dev
  * @_vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 306137a15d07..13b3f55abca3 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -56,6 +56,17 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 			    void *ctx,
 			    gfp_t gfp);
 
+int virtqueue_add_inbuf_premapped(struct virtqueue *vq,
+				  struct scatterlist *sg, unsigned int num,
+				  void *data,
+				  void *ctx,
+				  gfp_t gfp);
+
+int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
+				   struct scatterlist *sg, unsigned int num,
+				   void *data,
+				   gfp_t gfp);
+
 int virtqueue_add_sgs(struct virtqueue *vq,
 		      struct scatterlist *sgs[],
 		      unsigned int out_sgs,

-- 
2.32.0.3.g01195cf9f
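[Editor's note] A hedged sketch of the intended calling convention. The helper
name submit_premapped_buf() is made up for illustration; the
virtqueue_add_outbuf_premapped() call and the sg_dma_address()/sg_dma_len()
convention come from this patch and patch #4, and the DMA mapping calls are
the standard kernel DMA API:

	/* Driver-side sketch: map the buffer yourself, hand the virtio core
	 * a dma-level sg, and submit through the premapped API. The core
	 * then skips its own map/unmap for this buffer; unmapping after
	 * completion remains the driver's job.
	 */
	static int submit_premapped_buf(struct virtqueue *vq, struct device *dev,
					void *buf, unsigned int len, void *token)
	{
		struct scatterlist sg;
		dma_addr_t addr;

		addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
		if (dma_mapping_error(dev, addr))
			return -ENOMEM;

		sg_init_table(&sg, 1);
		sg_dma_address(&sg) = addr;	/* premapped: core uses these directly */
		sg_dma_len(&sg) = len;

		return virtqueue_add_outbuf_premapped(vq, &sg, 1, token, GFP_ATOMIC);
	}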
[PATCH net-next v3 10/13] virtio_net: xsk: prevent disable tx napi
Since xsk's TX queue is consumed by TX NAPI, if the sq is bound to xsk,
we must prevent TX NAPI from being disabled.

Signed-off-by: Xuan Zhuo
Acked-by: Jason Wang
---
 drivers/net/virtio_net.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a6becdc762cd..3fc8e71cfba9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -5029,7 +5029,7 @@ static int virtnet_set_coalesce(struct net_device *dev,
 				struct netlink_ext_ack *extack)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
-	int ret, queue_number, napi_weight;
+	int ret, queue_number, napi_weight, i;
 	bool update_napi = false;
 
 	/* Can't change NAPI weight if the link is up */
@@ -5058,6 +5058,14 @@ static int virtnet_set_coalesce(struct net_device *dev,
 		return ret;
 
 	if (update_napi) {
+		/* xsk xmit depends on the tx napi. So if xsk is active,
+		 * prevent modifications to tx napi.
+		 */
+		for (i = queue_number; i < vi->max_queue_pairs; i++) {
+			if (vi->sq[i].xsk_pool)
+				return -EBUSY;
+		}
+
 		for (; queue_number < vi->max_queue_pairs; queue_number++)
 			vi->sq[queue_number].napi.weight = napi_weight;
 	}

-- 
2.32.0.3.g01195cf9f
[PATCH net-next v3 09/13] virtio_net: xsk: bind/unbind xsk for tx
This patch implements the logic of binding/unbinding the xsk pool
to/from the sq and rq.

Signed-off-by: Xuan Zhuo
Acked-by: Jason Wang
---
 drivers/net/virtio_net.c | 53 
 1 file changed, 53 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 079d4a213fda..a6becdc762cd 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -295,6 +295,10 @@ struct send_queue {
 
 	/* Record whether sq is in reset state. */
 	bool reset;
+
+	struct xsk_buff_pool *xsk_pool;
+
+	dma_addr_t xsk_hdr_dma_addr;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -494,6 +498,8 @@ struct virtio_net_common_hdr {
 	};
 };
 
+static struct virtio_net_common_hdr xsk_hdr;
+
 static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
 static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
 			       struct net_device *dev,
@@ -5500,6 +5506,29 @@ static int virtnet_rq_bind_xsk_pool(struct virtnet_info *vi, struct receive_queu
 	return err;
 }
 
+static int virtnet_sq_bind_xsk_pool(struct virtnet_info *vi,
+				    struct send_queue *sq,
+				    struct xsk_buff_pool *pool)
+{
+	int err, qindex;
+
+	qindex = sq - vi->sq;
+
+	virtnet_tx_pause(vi, sq);
+
+	err = virtqueue_reset(sq->vq, virtnet_sq_free_unused_buf);
+	if (err) {
+		netdev_err(vi->dev, "reset tx fail: tx queue index: %d err: %d\n", qindex, err);
+		pool = NULL;
+	}
+
+	sq->xsk_pool = pool;
+
+	virtnet_tx_resume(vi, sq);
+
+	return err;
+}
+
 static int virtnet_xsk_pool_enable(struct net_device *dev,
 				   struct xsk_buff_pool *pool,
 				   u16 qid)
@@ -5508,6 +5537,7 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
 	struct receive_queue *rq;
 	struct device *dma_dev;
 	struct send_queue *sq;
+	dma_addr_t hdr_dma;
 	int err, size;
 
 	if (vi->hdr_len > xsk_pool_get_headroom(pool))
@@ -5545,6 +5575,11 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
 	if (!rq->xsk_buffs)
 		return -ENOMEM;
 
+	hdr_dma = virtqueue_dma_map_single_attrs(sq->vq, &xsk_hdr, vi->hdr_len,
+						 DMA_TO_DEVICE, 0);
+	if (virtqueue_dma_mapping_error(sq->vq, hdr_dma))
+		return -ENOMEM;
+
 	err = xsk_pool_dma_map(pool, dma_dev, 0);
 	if (err)
 		goto err_xsk_map;
@@ -5553,11 +5588,24 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
 	if (err)
 		goto err_rq;
 
+	err = virtnet_sq_bind_xsk_pool(vi, sq, pool);
+	if (err)
+		goto err_sq;
+
+	/* Now, we do not support tx offload(such as tx csum), so all the tx
+	 * virtnet hdr is zero. So all the tx packets can share a single hdr.
+	 */
+	sq->xsk_hdr_dma_addr = hdr_dma;
+
 	return 0;
 
+err_sq:
+	virtnet_rq_bind_xsk_pool(vi, rq, NULL);
 err_rq:
 	xsk_pool_dma_unmap(pool, 0);
 err_xsk_map:
+	virtqueue_dma_unmap_single_attrs(rq->vq, hdr_dma, vi->hdr_len,
+					 DMA_TO_DEVICE, 0);
+
 	return err;
 }
 
@@ -5566,19 +5614,24 @@ static int virtnet_xsk_pool_disable(struct net_device *dev, u16 qid)
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct xsk_buff_pool *pool;
 	struct receive_queue *rq;
+	struct send_queue *sq;
 	int err;
 
 	if (qid >= vi->curr_queue_pairs)
 		return -EINVAL;
 
+	sq = &vi->sq[qid];
 	rq = &vi->rq[qid];
 
 	pool = rq->xsk_pool;
 
 	err = virtnet_rq_bind_xsk_pool(vi, rq, NULL);
+	err |= virtnet_sq_bind_xsk_pool(vi, sq, NULL);
 
 	xsk_pool_dma_unmap(pool, 0);
 
+	virtqueue_dma_unmap_single_attrs(sq->vq, sq->xsk_hdr_dma_addr,
+					 vi->hdr_len, DMA_TO_DEVICE, 0);
+
 	kvfree(rq->xsk_buffs);
 
 	return err;

-- 
2.32.0.3.g01195cf9f
[PATCH net-next v3 11/13] virtio_net: xsk: tx: support xmit xsk buffer
The driver's tx napi is very important for XSK. It is responsible for
obtaining data from the XSK queue and sending it out. At the beginning,
we need to trigger tx napi.

virtnet_free_old_xmit() distinguishes three pointer types (skb, xdp
frame, xsk buffer) by the last bits of the pointer.

Signed-off-by: Xuan Zhuo
Acked-by: Jason Wang
---
 drivers/net/virtio_net.c | 179 ---
 1 file changed, 168 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3fc8e71cfba9..67d33cc913cc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -83,6 +83,7 @@ struct virtnet_sq_free_stats {
 	u64 bytes;
 	u64 napi_packets;
 	u64 napi_bytes;
+	u64 xsk;
 };
 
 struct virtnet_sq_stats {
@@ -511,16 +512,20 @@ static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
 					       struct sk_buff *curr_skb,
 					       struct page *page, void *buf,
 					       int len, int truesize);
+static void virtnet_xsk_completed(struct send_queue *sq, int num);
 
 enum virtnet_xmit_type {
 	VIRTNET_XMIT_TYPE_SKB,
 	VIRTNET_XMIT_TYPE_SKB_ORPHAN,
 	VIRTNET_XMIT_TYPE_XDP,
+	VIRTNET_XMIT_TYPE_XSK,
 };
 
 /* We use the last two bits of the pointer to distinguish the xmit type. */
 #define VIRTNET_XMIT_TYPE_MASK	(BIT(0) | BIT(1))
 
+#define VIRTIO_XSK_FLAG_OFFSET	2
+
 static enum virtnet_xmit_type virtnet_xmit_ptr_unpack(void **ptr)
 {
 	unsigned long p = (unsigned long)*ptr;
@@ -543,6 +548,11 @@ static int virtnet_add_outbuf(struct send_queue *sq, int num, void *data,
 			      GFP_ATOMIC);
 }
 
+static u32 virtnet_ptr_to_xsk_buff_len(void *ptr)
+{
+	return ((unsigned long)ptr) >> VIRTIO_XSK_FLAG_OFFSET;
+}
+
 static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
 {
 	sg_dma_address(sg) = addr;
@@ -583,11 +593,27 @@ static void __free_old_xmit(struct send_queue *sq, struct netdev_queue *txq,
 			stats->bytes += xdp_get_frame_len(frame);
 			xdp_return_frame(frame);
 			break;
+
+		case VIRTNET_XMIT_TYPE_XSK:
+			stats->bytes += virtnet_ptr_to_xsk_buff_len(ptr);
+			stats->xsk++;
+			break;
 		}
 	}
 
 	netdev_tx_completed_queue(txq, stats->napi_packets, stats->napi_bytes);
 }
 
+static void virtnet_free_old_xmit(struct send_queue *sq,
+				  struct netdev_queue *txq,
+				  bool in_napi,
+				  struct virtnet_sq_free_stats *stats)
+{
+	__free_old_xmit(sq, txq, in_napi, stats);
+
+	if (stats->xsk)
+		virtnet_xsk_completed(sq, stats->xsk);
+}
+
 /* Converting between virtqueue no. and kernel tx/rx queue no.
  * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
  */
@@ -1017,7 +1043,7 @@ static void free_old_xmit(struct send_queue *sq, struct netdev_queue *txq,
 {
 	struct virtnet_sq_free_stats stats = {0};
 
-	__free_old_xmit(sq, txq, in_napi, &stats);
+	virtnet_free_old_xmit(sq, txq, in_napi, &stats);
 
 	/* Avoid overhead when no packets have been processed
 	 * happens when called speculatively from start_xmit.
@@ -1379,6 +1405,113 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
 	return err;
 }
 
+static void *virtnet_xsk_to_ptr(u32 len)
+{
+	unsigned long p;
+
+	p = len << VIRTIO_XSK_FLAG_OFFSET;
+
+	return virtnet_xmit_ptr_pack((void *)p, VIRTNET_XMIT_TYPE_XSK);
+}
+
+static int virtnet_xsk_xmit_one(struct send_queue *sq,
+				struct xsk_buff_pool *pool,
+				struct xdp_desc *desc)
+{
+	struct virtnet_info *vi;
+	dma_addr_t addr;
+
+	vi = sq->vq->vdev->priv;
+
+	addr = xsk_buff_raw_get_dma(pool, desc->addr);
+	xsk_buff_raw_dma_sync_for_device(pool, addr, desc->len);
+
+	sg_init_table(sq->sg, 2);
+	sg_fill_dma(sq->sg, sq->xsk_hdr_dma_addr, vi->hdr_len);
+	sg_fill_dma(sq->sg + 1, addr, desc->len);
+
+	return virtqueue_add_outbuf_premapped(sq->vq, sq->sg, 2,
+					      virtnet_xsk_to_ptr(desc->len),
+					      GFP_ATOMIC);
+}
+
+static int virtnet_xsk_xmit_batch(struct send_queue *sq,
+				  struct xsk_buff_pool *pool,
+				  unsigned int budget,
+				  u64 *kicks)
+{
+	struct xdp_desc *descs = pool->tx_descs;
+	bool kick = false;
+	u32 nb_pkts, i;
+	int err;
+
+	budget = min_t(u32, budget, sq->vq->num_free);
+
+	nb_pkts = xsk_tx_peek_release_desc_batch(pool, budget);
+	if (
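[Editor's note] The length-in-pointer encoding used by virtnet_xsk_to_ptr()
and virtnet_ptr_to_xsk_buff_len() round-trips like this. A standalone
userspace sketch of the arithmetic, with stand-in macro names:

	#include <assert.h>
	#include <stdint.h>

	#define XMIT_TYPE_XSK	0x3UL	/* 11: AF-XDP tx tag in bits 0-1 */
	#define XSK_FLAG_OFFSET	2	/* length lives above the tag bits */

	static void *xsk_len_to_ptr(uint32_t len)
	{
		/* No real pointer is stored for xsk tx: the "pointer" is
		 * just the packet length shifted past the type tag.
		 */
		return (void *)(((uintptr_t)len << XSK_FLAG_OFFSET) | XMIT_TYPE_XSK);
	}

	static uint32_t xsk_ptr_to_len(const void *ptr)
	{
		return (uintptr_t)ptr >> XSK_FLAG_OFFSET;
	}

	int main(void)
	{
		assert(xsk_ptr_to_len(xsk_len_to_ptr(1500)) == 1500);
		return 0;
	}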
[PATCH net-next v3 12/13] virtio_net: update tx timeout record
If the send queue transmitted some packets, update the tx timeout
record to prevent spurious tx timeouts.

Signed-off-by: Xuan Zhuo
Acked-by: Jason Wang
---
 drivers/net/virtio_net.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 67d33cc913cc..7cd6f1d74710 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1489,6 +1489,13 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 	if (!is_xdp_raw_buffer_queue(vi, sq - vi->sq))
 		check_sq_full_and_disable(vi, vi->dev, sq);
 
+	if (sent) {
+		struct netdev_queue *txq;
+
+		txq = netdev_get_tx_queue(vi->dev, sq - vi->sq);
+		txq_trans_cond_update(txq);
+	}
+
 	u64_stats_update_begin(&sq->stats.syncp);
 	u64_stats_add(&sq->stats.packets, stats.packets);
 	u64_stats_add(&sq->stats.bytes, stats.bytes);

-- 
2.32.0.3.g01195cf9f
[PATCH net-next v3 13/13] virtio_net: xdp_features add NETDEV_XDP_ACT_XSK_ZEROCOPY
Now that we support AF_XDP (xsk), add NETDEV_XDP_ACT_XSK_ZEROCOPY to
xdp_features.

Signed-off-by: Xuan Zhuo
Acked-by: Jason Wang
---
 drivers/net/virtio_net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7cd6f1d74710..8438e210b00f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -6618,7 +6618,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 		dev->hw_features |= NETIF_F_GRO_HW;
 
 	dev->vlan_features = dev->features;
-	dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT;
+	dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
+			    NETDEV_XDP_ACT_XSK_ZEROCOPY;
 
 	/* MTU range: 68 - 65535 */
 	dev->min_mtu = MIN_MTU;

-- 
2.32.0.3.g01195cf9f
[PATCH net-next v3 00/13] virtio-net: support AF_XDP zero copy (tx)
v3:
 1. use sg_dma_address/length api to set the premapped sg
 2. remove 'premapped' parameter from the new APIs
 3. tweak the comment of commit #2, #3

v2:
 1. use new api to submit premapped buffer instead of using sgs to pass
    this info
 2. some small fixes for
    http://lore.kernel.org/all/20240924013204.13763-1-xuanz...@linux.alibaba.com

v1:
 1. some small fixes for
    http://lore.kernel.org/all/20240820073330.9161-1-xuanz...@linux.alibaba.com
    1. fix the title of the commit #2, #3
    2. fix the gcc error for commit #3
    3. use virtqueue_dma_ for tx hdr
    4. rename virtnet_ptr_to_xsk to virtnet_ptr_to_xsk_buff_len
    5. squash #11 in last patch set to #10

## AF_XDP

XDP socket (AF_XDP) is an excellent kernel-bypass network framework. The
zero copy feature of xsk (XDP socket) needs to be supported by the
driver, and its performance is very good. mlx5 and Intel ixgbe already
support this feature. This patch set allows virtio-net to support xsk's
zerocopy xmit feature.

At present, we have completed some preparation:

1. vq-reset (virtio spec and kernel code)
2. virtio-core premapped dma
3. virtio-net xdp refactor

So it is time for virtio-net to complete the support for the XDP Socket
Zerocopy.

Virtio-net cannot increase the queue num at will, so xsk shares the
queue with the kernel.

This patch set includes some refactoring of virtio-net to let it
support AF_XDP.

## About virtio premapped mode

The current configuration sets the virtqueue (vq) to premapped mode,
implying that all buffers submitted to this queue must be mapped ahead
of time. This presents a challenge for the virtnet send queue (sq): the
virtnet driver would be required to keep track of dma information for
vq size * 17, which can be substantial. However, if the premapped mode
were applied on a per-buffer basis, the complexity would be greatly
reduced. With AF_XDP enabled, AF_XDP buffers would become premapped,
while kernel skb buffers could remain unmapped.

We can distinguish them by sg_page(sg): when sg_page(sg) is NULL, this
indicates that the driver has performed DMA mapping in advance, allowing
the virtio core to directly utilize sg_dma_address(sg) without
conducting any internal DMA mapping. Additionally, DMA unmap operations
for this buffer will be bypassed.

## performance

ENV: Qemu with vhost-user (polling mode).
Host CPU: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz

### virtio PMD in guest with testpmd

testpmd> show port stats all

  NIC statistics for port 0
  RX-packets: 19531092064  RX-missed: 0  RX-bytes: 1093741155584
  RX-errors: 0
  RX-nombuf: 0
  TX-packets: 595992  TX-errors: 0  TX-bytes: 371030645664

  Throughput (since last show)
  Rx-pps: 8861574  Rx-bps: 3969985208
  Tx-pps: 8861493  Tx-bps: 3969962736

### AF_XDP PMD in guest with testpmd

testpmd> show port stats all

  NIC statistics for port 0
  RX-packets: 68152727  RX-missed: 0  RX-bytes: 3816552712
  RX-errors: 0
  RX-nombuf: 0
  TX-packets: 68114967  TX-errors: 33216  TX-bytes: 3814438152

  Throughput (since last show)
  Rx-pps: 6333196  Rx-bps: 2837272088
  Tx-pps: 6333227  Tx-bps: 2837285936

But AF_XDP consumes more CPU for tx and rx napi (100% and 86%).

Please review.

Thanks.

Xuan Zhuo (13):
  virtio_ring: introduce vring_need_unmap_buffer
  virtio_ring: split: record extras for indirect buffers
  virtio_ring: packed: record extras for indirect buffers
  virtio_ring: perform premapped operations based on per-buffer
  virtio_ring: introduce add api for premapped
  virtio-net: rq submits premapped per-buffer
  virtio_ring: remove API virtqueue_set_dma_premapped
  virtio_net: refactor the xmit type
  virtio_net: xsk: bind/unbind xsk for tx
  virtio_net: xsk: prevent disable tx napi
  virtio_net: xsk: tx: support xmit xsk buffer
  virtio_net: update tx timeout record
  virtio_net: xdp_features add NETDEV_XDP_ACT_XSK_ZEROCOPY

 drivers/net/virtio_net.c     | 369 ---
 drivers/virtio/virtio_ring.c | 354 -
 include/linux/virtio.h       |  13 +-
 3 files changed, 487 insertions(+), 249 deletions(-)

-- 
2.32.0.3.g01195cf9f
[PATCH net-next v3 03/13] virtio_ring: packed: record extras for indirect buffers
The subsequent commit needs to know whether every indirect buffer is
premapped or not. So we need to introduce an extra struct for every
indirect buffer to record this info.

Signed-off-by: Xuan Zhuo
---
 drivers/virtio/virtio_ring.c | 60 +---
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 405d5a348795..cfe70c40f630 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -78,7 +78,11 @@ struct vring_desc_state_split {
 
 struct vring_desc_state_packed {
 	void *data;			/* Data for callback. */
-	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
+
+	/* Indirect desc table and extra table, if any. These two will be
+	 * allocated together. So we won't stress more to the memory allocator.
+	 */
+	struct vring_packed_desc *indir_desc;
 	u16 num;			/* Descriptor list length. */
 	u16 last;			/* The last desc state in a list. */
 };
@@ -1238,27 +1242,12 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
 	}
 }
 
-static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
-				    const struct vring_packed_desc *desc)
-{
-	u16 flags;
-
-	if (!vring_need_unmap_buffer(vq))
-		return;
-
-	flags = le16_to_cpu(desc->flags);
-
-	dma_unmap_page(vring_dma_dev(vq),
-		       le64_to_cpu(desc->addr),
-		       le32_to_cpu(desc->len),
-		       (flags & VRING_DESC_F_WRITE) ?
-		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
-}
-
 static struct vring_packed_desc *alloc_indirect_packed(unsigned int total_sg,
 						       gfp_t gfp)
 {
+	struct vring_desc_extra *extra;
 	struct vring_packed_desc *desc;
+	int i, size;
 
 	/*
 	 * We require lowmem mappings for the descriptors because
@@ -1267,7 +1256,16 @@ static struct vring_packed_desc *alloc_indirect_packed(unsigned int total_sg,
 	 */
 	gfp &= ~__GFP_HIGHMEM;
 
-	desc = kmalloc_array(total_sg, sizeof(struct vring_packed_desc), gfp);
+	size = (sizeof(*desc) + sizeof(*extra)) * total_sg;
+
+	desc = kmalloc(size, gfp);
+	if (!desc)
+		return NULL;
+
+	extra = (struct vring_desc_extra *)&desc[total_sg];
+
+	for (i = 0; i < total_sg; i++)
+		extra[i].next = i + 1;
 
 	return desc;
 }
@@ -1280,6 +1278,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 					 void *data,
 					 gfp_t gfp)
 {
+	struct vring_desc_extra *extra;
 	struct vring_packed_desc *desc;
 	struct scatterlist *sg;
 	unsigned int i, n, err_idx;
@@ -1291,6 +1290,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	if (!desc)
 		return -ENOMEM;
 
+	extra = (struct vring_desc_extra *)&desc[total_sg];
+
 	if (unlikely(vq->vq.num_free < 1)) {
 		pr_debug("Can't add buf len 1 - avail = 0\n");
 		kfree(desc);
@@ -1312,6 +1313,13 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 					0 : VRING_DESC_F_WRITE);
 			desc[i].addr = cpu_to_le64(addr);
 			desc[i].len = cpu_to_le32(sg->length);
+
+			if (unlikely(vq->use_dma_api)) {
+				extra[i].addr = addr;
+				extra[i].len = sg->length;
+				extra[i].flags = n < out_sgs ? 0 : VRING_DESC_F_WRITE;
+			}
+
 			i++;
 		}
 	}
@@ -1381,7 +1389,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	err_idx = i;
 
 	for (i = 0; i < err_idx; i++)
-		vring_unmap_desc_packed(vq, &desc[i]);
+		vring_unmap_extra_packed(vq, &extra[i]);
 
 free_desc:
 	kfree(desc);
@@ -1617,7 +1625,8 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 	}
 
 	if (vq->indirect) {
-		u32 len;
+		struct vring_desc_extra *extra;
+		u32 len, num;
 
 		/* Free the indirect table, if any, now that it's unmapped. */
 		desc = state->indir_desc;
 		if (!desc)
 			return;
 
 		if (vring_need_unmap_buffer(vq)) {
 			len = vq->packed.desc_extra[id].len;
-			for (i = 0; i < len / sizeof(struct vring_packed_desc);
-					i++)
-				vring_unmap_desc_packed(vq, &desc[i]);
+
[PATCH net-next v3 07/13] virtio_ring: remove API virtqueue_set_dma_premapped
Now this API is unused; remove it.

Signed-off-by: Xuan Zhuo
Acked-by: Jason Wang
---
 drivers/net/virtio_net.c     | 13 --
 drivers/virtio/virtio_ring.c | 48
 include/linux/virtio.h       |  2 --
 3 files changed, 63 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 862beacef5d7..e4ebf3ffbf02 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -6107,15 +6107,6 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 	return -ENOMEM;
 }
 
-static void virtnet_rq_set_premapped(struct virtnet_info *vi)
-{
-	int i;
-
-	for (i = 0; i < vi->max_queue_pairs; i++)
-		/* error should never happen */
-		BUG_ON(virtqueue_set_dma_premapped(vi->rq[i].vq));
-}
-
 static int init_vqs(struct virtnet_info *vi)
 {
 	int ret;
@@ -6129,10 +6120,6 @@ static int init_vqs(struct virtnet_info *vi)
 	if (ret)
 		goto err_free;
 
-	/* disable for big mode */
-	if (!vi->big_packets || vi->mergeable_rx_bufs)
-		virtnet_rq_set_premapped(vi);
-
 	cpus_read_lock();
 	virtnet_set_affinity(vi);
 	cpus_read_unlock();
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 26d218f6235d..38bea08f0469 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -180,9 +180,6 @@ struct vring_virtqueue {
 	/* Host publishes avail event idx */
 	bool event;
 
-	/* Do DMA mapping by driver */
-	bool premapped;
-
 	/* Head of free buffer list. */
 	unsigned int free_head;
 	/* Number we've added since last sync. */
@@ -2098,7 +2095,6 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->packed_ring = true;
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
-	vq->premapped = false;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
@@ -2689,7 +2685,6 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
 #endif
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
-	vq->premapped = false;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
@@ -2816,49 +2811,6 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
-/**
- * virtqueue_set_dma_premapped - set the vring premapped mode
- * @_vq: the struct virtqueue we're talking about.
- *
- * Enable the premapped mode of the vq.
- *
- * The vring in premapped mode does not do dma internally, so the driver must
- * do dma mapping in advance. The driver must pass the dma_address through
- * dma_address of scatterlist. When the driver got a used buffer from
- * the vring, it has to unmap the dma address.
- *
- * This function must be called immediately after creating the vq, or after vq
- * reset, and before adding any buffers to it.
- *
- * Caller must ensure we don't call this with other virtqueue operations
- * at the same time (except where noted).
- *
- * Returns zero or a negative error.
- * 0: success.
- * -EINVAL: too late to enable premapped mode, the vq already contains buffers.
- */
-int virtqueue_set_dma_premapped(struct virtqueue *_vq)
-{
-	struct vring_virtqueue *vq = to_vvq(_vq);
-	u32 num;
-
-	START_USE(vq);
-
-	num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;
-
-	if (num != vq->vq.num_free) {
-		END_USE(vq);
-		return -EINVAL;
-	}
-
-	vq->premapped = true;
-
-	END_USE(vq);
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
-
 /**
  * virtqueue_reset - detach and recycle all unused buffers
  * @_vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 13b3f55abca3..338e0f5efb4b 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -93,8 +93,6 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
-int virtqueue_set_dma_premapped(struct virtqueue *_vq);
-
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);

-- 
2.32.0.3.g01195cf9f