Re: [PATCH net-next v2 02/13] virtio_ring: split: record extras for indirect buffers

2024-11-05 Thread Xuan Zhuo
On Tue, 5 Nov 2024 11:42:09 +0800, Jason Wang  wrote:
> On Wed, Oct 30, 2024 at 4:25 PM Xuan Zhuo  wrote:
> >
> > The subsequent commit needs to know whether every indirect buffer is
> > premapped or not. So we need to introduce an extra struct for every
> > indirect buffer to record this info.
> >
> > Signed-off-by: Xuan Zhuo 
> > ---
> >  drivers/virtio/virtio_ring.c | 112 ---
> >  1 file changed, 52 insertions(+), 60 deletions(-)
>
> Do we have a performance impact for this patch?
>
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 97590c201aa2..dca093744fe1 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -69,7 +69,11 @@
> >
> >  struct vring_desc_state_split {
> > void *data; /* Data for callback. */
> > -   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > +
> > +   /* Indirect extra table and desc table, if any. These two will be
> > +* allocated together. So we won't stress more to the memory 
> > allocator.
> > +*/
> > +   struct vring_desc *indir_desc;
>
> So it looks like we put a descriptor table after the extra table. Can
> this lead to more crossing page mappings for the indirect descriptors?
>
> If yes, it seems expensive so we probably need to make the descriptor
> table come first.

No, the descriptors come before the extra table,
so there is no performance impact.


>
> >  };
> >
> >  struct vring_desc_state_packed {
> > @@ -440,38 +444,20 @@ static void virtqueue_init(struct vring_virtqueue 
> > *vq, u32 num)
> >   * Split ring specific functions - *_split().
> >   */
> >
> > -static void vring_unmap_one_split_indirect(const struct vring_virtqueue 
> > *vq,
> > -  const struct vring_desc *desc)
> > -{
> > -   u16 flags;
> > -
> > -   if (!vring_need_unmap_buffer(vq))
> > -   return;
> > -
> > -   flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > -
> > -   dma_unmap_page(vring_dma_dev(vq),
> > -  virtio64_to_cpu(vq->vq.vdev, desc->addr),
> > -  virtio32_to_cpu(vq->vq.vdev, desc->len),
> > -  (flags & VRING_DESC_F_WRITE) ?
> > -  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > -}
> > -
> >  static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > - unsigned int i)
> > + struct vring_desc_extra *extra)
> >  {
> > -   struct vring_desc_extra *extra = vq->split.desc_extra;
> > u16 flags;
> >
> > -   flags = extra[i].flags;
> > +   flags = extra->flags;
> >
> > if (flags & VRING_DESC_F_INDIRECT) {
> > if (!vq->use_dma_api)
> > goto out;
> >
> > dma_unmap_single(vring_dma_dev(vq),
> > -extra[i].addr,
> > -extra[i].len,
> > +extra->addr,
> > +extra->len,
> >  (flags & VRING_DESC_F_WRITE) ?
> >  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > } else {
> > @@ -479,22 +465,23 @@ static unsigned int vring_unmap_one_split(const 
> > struct vring_virtqueue *vq,
> > goto out;
> >
> > dma_unmap_page(vring_dma_dev(vq),
> > -  extra[i].addr,
> > -  extra[i].len,
> > +  extra->addr,
> > +  extra->len,
> >(flags & VRING_DESC_F_WRITE) ?
> >DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > }
> >
> >  out:
> > -   return extra[i].next;
> > +   return extra->next;
> >  }
> >
> >  static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> >unsigned int total_sg,
> >gfp_t gfp)
> >  {
> > +   struct vring_desc_extra *extra;
> > struct vring_desc *desc;
> > -   unsigned int i;
> > +   unsigned int i, size;
> >
> > /*
> >  * We require lowmem mappings for the descriptors because
> > @@ -503,40 +490,41 @@ static struct vring_desc *alloc_indirect_split(struct 
> > virtqueue *_vq,
> >  */
> > gfp &= ~__GFP_HIGHMEM;
> >
> > -   desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> > +   size = sizeof(*desc) * total_sg + sizeof(*extra) * total_sg;
> > +
> > +   desc = kmalloc(size, gfp);
> > if (!desc)
> > return NULL;
> >
> > +   extra = (struct vring_desc_extra *)&desc[total_sg];
> > +
> > for (i = 0; i < total_sg; i++)
> > -   desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
[...]
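A sketch of the single allocation that alloc_indirect_split() makes after
this patch, based on the hunks above (the diagram is the editor's
illustration, not kernel code):

    /*
     * desc = kmalloc(sizeof(*desc) * total_sg +
     *                sizeof(*extra) * total_sg, gfp);
     * extra = (struct vring_desc_extra *)&desc[total_sg];
     *
     *   desc                          extra
     *   |                             |
     *   v                             v
     *   +-----------------------------+---------------------------------+
     *   | vring_desc[0..total_sg-1]   | vring_desc_extra[0..total_sg-1] |
     *   +-----------------------------+---------------------------------+
     *
     * Only the vring_desc array is handed to the device, and it sits at
     * the start of the buffer, so the descriptor table is no more likely
     * to cross a page boundary than with the old kmalloc_array() call.
     */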

Re: [PATCH net-next v1 1/4] virtio-net: fix overflow inside virtnet_rq_alloc

2024-11-05 Thread Jason Wang
On Tue, Oct 29, 2024 at 4:47 PM Xuan Zhuo  wrote:
>
> When the frag is just one page, this may lead to a regression on the VM.
> Especially if the sysctl net.core.high_order_alloc_disable value is 1,
> the frag always gets a single page on refill.
>
> This could cause reliable crashes or scp failures (scp of a 100M file
> to the VM).
>
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur.
>
> Commit f9dac92ba908 ("virtio_ring: enable premapped mode whatever
> use_dma_api") introduced this problem, and we reverted some commits in
> the last kernel release to fix it. Now we re-enable the feature and fix
> the bug directly.
>
> Here, when the frag size is not enough, we reduce the buffer length to
> fix the problem (a worked example follows this message).
>
> Reported-by: "Si-Wei Liu" 
> Tested-by: Darren Kenny 
> Signed-off-by: Xuan Zhuo 

Acked-by: Jason Wang 

Thanks
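To make the overflow concrete, here is a minimal sketch of the arithmetic,
assuming PAGE_SIZE == 4096 and the 16-byte virtnet_rq_dma mentioned in the
commit message (the actual fix, quoted in full later in this digest, also
accounts for the headroom "room" and only trims on a fresh frag):

    /* Sketch only: why a one-page frag can overflow. */
    #define FRAG_SIZE 4096        /* one page, e.g. high_order_alloc_disable=1 */
    #define RQ_DMA_SZ   16        /* sizeof(struct virtnet_rq_dma) */

    unsigned int len = 4096;      /* requested buffer length */

    /* The DMA metadata occupies the head of a new frag, so the frag would
     * need RQ_DMA_SZ + len = 4112 bytes but only has 4096: overflow.
     * The fix trims the buffer instead:
     */
    if (RQ_DMA_SZ + len > FRAG_SIZE)
            len -= RQ_DMA_SZ;     /* 4080: metadata and buffer now fit */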




Re: [PATCH net-next v1 4/4] virtio_net: rx remove premapped failover code

2024-11-05 Thread Jason Wang
On Tue, Oct 29, 2024 at 4:47 PM Xuan Zhuo  wrote:
>
> Now the premapped mode can be enabled unconditionally, so we can remove
> the failover code for the merge and small modes.
>
> The virtnet_rq_xxx() helpers are only used when premapping is in use;
> a check is added to prevent misuse of these APIs (see the sketch after
> this message).
>
> Tested-by: Darren Kenny 
> Signed-off-by: Xuan Zhuo 

Acked-by: Jason Wang 

Thanks
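The check mentioned in the commit message is not shown in this excerpt; a
hedged illustration of the idea (the guard's exact form and placement in
the real patch may differ, and rq->do_dma is used only as a plausible
flag name):

    /* Illustration only, not the actual patch: a premapped-only helper
     * should fail loudly if called when premapping is not in use.
     */
    static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len,
                                    void **ctx)
    {
            if (WARN_ON_ONCE(!rq->do_dma))  /* hypothetical guard */
                    return NULL;

            return virtqueue_get_buf_ctx(rq->vq, len, ctx);
    }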




Re: [PATCH net-next v1 0/4] virtio_net: enable premapped mode by default

2024-11-05 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (main)
by Paolo Abeni :

On Tue, 29 Oct 2024 16:46:11 +0800 you wrote:
> v1:
> 1. fix some small problems
> 2. remove commit "virtio_net: introduce vi->mode"
> 
> In the last kernel release, we disabled this feature to fix the
> regression [1].
> 
> [...]

Here is the summary with links:
  - [net-next,v1,1/4] virtio-net: fix overflow inside virtnet_rq_alloc
https://git.kernel.org/netdev/net-next/c/6aacd1484468
  - [net-next,v1,2/4] virtio_net: big mode skip the unmap check
https://git.kernel.org/netdev/net-next/c/a33f3df85075
  - [net-next,v1,3/4] virtio_net: enable premapped mode for merge and small by default
https://git.kernel.org/netdev/net-next/c/47008bb51c3e
  - [net-next,v1,4/4] virtio_net: rx remove premapped failover code
https://git.kernel.org/netdev/net-next/c/fb22437c1ba3

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html





Re: [PATCH net-next v2 02/13] virtio_ring: split: record extras for indirect buffers

2024-11-05 Thread Jason Wang
On Tue, Nov 5, 2024 at 2:53 PM Xuan Zhuo  wrote:
>
> On Tue, 5 Nov 2024 11:42:09 +0800, Jason Wang  wrote:
> > On Wed, Oct 30, 2024 at 4:25 PM Xuan Zhuo  
> > wrote:
> > >
> > > The subsequent commit needs to know whether every indirect buffer is
> > > premapped or not. So we need to introduce an extra struct for every
> > > indirect buffer to record this info.
> > >
> > > Signed-off-by: Xuan Zhuo 
> > > ---
> > >  drivers/virtio/virtio_ring.c | 112 ---
> > >  1 file changed, 52 insertions(+), 60 deletions(-)
> >
> > Do we have a performance impact for this patch?
> >
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 97590c201aa2..dca093744fe1 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -69,7 +69,11 @@
> > >
> > >  struct vring_desc_state_split {
> > > void *data; /* Data for callback. */
> > > -   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > > +
> > > +   /* Indirect extra table and desc table, if any. These two will be
> > > +* allocated together. So we won't stress more to the memory 
> > > allocator.
> > > +*/
> > > +   struct vring_desc *indir_desc;
> >
> > So it looks like we put a descriptor table after the extra table. Can
> > this lead to more crossing page mappings for the indirect descriptors?
> >
> > If yes, it seems expensive so we probably need to make the descriptor
> > table come first.
>
> No, the descriptors come before the extra table,

Well, then you need to tweak the comment above; it says

"Indirect extra table and desc table".

> so there is no performance impact.
>
>
> >
> > >  };
> > >

[...]

> > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > -   vring_unmap_one_split(vq, i);
> > > +   vring_unmap_one_split(vq, &extra[i]);
> >
> > Not sure if I've asked this before. But this part seems to deserve an
> > independent fix for -stable.
>
> What fix?

I meant that, for hardening, we need to check the flags stored in the extra
struct instead of in the descriptor itself, as the descriptor could be
mangled by the device.

Thanks
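A minimal sketch of the hardening Jason describes, assuming the detach
path keeps walking the chain as in the hunk above (illustrative only, not
the eventual patch):

    /* Walk the chain using the driver-private extra[] state instead of
     * vq->split.vring.desc[].flags: the vring descriptors live in memory
     * the device may write, so a malicious device could mangle them and
     * steer this loop; extra[] is never visible to the device.
     */
    while (extra[i].flags & VRING_DESC_F_NEXT) {
            vring_unmap_one_split(vq, &extra[i]);
            i = extra[i].next;
    }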




Re: [PATCH net-next v2 06/13] virtio-net: rq submits premapped per-buffer

2024-11-05 Thread Jason Wang
On Tue, Nov 5, 2024 at 3:23 PM Xuan Zhuo  wrote:
>
> On Tue, 5 Nov 2024 11:23:50 +0800, Jason Wang  wrote:
> > On Wed, Oct 30, 2024 at 4:25 PM Xuan Zhuo  
> > wrote:
> > >
> > > The virtio-net rq now submits premapped buffers on a per-buffer basis
> > > by setting the sg page to NULL.
> > >
> > > Signed-off-by: Xuan Zhuo 
> > > ---
> > >  drivers/net/virtio_net.c | 24 +---
> > >  1 file changed, 13 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 792e9eadbfc3..09757fa408bd 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -542,6 +542,12 @@ static struct sk_buff *ptr_to_skb(void *ptr)
> > > return (struct sk_buff *)((unsigned long)ptr & 
> > > ~VIRTIO_ORPHAN_FLAG);
> > >  }
> > >
> > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > +{
> > > +   sg->dma_address = addr;
> > > +   sg->length = len;
> >
> > This may work but I think it's better to reuse existing dma sg helpers like:
> >
> > sg_dma_address(sg) = addr;
> > sg_dma_len(sg) = len;
> >
> > And we probably need to fix the virtio core, which only uses
> > sg_dma_address() but not sg_dma_len().
> >
> > This helps us to avoid future issues when CONFIG_NEED_SG_DMA_LENGTH is set.
>
>
> I don't think so.
>
> In non-premapped mode, we pass the sg as a non-DMA sg to the virtio core,
> so the virtio core uses sg->length directly.

This is fine.

> If the virtio core does the DMA mapping for the sg, we do not use
> dma_map_sg_attrs(), so we must use sg->length directly.

I meant it's a hack. It may work now but will be a bug in the future.

For example, I'm playing with a prototype that does premapping for
virtio-blk; the idea is to move the DMA mappings that are expensive in
the swiotlb case, etc., outside the per-virtqueue lock. In that case,
the driver may want to use dma_map_sg() instead of dma_map_page().

I suppose we will eventually go the way where DMA mappings are handled
by the driver, and dma_map_sg() is faster than per-sg dma_map_page()
anyhow.

>
> In this case the driver cannot use sg_dma_len(): if
> CONFIG_NEED_SG_DMA_LENGTH is set, sg_dma_len() writes sg->dma_length,
> but the virtio core reads sg->length.

Well, we just need a minor tweak to get the length from
vring_map_one_sg(), then everything should be fine?

if (sg_is_premapped) {
  *addr = sg_dma_address(sg);
  *len = sg_dma_len(sg);
}

>
> For sg->dma_address, using sg_dma_address() or not is fine with me;
> but for consistency with sg->length, I assign sg->dma_address directly.
>
> I noticed this is special, so I put the assignments into a separate function.
>
> Thanks.

Actually, code like sg_fill_dma() calls for a virtqueue DMA mapping
helper; I think we've agreed that the core needs to hide DMA details
from the driver. That would be something like virtqueue_dma_map_sg(),
etc.

Thanks

>
> >
> > Others look good.
> >
> > Thanks
> >
> > > +}
> > > +
> > >  static void __free_old_xmit(struct send_queue *sq, struct netdev_queue 
> > > *txq,
> > > bool in_napi, struct virtnet_sq_free_stats 
> > > *stats)
> > >  {
> > > @@ -915,8 +921,7 @@ static void virtnet_rq_init_one_sg(struct 
> > > receive_queue *rq, void *buf, u32 len)
> > > addr = dma->addr - sizeof(*dma) + offset;
> > >
> > > sg_init_table(rq->sg, 1);
> > > -   rq->sg[0].dma_address = addr;
> > > -   rq->sg[0].length = len;
> > > +   sg_fill_dma(rq->sg, addr, len);
> > >  }
> > >
> > >  static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t 
> > > gfp)
> > > @@ -1068,12 +1073,6 @@ static void check_sq_full_and_disable(struct 
> > > virtnet_info *vi,
> > > }
> > >  }
> > >
> > > -static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 len)
> > > -{
> > > -   sg->dma_address = addr;
> > > -   sg->length = len;
> > > -}
> > > -
> > >  static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi,
> > >struct receive_queue *rq, void *buf, 
> > > u32 len)
> > >  {
> > > @@ -1354,7 +1353,8 @@ static int virtnet_add_recvbuf_xsk(struct 
> > > virtnet_info *vi, struct receive_queue
> > > sg_init_table(rq->sg, 1);
> > > sg_fill_dma(rq->sg, addr, len);
> > >
> > > -   err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, 
> > > xsk_buffs[i], gfp);
> > > +   err = virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1, 
> > > xsk_buffs[i],
> > > +   NULL, true, gfp);
> > > if (err)
> > > goto err;
> > > }
> > > @@ -2431,7 +2431,8 @@ static int add_recvbuf_small(struct virtnet_info 
> > > *vi, struct receive_queue *rq,
> > >
> > > virtnet_rq_init_one_sg(rq, buf, vi->hdr_len + GOOD_PACKET_LEN);
> > >
> > > -   err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > +   err = virtqueue_add_inbuf_premapped(rq-

[...]
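The disagreement above hinges on where the mapped length lives. In
include/linux/scatterlist.h the DMA helpers resolve as follows (abridged),
which is why assigning sg->length directly diverges from sg_dma_len()
whenever CONFIG_NEED_SG_DMA_LENGTH is set:

    /* From include/linux/scatterlist.h (abridged): */
    #define sg_dma_address(sg)      ((sg)->dma_address)

    #ifdef CONFIG_NEED_SG_DMA_LENGTH
    #define sg_dma_len(sg)          ((sg)->dma_length)
    #else
    #define sg_dma_len(sg)          ((sg)->length)
    #endif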

Re: [PATCH net-next v2 02/13] virtio_ring: split: record extras for indirect buffers

2024-11-05 Thread Michael S. Tsirkin
On Wed, Nov 06, 2024 at 09:44:39AM +0800, Jason Wang wrote:
> > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > -   vring_unmap_one_split(vq, i);
> > > > +   vring_unmap_one_split(vq, &extra[i]);
> > >
> > > Not sure if I've asked this before. But this part seems to deserve an
> > > independent fix for -stable.
> >
> > What fix?
> 
> I meant that, for hardening, we need to check the flags stored in the extra
> struct instead of in the descriptor itself, as the descriptor could be
> mangled by the device.
> 
> Thanks

Good point. Jason, want to cook up a patch?

-- 
MST




Re: [PATCH net-next v2 06/13] virtio-net: rq submits premapped per-buffer

2024-11-05 Thread Xuan Zhuo
On Wed, 6 Nov 2024 09:56:55 +0800, Jason Wang  wrote:
> On Tue, Nov 5, 2024 at 3:23 PM Xuan Zhuo  wrote:
> >
> > On Tue, 5 Nov 2024 11:23:50 +0800, Jason Wang  wrote:
> > > On Wed, Oct 30, 2024 at 4:25 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > The virtio-net rq now submits premapped buffers on a per-buffer basis
> > > > by setting the sg page to NULL.
> > > >
> > > > Signed-off-by: Xuan Zhuo 
> > > > ---
> > > >  drivers/net/virtio_net.c | 24 +---
> > > >  1 file changed, 13 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 792e9eadbfc3..09757fa408bd 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -542,6 +542,12 @@ static struct sk_buff *ptr_to_skb(void *ptr)
> > > > return (struct sk_buff *)((unsigned long)ptr & 
> > > > ~VIRTIO_ORPHAN_FLAG);
> > > >  }
> > > >
> > > > +static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 
> > > > len)
> > > > +{
> > > > +   sg->dma_address = addr;
> > > > +   sg->length = len;
> > >
> > > This may work but I think it's better to reuse existing dma sg helpers 
> > > like:
> > >
> > > sg_dma_address(sg) = addr;
> > > sg_dma_len(sg) = len;
> > >
> > > And we probably need to fix the virtio core, which only uses
> > > sg_dma_address() but not sg_dma_len().
> > >
> > > This helps us to avoid future issues when CONFIG_NEED_SG_DMA_LENGTH is 
> > > set.
> >
> >
> > I don't think so.
> >
> > In non-premapped mode, we pass the sg as a non-DMA sg to the virtio core,
> > so the virtio core uses sg->length directly.
>
> This is fine.
>
> > If the virtio core does the DMA mapping for the sg, we do not use
> > dma_map_sg_attrs(), so we must use sg->length directly.
>
> I meant it's a hack. It may work now but will be a bug in the future.
>
> For example, I'm playing with a prototype that does premapping for
> virtio-blk; the idea is to move the DMA mappings that are expensive in
> the swiotlb case, etc., outside the per-virtqueue lock. In that case,
> the driver may want to use dma_map_sg() instead of dma_map_page().
>
> I suppose we will eventually go the way where DMA mappings are handled
> by the driver, and dma_map_sg() is faster than per-sg dma_map_page()
> anyhow.
>
> >
> > In this case the driver cannot use sg_dma_len(): if
> > CONFIG_NEED_SG_DMA_LENGTH is set, sg_dma_len() writes sg->dma_length,
> > but the virtio core reads sg->length.
>
> Well, we just need a minor tweak to get the length from
> vring_map_one_sg(), then everything should be fine?
>
> if (sg_is_premapped) {
>   *addr = sg_dma_address(sg);
>   *len = sg_dma_len(sg);
> }

For now, let us start with:

 if (sg_is_premapped) {
   *addr = sg_dma_address(sg);
   sg->length = sg_dma_len(sg);
 }

Then the virtio core can be refactored to use dma_map_sg() in the future.

Thanks.


>
> >
> > For sg->dma_address, using sg_dma_address() or not is fine with me;
> > but for consistency with sg->length, I assign sg->dma_address directly.
> >
> > I noticed this is special, so I put the assignments into a separate function.
> >
> > Thanks.
>
> Actually, code like sg_fill_dma() calls for a virtqueue DMA mapping
> helper; I think we've agreed that the core needs to hide DMA details
> from the driver. That would be something like virtqueue_dma_map_sg(),
> etc.
>
> Thanks
>
> >
> > >
> > > Others look good.
> > >
> > > Thanks
> > >
> > > > +}
> > > > +
> > > >  static void __free_old_xmit(struct send_queue *sq, struct netdev_queue 
> > > > *txq,
> > > > bool in_napi, struct virtnet_sq_free_stats 
> > > > *stats)
> > > >  {
> > > > @@ -915,8 +921,7 @@ static void virtnet_rq_init_one_sg(struct 
> > > > receive_queue *rq, void *buf, u32 len)
> > > > addr = dma->addr - sizeof(*dma) + offset;
> > > >
> > > > sg_init_table(rq->sg, 1);
> > > > -   rq->sg[0].dma_address = addr;
> > > > -   rq->sg[0].length = len;
> > > > +   sg_fill_dma(rq->sg, addr, len);
> > > >  }
> > > >
> > > >  static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, 
> > > > gfp_t gfp)
> > > > @@ -1068,12 +1073,6 @@ static void check_sq_full_and_disable(struct 
> > > > virtnet_info *vi,
> > > > }
> > > >  }
> > > >
> > > > -static void sg_fill_dma(struct scatterlist *sg, dma_addr_t addr, u32 
> > > > len)
> > > > -{
> > > > -   sg->dma_address = addr;
> > > > -   sg->length = len;
> > > > -}
> > > > -
> > > >  static struct xdp_buff *buf_to_xdp(struct virtnet_info *vi,
> > > >struct receive_queue *rq, void *buf, 
> > > > u32 len)
> > > >  {
> > > > @@ -1354,7 +1353,8 @@ static int virtnet_add_recvbuf_xsk(struct 
> > > > virtnet_info *vi, struct receive_queue
> > > > sg_init_table(rq->sg, 1);
> > > > sg_fill_dma(rq->sg, addr, len);
> > > >
> > > > -   err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, 
> > > > xsk_buffs[i], gfp);
> > > > > +   err = virtqueue_ad

[...]
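virtqueue_dma_map_sg() is only named in the discussion, never shown. A
hypothetical shape for such a core-side helper might be the following
(the name, signature, and body are the editor's assumptions, not an
existing API; virtqueue_dma_dev() does exist and returns NULL when the
core does not need a mapping):

    /* HYPOTHETICAL sketch of a helper that hides DMA details from
     * drivers, as discussed above.
     */
    static int virtqueue_dma_map_sg(struct virtqueue *vq,
                                    struct scatterlist *sg, int nents,
                                    enum dma_data_direction dir)
    {
            struct device *dev = virtqueue_dma_dev(vq);

            /* No DMA device: the core translates addresses itself. */
            if (!dev)
                    return nents;

            return dma_map_sg(dev, sg, nents, dir);  /* 0 on failure */
    }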

Re: [PATCH net-next v2 02/13] virtio_ring: split: record extras for indirect buffers

2024-11-05 Thread Xuan Zhuo
On Wed, 6 Nov 2024 09:44:39 +0800, Jason Wang  wrote:
> On Tue, Nov 5, 2024 at 2:53 PM Xuan Zhuo  wrote:
> >
> > On Tue, 5 Nov 2024 11:42:09 +0800, Jason Wang  wrote:
> > > On Wed, Oct 30, 2024 at 4:25 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > The subsequent commit needs to know whether every indirect buffer is
> > > > premapped or not. So we need to introduce an extra struct for every
> > > > indirect buffer to record this info.
> > > >
> > > > Signed-off-by: Xuan Zhuo 
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 112 ---
> > > >  1 file changed, 52 insertions(+), 60 deletions(-)
> > >
> > > Do we have a performance impact for this patch?
> > >
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 97590c201aa2..dca093744fe1 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -69,7 +69,11 @@
> > > >
> > > >  struct vring_desc_state_split {
> > > > void *data; /* Data for callback. */
> > > > -   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. 
> > > > */
> > > > +
> > > > +   /* Indirect extra table and desc table, if any. These two will 
> > > > be
> > > > +* allocated together. So we won't stress more to the memory 
> > > > allocator.
> > > > +*/
> > > > +   struct vring_desc *indir_desc;
> > >
> > > So it looks like we put a descriptor table after the extra table. Can
> > > this lead to more crossing page mappings for the indirect descriptors?
> > >
> > > If yes, it seems expensive so we probably need to make the descriptor
> > > table come first.
> >
> > No, the descriptors come before the extra table,
>
> Well, then you need to tweak the comment above; it says
>
> "Indirect extra table and desc table".
>
> > so there is no performance impact.
> >
> >
> > >
> > > >  };
> > > >
>
> [...]
>
> > > > while (vq->split.vring.desc[i].flags & nextflag) {
> > > > -   vring_unmap_one_split(vq, i);
> > > > +   vring_unmap_one_split(vq, &extra[i]);
> > >
> > > Not sure if I've asked this before. But this part seems to deserve an
> > > independent fix for -stable.
> >
> > What fix?
>
> I meant that, for hardening, we need to check the flags stored in the extra
> struct instead of in the descriptor itself, as the descriptor could be
> mangled by the device.

I see; we can do that in the future.

Thanks.


>
> Thanks
>



Re: [PATCH net-next v1 1/4] virtio-net: fix overflow inside virtnet_rq_alloc

2024-11-05 Thread Michael S. Tsirkin
On Tue, Oct 29, 2024 at 04:46:12PM +0800, Xuan Zhuo wrote:
> When the frag is just one page, this may lead to a regression on the VM.
> Especially if the sysctl net.core.high_order_alloc_disable value is 1,
> the frag always gets a single page on refill.
> 
> This could cause reliable crashes or scp failures (scp of a 100M file
> to the VM).
> 
> The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
> of a new frag. When the frag size is larger than PAGE_SIZE,
> everything is fine. However, if the frag is only one page and the
> total size of the buffer and virtnet_rq_dma is larger than one page, an
> overflow may occur.
> 
> Commit f9dac92ba908 ("virtio_ring: enable premapped mode whatever
> use_dma_api") introduced this problem, and we reverted some commits in
> the last kernel release to fix it. Now we re-enable the feature and fix
> the bug directly.
> 
> Here, when the frag size is not enough, we reduce the buffer length to
> fix the problem.
> 
> Reported-by: "Si-Wei Liu" 
> Tested-by: Darren Kenny 
> Signed-off-by: Xuan Zhuo 


I feel this one belongs in net. However, patches 2 onward depend on it
to avoid regressions, so I suggest merging it into both branches; git
will figure out the conflict, I think.


> ---
>  drivers/net/virtio_net.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 792e9eadbfc3..d50c1940eb23 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -926,9 +926,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, 
> u32 size, gfp_t gfp)
>   void *buf, *head;
>   dma_addr_t addr;
>  
> - if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> - return NULL;
> -
>   head = page_address(alloc_frag->page);
>  
>   if (rq->do_dma) {
> @@ -2423,6 +2420,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, 
> struct receive_queue *rq,
>   len = SKB_DATA_ALIGN(len) +
> SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  
> + if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
> + return -ENOMEM;
> +
>   buf = virtnet_rq_alloc(rq, len, gfp);
>   if (unlikely(!buf))
>   return -ENOMEM;
> @@ -2525,6 +2525,12 @@ static int add_recvbuf_mergeable(struct virtnet_info 
> *vi,
>*/
>   len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>  
> + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> + return -ENOMEM;
> +
> + if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > 
> alloc_frag->size)
> + len -= sizeof(struct virtnet_rq_dma);
> +
>   buf = virtnet_rq_alloc(rq, len + room, gfp);
>   if (unlikely(!buf))
>   return -ENOMEM;
> -- 
> 2.32.0.3.g01195cf9f




Re: [PATCH net-next v1 0/4] virtio_net: enable premapped mode by default

2024-11-05 Thread Michael S. Tsirkin
On Mon, Nov 04, 2024 at 06:46:41PM -0800, Jakub Kicinski wrote:
> On Tue, 29 Oct 2024 16:46:11 +0800 Xuan Zhuo wrote:
> > In the last kernel release, we disabled this feature to fix the
> > regression [1].
> > 
> > This patch set tries to fix the problem and re-enable it.
> > 
> > More info: 
> > http://lore.kernel.org/all/20240820071913.68004-1-xuanz...@linux.alibaba.com
> 
> Sorry to ping; Michael, Jason, we're waiting to hear from you on
> this one.

Can patch 1 be applied to net as well? Or I can take it through
my tree. It's a bugfix, just for an uncommon configuration.