Re: [PATCH vhost v13 05/12] virtio_ring: introduce virtqueue_dma_dev()
On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo wrote:
>
> Hi, Jason
>
> Could you skip this patch?

I'm fine with either merging or dropping this.

> Let us review the other patches first?

I will be on vacation soon, and won't have time to do this until next week.

But I spotted two possible "issues":

1) the DMA metadata is stored in the headroom of the page; this breaks
frag coalescing, and we need to benchmark its impact
2) pre-mapped DMA addresses are not reused in the case of XDP_TX/XDP_REDIRECT

I see Michael has merged this series, so I'm fine with letting it go first.

Thanks

> Thanks.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 2/4] vdpa/mlx5: implement .reset_map driver op
On Mon, 2023-08-14 at 18:43 -0700, Si-Wei Liu wrote:
> This patch is based on top of the "vdpa/mlx5: Fixes
> for ASID handling" series [1].
>
> [1] vdpa/mlx5: Fixes for ASID handling
> https://lore.kernel.org/virtualization/20230802171231.11001-1-dtatu...@nvidia.com/
>
> Signed-off-by: Si-Wei Liu
> ---
>  drivers/vdpa/mlx5/core/mlx5_vdpa.h |  1 +
>  drivers/vdpa/mlx5/core/mr.c        | 72 +-
>  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 18 +++---
>  3 files changed, 54 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> index b53420e..5c9a25a 100644
> --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
> @@ -123,6 +123,7 @@ int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
>  			unsigned int asid);
>  void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev);
>  void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
> +int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
>
>  #define mlx5_vdpa_warn(__dev, format, ...) \
>  	dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__, \
>
> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> index 5a1971fc..c8d64fc 100644
> --- a/drivers/vdpa/mlx5/core/mr.c
> +++ b/drivers/vdpa/mlx5/core/mr.c
> @@ -489,21 +489,15 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr
>  	}
>  }
>
> -static void _mlx5_vdpa_destroy_cvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
> +static void _mlx5_vdpa_destroy_cvq_mr(struct mlx5_vdpa_dev *mvdev)
>  {
> -	if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
> -		return;
> -
>  	prune_iotlb(mvdev);
>  }
>
> -static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
> +static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev)
>  {
>  	struct mlx5_vdpa_mr *mr = &mvdev->mr;
>
> -	if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid)
> -		return;
> -
>  	if (!mr->initialized)
>  		return;
>
> @@ -521,8 +515,10 @@ void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
>
>  	mutex_lock(&mr->mkey_mtx);
>
> -	_mlx5_vdpa_destroy_dvq_mr(mvdev, asid);
> -	_mlx5_vdpa_destroy_cvq_mr(mvdev, asid);
> +	if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] == asid)
> +		_mlx5_vdpa_destroy_dvq_mr(mvdev);
> +	if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] == asid)
> +		_mlx5_vdpa_destroy_cvq_mr(mvdev);
>
>  	mutex_unlock(&mr->mkey_mtx);
>  }
> @@ -534,25 +530,17 @@ void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
>  }
>
>  static int _mlx5_vdpa_create_cvq_mr(struct mlx5_vdpa_dev *mvdev,
> -				    struct vhost_iotlb *iotlb,
> -				    unsigned int asid)
> +				    struct vhost_iotlb *iotlb)
>  {
> -	if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
> -		return 0;
> -
>  	return dup_iotlb(mvdev, iotlb);
>  }
>
>  static int _mlx5_vdpa_create_dvq_mr(struct mlx5_vdpa_dev *mvdev,
> -				    struct vhost_iotlb *iotlb,
> -				    unsigned int asid)
> +				    struct vhost_iotlb *iotlb)
>  {
>  	struct mlx5_vdpa_mr *mr = &mvdev->mr;
>  	int err;
>
> -	if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid)
> -		return 0;
> -
>  	if (mr->initialized)
>  		return 0;
>
> @@ -574,20 +562,18 @@ static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
>  {
>  	int err;
>
> -	err = _mlx5_vdpa_create_dvq_mr(mvdev, iotlb, asid);
> -	if (err)
> -		return err;
> -
> -	err = _mlx5_vdpa_create_cvq_mr(mvdev, iotlb, asid);
> -	if (err)
> -		goto out_err;
> +	if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] == asid) {
> +		err = _mlx5_vdpa_create_dvq_mr(mvdev, iotlb, asid);
> +		if (err)
> +			return err;
> +	}
> +	if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] == asid) {
> +		err = _mlx5_vdpa_create_cvq_mr(mvdev, iotlb);
> +		if (err)
> +			return err;

I think you still need the goto here: when CVQ and DVQ fall in the same
asid and there's a CVQ mr creation error, you are left stuck with the
DVQ mr.

> +	}
>
>  	return 0;
> -
> -out_err:
> -	_mlx5_vdpa_destroy_dvq_mr(mvdev, asid);
> -
> -	return err;
>  }
>
>  int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
> @@ -601,6 +587,28 @@ int mlx5_vdpa_create_mr(stru
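For readers following along outside the kernel tree, the cleanup-ordering problem pointed out in the review above can be modeled as a small, self-contained sketch. This is a hedged illustration, not the mlx5 driver code: all `toy_*` names and the two-entry group table are invented stand-ins, and the only point is that when the DVQ and CVQ groups share an ASID and CVQ MR creation fails, the already-created DVQ MR must be unwound before returning.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the driver state; names are hypothetical. */
struct toy_mvdev {
	unsigned int group2asid[2];	/* [0] = data vq group, [1] = cvq group */
	bool dvq_mr;			/* data vq "memory region" created? */
	bool cvq_mr;
};

enum { DATAVQ = 0, CVQ = 1 };

static int create_dvq_mr(struct toy_mvdev *m) { m->dvq_mr = true; return 0; }
static void destroy_dvq_mr(struct toy_mvdev *m) { m->dvq_mr = false; }

/* Simulate a CVQ MR creation failure to exercise the error path. */
static int create_cvq_mr(struct toy_mvdev *m) { (void)m; return -1; }

/* Mirrors the shape of the per-group creation flow after the suggested
 * fix: the CVQ failure path unwinds the DVQ MR instead of returning. */
static int create_mr(struct toy_mvdev *m, unsigned int asid)
{
	int err;

	if (m->group2asid[DATAVQ] == asid) {
		err = create_dvq_mr(m);
		if (err)
			return err;
	}
	if (m->group2asid[CVQ] == asid) {
		err = create_cvq_mr(m);
		if (err)
			goto out_err;	/* the goto the review says is still needed */
	}
	return 0;

out_err:
	if (m->group2asid[DATAVQ] == asid)
		destroy_dvq_mr(m);
	return err;
}
```

With both groups in ASID 0, `create_mr()` fails in the CVQ step and the sketch verifies that the DVQ MR is not left dangling; without the `goto`, it would be.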
[PATCH] MAINTAINERS: Add myself as mlx5_vdpa driver
As Eli Cohen moved to other work, I'll be the contact point for mlx5_vdpa.

Signed-off-by: Dragos Tatulea
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9a5863f1b016..c9a9259f4d37 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13555,6 +13555,12 @@ F: Documentation/leds/leds-mlxcpld.rst
 F:	drivers/leds/leds-mlxcpld.c
 F:	drivers/leds/leds-mlxreg.c

+MELLANOX MLX5_VDPA DRIVER
+M:	Dragos Tatulea
+L:	virtualization@lists.linux-foundation.org
+S:	Supported
+F:	drivers/vdpa/mlx5/
+
 MELLANOX PLATFORM DRIVER
 M:	Vadim Pasternak
 L:	platform-driver-...@vger.kernel.org
--
2.41.0
Re: [PATCH vhost v13 05/12] virtio_ring: introduce virtqueue_dma_dev()
On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang wrote:
> On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo wrote:
> >
> > Hi, Jason
> >
> > Could you skip this patch?
>
> I'm fine with either merging or dropping this.
>
> > Let us review the other patches first?
>
> I will be on vacation soon, and won't have time to do this until next week.

Have a happy vacation.

> But I spotted two possible "issues":
>
> 1) the DMA metadata is stored in the headroom of the page; this breaks
> frag coalescing, and we need to benchmark its impact

Not every page, just the first page of the COMP pages. So I think there
is no impact.

> 2) pre-mapped DMA addresses are not reused in the case of XDP_TX/XDP_REDIRECT

That is because the tx path is not in pre-mapped mode.

Thanks.

> I see Michael has merged this series, so I'm fine with letting it go first.
>
> Thanks
>
> > Thanks.
Re: [PATCH vhost v13 05/12] virtio_ring: introduce virtqueue_dma_dev()
On Tue, Aug 15, 2023 at 03:50:23PM +0800, Jason Wang wrote:
> On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo wrote:
> >
> > Hi, Jason
> >
> > Could you skip this patch?
>
> I'm fine with either merging or dropping this.
>
> > Let us review the other patches first?
>
> I will be on vacation soon, and won't have time to do this until next week.
>
> But I spotted two possible "issues":
>
> 1) the DMA metadata is stored in the headroom of the page; this breaks
> frag coalescing, and we need to benchmark its impact
> 2) pre-mapped DMA addresses are not reused in the case of XDP_TX/XDP_REDIRECT
>
> I see Michael has merged this series, so I'm fine with letting it go first.
>
> Thanks

It's still queued for next, so it's not too late to drop it, or better,
to add a patch on top.

> > Thanks.
Re: [PATCH net v1] virtio_net: Introduce skb_vnet_common_hdr to avoid typecasting
On 2023-08-15 a.m. 6:51, Simon Horman wrote:
> On Mon, Aug 14, 2023 at 01:18:45PM -0400, Feng Liu wrote:
>
> + "David S. Miller"
>   Eric Dumazet
>   Jakub Kicinski
>   Paolo Abeni

Thanks for adding David S. Miller.

>> The virtio_net driver currently deals with different versions and types
>> of virtio net headers, such as virtio_net_hdr_mrg_rxbuf,
>> virtio_net_hdr_v1_hash, etc. Due to these variations, the code relies
>> on multiple type casts to convert memory between different structures,
>> potentially leading to bugs when there are changes in these structures.
>>
>> Introduce "struct virtio_net_common_hdr" as a unifying header structure
>> using a union. With this approach, the various virtio net header
>> structures can be converted by accessing different members of this
>> structure, thus eliminating the need for type casting and reducing the
>> risk of potential bugs.
>>
>> For example, the following code:
>>
>> static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>                                    struct receive_queue *rq,
>>                                    struct page *page, unsigned int offset,
>>                                    unsigned int len, unsigned int truesize,
>>                                    unsigned int headroom)
>> {
>> [...]
>>         struct virtio_net_hdr_mrg_rxbuf *hdr;
>> [...]
>>         hdr_len = vi->hdr_len;
>> [...]
>> ok:
>>         hdr = skb_vnet_hdr(skb);
>>         memcpy(hdr, hdr_p, hdr_len);
>> [...]
>> }
>>
>> When the VIRTIO_NET_F_HASH_REPORT feature is enabled, hdr_len = 20,
>> but sizeof(*hdr) is 12, so memcpy(hdr, hdr_p, hdr_len) will copy 20
>> bytes into hdr, which poses a potential risk of a bug. This risk can
>> be avoided by introducing struct virtio_net_common_hdr.
>>
>> Signed-off-by: Feng Liu
>> Reviewed-by: Jiri Pirko
>
> I'm unsure if this is 'net' material.

It is about the modification of the virtio_net driver. I think it should
be regarded as `net` material.
--- drivers/net/virtio_net.c| 29 - include/uapi/linux/virtio_net.h | 7 +++ 2 files changed, 23 insertions(+), 13 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 1270c8d23463..6ce0fbcabda9 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -344,9 +344,10 @@ static int rxq2vq(int rxq) return rxq * 2; } -static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb) +static inline struct virtio_net_common_hdr * +skb_vnet_common_hdr(struct sk_buff *skb) { - return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb; + return (struct virtio_net_common_hdr *)skb->cb; } /* @@ -469,7 +470,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, unsigned int headroom) { struct sk_buff *skb; - struct virtio_net_hdr_mrg_rxbuf *hdr; + struct virtio_net_common_hdr *hdr; unsigned int copy, hdr_len, hdr_padded_len; struct page *page_to_free = NULL; int tailroom, shinfo_size; @@ -554,7 +555,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, give_pages(rq, page); ok: - hdr = skb_vnet_hdr(skb); + hdr = skb_vnet_common_hdr(skb); memcpy(hdr, hdr_p, hdr_len); if (page_to_free) put_page(page_to_free); @@ -966,7 +967,7 @@ static struct sk_buff *receive_small_build_skb(struct virtnet_info *vi, return NULL; buf += header_offset; - memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len); + memcpy(skb_vnet_common_hdr(skb), buf, vi->hdr_len); return skb; } @@ -1577,7 +1578,8 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq, { struct net_device *dev = vi->dev; struct sk_buff *skb; - struct virtio_net_hdr_mrg_rxbuf *hdr; + struct virtio_net_common_hdr *common_hdr; + struct virtio_net_hdr_mrg_rxbuf *mrg_hdr; if (unlikely(len < vi->hdr_len + ETH_HLEN)) { pr_debug("%s: short packet %i\n", dev->name, len); @@ -1597,18 +1599,19 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq, if (unlikely(!skb)) return; - hdr = skb_vnet_hdr(skb); + common_hdr = 
skb_vnet_common_hdr(skb); if (dev->features & NETIF_F_RXHASH && vi->has_rss_hash_report) - virtio_skb_set_hash((const struct virtio_net_hdr_v1_hash *)hdr, skb); + virtio_skb_set_hash(&common_hdr->hash_v1_hdr, skb); - if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID) + mrg_hdr = &common_hdr->mrg_hdr; + if (mrg_hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID) skb->ip_summed = CHECKSUM_UNNECESSARY; - if (virtio_net_hdr_to_skb(skb, &hdr->hdr, + if (virtio_net_hdr_to_skb(skb, &mrg_hdr->hdr, virtio_is_little_endian(vi->vdev))) { net_warn_ratelimited("%s: bad gso: type: %u, size: %u\n", - dev->name, hdr->hdr.gso_type, - hdr->hdr.gso_size); +
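The size hazard described in the commit message above can be reproduced outside the kernel. The sketch below is a simplified, self-contained model: the struct layouts are abbreviated stand-ins, not the real `include/uapi/linux/virtio_net.h` definitions, and the sizes assume a typical ABI without exotic padding. The point it demonstrates is the one the patch makes: a union covering the largest header variant gives `memcpy()` a destination big enough for every negotiated header length, which the 12-byte `mrg_rxbuf`-style struct alone is not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Abbreviated stand-ins for the virtio net header variants. */
struct hdr_base {		/* ~ struct virtio_net_hdr_mrg_rxbuf: 12 bytes */
	uint8_t  flags, gso_type;
	uint16_t hdr_len, gso_size, csum_start, csum_offset, num_buffers;
};

struct hdr_hash {		/* ~ struct virtio_net_hdr_v1_hash: 20 bytes */
	struct hdr_base base;	/* simplified: hash fields follow */
	uint32_t hash_value;
	uint16_t hash_report, padding;
};

/* The unifying header, sized for the largest member, in the spirit of
 * struct virtio_net_common_hdr in the patch. */
union common_hdr {
	struct hdr_base mrg_hdr;
	struct hdr_hash hash_v1_hdr;
};

/* Copy a received header of hdr_len bytes; with the union as the
 * destination type, even a 20-byte copy stays in bounds. */
static void copy_hdr(union common_hdr *dst, const uint8_t *src, size_t hdr_len)
{
	memcpy(dst, src, hdr_len);
}
```

Copying 20 bytes into a `struct hdr_base *` would overflow by 8 bytes; copying into the union is always in bounds, and the caller then reads whichever member matches the negotiated features.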
Re: [PATCH net v1] virtio_net: Introduce skb_vnet_common_hdr to avoid typecasting
On Tue, Aug 15, 2023 at 12:29 PM Simon Horman wrote: > > On Tue, Aug 15, 2023 at 11:09:02AM -0400, Feng Liu wrote: > > > > > > On 2023-08-15 a.m.6:51, Simon Horman wrote: > > > External email: Use caution opening links or attachments > > > > > > > > > On Mon, Aug 14, 2023 at 01:18:45PM -0400, Feng Liu wrote: > > > > > > + "David S. Miller" > > >Eric Dumazet > > >Jakub Kicinski > > >Paolo Abeni > > > > > Thanks for adding David S. Miller. > > > > > > The virtio_net driver currently deals with different versions and types > > > > of virtio net headers, such as virtio_net_hdr_mrg_rxbuf, > > > > virtio_net_hdr_v1_hash, etc. Due to these variations, the code relies > > > > on multiple type casts to convert memory between different structures, > > > > potentially leading to bugs when there are changes in these structures. > > > > > > > > Introduces the "struct skb_vnet_common_hdr" as a unifying header > > > > structure using a union. With this approach, various virtio net header > > > > structures can be converted by accessing different members of this > > > > structure, thus eliminating the need for type casting and reducing the > > > > risk of potential bugs. > > > > > > > > For example following code: > > > > static struct sk_buff *page_to_skb(struct virtnet_info *vi, > > > >struct receive_queue *rq, > > > >struct page *page, unsigned int offset, > > > >unsigned int len, unsigned int truesize, > > > >unsigned int headroom) > > > > { > > > > [...] > > > >struct virtio_net_hdr_mrg_rxbuf *hdr; > > > > [...] > > > >hdr_len = vi->hdr_len; > > > > [...] > > > > ok: > > > >hdr = skb_vnet_hdr(skb); > > > >memcpy(hdr, hdr_p, hdr_len); > > > > [...] > > > > } > > > > > > > > When VIRTIO_NET_F_HASH_REPORT feature is enabled, hdr_len = 20 > > > > But the sizeof(*hdr) is 12, > > > > memcpy(hdr, hdr_p, hdr_len); will copy 20 bytes to the hdr, > > > > which make a potential risk of bug. And this risk can be avoided by > > > > introducing struct virtio_net_hdr_mrg_rxbuf. 
> > > > > > > > Signed-off-by: Feng Liu > > > > Reviewed-by: Jiri Pirko > > > > > > I'm unsure if this is 'net' material. > > > > > > > It is about the modification of the virtio_net driver. I think it should be > > regarded as `net` material. > > To clarify: In general new Networking features go via the net-next tree, > while bug fixes go via the net tree. I was suggesting this > is more appropriate for net-next, and that should be reflected in the > subject. > > Subject: [PATCH net-next] ... > > Sorry for not being clearer the first time around. Right, this should go to net-next. > > > > > > > --- > > > > drivers/net/virtio_net.c| 29 - > > > > include/uapi/linux/virtio_net.h | 7 +++ > > > > 2 files changed, 23 insertions(+), 13 deletions(-) > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > index 1270c8d23463..6ce0fbcabda9 100644 > > > > --- a/drivers/net/virtio_net.c > > > > +++ b/drivers/net/virtio_net.c > > > > @@ -344,9 +344,10 @@ static int rxq2vq(int rxq) > > > >return rxq * 2; > > > > } > > > > > > > > -static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct > > > > sk_buff *skb) > > > > +static inline struct virtio_net_common_hdr * > > > > +skb_vnet_common_hdr(struct sk_buff *skb) > > > > { > > > > - return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb; > > > > + return (struct virtio_net_common_hdr *)skb->cb; > > > > } > > > > > > > > /* > > > > @@ -469,7 +470,7 @@ static struct sk_buff *page_to_skb(struct > > > > virtnet_info *vi, > > > > unsigned int headroom) > > > > { > > > >struct sk_buff *skb; > > > > - struct virtio_net_hdr_mrg_rxbuf *hdr; > > > > + struct virtio_net_common_hdr *hdr; > > > >unsigned int copy, hdr_len, hdr_padded_len; > > > >struct page *page_to_free = NULL; > > > >int tailroom, shinfo_size; > > > > @@ -554,7 +555,7 @@ static struct sk_buff *page_to_skb(struct > > > > virtnet_info *vi, > > > >give_pages(rq, page); > > > > > > > > ok: > > > > - hdr = skb_vnet_hdr(skb); > > > > + hdr = 
skb_vnet_common_hdr(skb); > > > >memcpy(hdr, hdr_p, hdr_len); > > > >if (page_to_free) > > > >put_page(page_to_free); > > > > @@ -966,7 +967,7 @@ static struct sk_buff > > > > *receive_small_build_skb(struct virtnet_info *vi, > > > >return NULL; > > > > > > > >buf += header_offset; > > > > - memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len); > > > > + memcpy(skb_vnet_common_hdr(skb), buf, vi->hdr_len); > > > > > > > >return skb; > > > > } > > > > @@ -1577,7 +1578,8 @@ static void receive_buf(struct virtnet_info *vi, > > > > struct receive_queue *rq, > > > > { > > > >struct net_device *dev = vi->dev; > > > >struct sk_bu
Re: [PATCH RFC 1/4] vdpa: introduce .reset_map operation callback
On 8/14/2023 7:21 PM, Jason Wang wrote:
> On Tue, Aug 15, 2023 at 9:46 AM Si-Wei Liu wrote:
>>
>> Signed-off-by: Si-Wei Liu
>> ---
>>  include/linux/vdpa.h | 7 +++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
>> index db1b0ea..3a3878d 100644
>> --- a/include/linux/vdpa.h
>> +++ b/include/linux/vdpa.h
>> @@ -314,6 +314,12 @@ struct vdpa_map_file {
>>   * @iova: iova to be unmapped
>>   * @size: size of the area
>>   * Returns integer: success (0) or error (< 0)
>> + * @reset_map: Reset device memory mapping (optional)
>> + * Needed for device that using device
>> + * specific DMA translation (on-chip IOMMU)
>
> This exposes the device internal to the upper layer which is not optimal.

Not sure what is meant by "device internal", but this op callback just
follows the existing convention for describing which vdpa parent this
API targets:

 * @set_map: Set device memory mapping (optional)
 * Needed for device that using device
 * specific DMA translation (on-chip IOMMU)
 :
 * @dma_map: Map an area of PA to IOVA (optional)
 * Needed for device that using device
 * specific DMA translation (on-chip IOMMU)
 * and preferring incremental map.
 :
 * @dma_unmap: Unmap an area of IOVA (optional but
 * must be implemented with dma_map)
 * Needed for device that using device
 * specific DMA translation (on-chip IOMMU)
 * and preferring incremental unmap.

> Btw, what's the difference between this and a simple set_map(NULL)?

I don't think parent drivers support this today - they can accept a
non-NULL iotlb containing an empty map entry, but not a NULL iotlb. The
behavior is undefined, or it may even cause a panic, when a NULL iotlb
is passed in. Further, this doesn't work with .dma_map parent drivers.
The reason a new op is needed (or better) is that it allows userspace
to tell the different reset behavior apart from an older kernel (via
the F_IOTLB_PERSIST feature bit in patch 4), while this behavior could
vary between parent drivers.

Regards,
-Siwei

> Thanks

>> + * @vdev: vdpa device
>> + * @asid: address space identifier
>> + * Returns integer: success (0) or error (< 0)
>>   * @get_vq_dma_dev: Get the dma device for a specific
>>   * virtqueue (optional)
>>   * @vdev: vdpa device
>> @@ -390,6 +396,7 @@ struct vdpa_config_ops {
>>  	u64 iova, u64 size, u64 pa, u32 perm, void *opaque);
>>  	int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid,
>>  	u64 iova, u64 size);
>> +	int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);
>>  	int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
>>  	unsigned int asid);
>>  	struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
>> --
>> 1.8.3.1
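The distinction being debated, that `.reset_map()` restores a parent-defined initial mapping rather than merely emptying the translation, can be sketched as a toy model. Everything here is illustrative (the `toy_*` names are invented and the state is a single range, nothing like a real IOTLB); it only shows why "reset to 1:1" is not expressible as clearing the map for a parent whose initial state is an identity mapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy on-chip IOMMU state for one ASID: either an identity (1:1)
 * mapping or a single user-programmed range. Names are hypothetical. */
struct toy_iommu {
	bool identity;		/* true: IOVA == PA across the range */
	unsigned long long iova, pa, size;
};

static void toy_set_map(struct toy_iommu *m, unsigned long long iova,
			unsigned long long pa, unsigned long long size)
{
	m->identity = false;
	m->iova = iova; m->pa = pa; m->size = size;
}

/* Clearing the map (the "set_map(NULL)" idea) leaves no translation. */
static void toy_clear_map(struct toy_iommu *m)
{
	m->identity = false;
	m->iova = m->pa = m->size = 0;
}

/* ...whereas a .reset_map-style op restores the parent's initial 1:1
 * state, which is different from "no translation at all". */
static void toy_reset_map(struct toy_iommu *m)
{
	m->identity = true;
	m->iova = m->pa = 0;
	m->size = ~0ULL;
}
```

The two end states differ, which is the crux of the argument that an explicit op (plus a feature bit userspace can probe) carries information that an empty map cannot.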
Re: [PATCH RFC 4/4] vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
On 8/14/2023 7:25 PM, Jason Wang wrote:
> On Tue, Aug 15, 2023 at 9:45 AM Si-Wei Liu wrote:
>>
>> Signed-off-by: Si-Wei Liu
>> ---
>>  drivers/vhost/vdpa.c | 16 +++-
>>  include/uapi/linux/vhost_types.h | 2 ++
>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index 62b0a01..75092a7 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -406,6 +406,14 @@ static bool vhost_vdpa_can_resume(const struct vhost_vdpa *v)
>>  	return ops->resume;
>>  }
>>
>> +static bool vhost_vdpa_has_persistent_map(const struct vhost_vdpa *v)
>> +{
>> +	struct vdpa_device *vdpa = v->vdpa;
>> +	const struct vdpa_config_ops *ops = vdpa->config;
>> +
>> +	return (!ops->set_map && !ops->dma_map) || ops->reset_map;
>
> So this means the IOTLB/IOMMU mappings have already been decoupled from
> the vdpa reset.

Not in the sense of the API; they have been coupled since day one in the
implementations of every on-chip IOMMU parent driver, namely mlx5_vdpa
and vdpa_sim. Because of that, the later (improper) support for
virtio-vdpa, from commit 6f5312f80183 ("vdpa/mlx5: Add support for
running with virtio_vdpa") and 6c3d329e6486 ("vdpa_sim: get rid of DMA
ops"), misused the .reset() op to realize 1:1 mapping, rendering strong
coupling between device reset and the reset of iotlb mappings. This
series tries to rectify that implementation deficiency, while keeping
userspace working with the older kernel behavior.

> So it should have been noticed by the userspace.

Yes, userspace has noticed this on-chip IOMMU discrepancy since day one,
I suppose. Unfortunately there's already code in userspace, with this
assumption in mind, that proactively tears down and sets up iotlb
mappings around vdpa device reset...

> I guess we can just fix the simulator and mlx5 then we are fine?

Only IF we don't care about running new QEMU on older kernels with the
flawed on-chip IOMMU behavior around reset. But that's a big IF...

Regards,
-Siwei

> Thanks
Re: [PATCH RFC 3/4] vhost-vdpa: should restore 1:1 dma mapping before detaching driver
On 8/14/2023 7:32 PM, Jason Wang wrote:
> On Tue, Aug 15, 2023 at 9:45 AM Si-Wei Liu wrote:
>>
>> Signed-off-by: Si-Wei Liu
>> ---
>>  drivers/vhost/vdpa.c | 17 +
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index b43e868..62b0a01 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
>>  	return vhost_vdpa_alloc_as(v, asid);
>>  }
>>
>> +static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
>> +{
>> +	struct vdpa_device *vdpa = v->vdpa;
>> +	const struct vdpa_config_ops *ops = vdpa->config;
>> +
>> +	if (ops->reset_map)
>> +		ops->reset_map(vdpa, asid);
>> +}
>> +
>>  static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
>>  {
>>  	struct vhost_vdpa_as *as = asid_to_as(v, asid);
>> @@ -140,6 +149,14 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
>>  	hlist_del(&as->hash_link);
>>  	vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
>> +	/*
>> +	 * Devices with on-chip IOMMU need to restore iotlb
>> +	 * to 1:1 identity mapping before vhost-vdpa is going
>> +	 * to be removed and detached from the device. Give
>> +	 * them a chance to do so, as this cannot be done
>> +	 * efficiently via the whole-range unmap call above.
>> +	 */
>
> Same question as before: if 1:1 is restored and the userspace doesn't
> do any IOTLB updating, it looks like a security issue? (Assuming IOVA
> is PA)

This is already flawed independently of this series. It was introduced
by the two commits I referenced earlier in the other thread. Today
userspace is already able to do so with a device reset while not doing
any IOTLB update. This series makes it neither worse nor better. FWIW,
as said earlier, to address this security issue properly we probably
should set up the 1:1 DMA mapping on demand in virtio_vdpa_probe() and
tear it down at virtio_vdpa_release_dev().

The question is whether virtio-vdpa is the only vdpa bus user that needs
1:1 DMA mapping, or, the other way around, whether vhost-vdpa is the only
exception among all vdpa bus drivers in not wanting to start with 1:1 by
default. This would help the parent vdpa implementation decide what kind
of mapping it should start with upon creation.

Regards,
-Siwei

> Thanks

>> +	vhost_vdpa_reset_map(v, asid);
>>  	kfree(as);
>>
>>  	return 0;
>> --
>> 1.8.3.1
Re: [PATCH RFC 2/4] vdpa/mlx5: implement .reset_map driver op
On 8/15/2023 1:26 AM, Dragos Tatulea wrote: On Mon, 2023-08-14 at 18:43 -0700, Si-Wei Liu wrote: This patch is based on top of the "vdpa/mlx5: Fixes for ASID handling" series [1]. [1] vdpa/mlx5: Fixes for ASID handling https://lore.kernel.org/virtualization/20230802171231.11001-1-dtatu...@nvidia.com/ Signed-off-by: Si-Wei Liu --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 + drivers/vdpa/mlx5/core/mr.c | 72 + - drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 +++--- 3 files changed, 54 insertions(+), 37 deletions(-) diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index b53420e..5c9a25a 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -123,6 +123,7 @@ int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, unsigned int asid); void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev); void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid); +int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid); #define mlx5_vdpa_warn(__dev, format, ...) 
\ dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__, \ diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index 5a1971fc..c8d64fc 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -489,21 +489,15 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr } } -static void _mlx5_vdpa_destroy_cvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid) +static void _mlx5_vdpa_destroy_cvq_mr(struct mlx5_vdpa_dev *mvdev) { - if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid) - return; - prune_iotlb(mvdev); } -static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid) +static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev) { struct mlx5_vdpa_mr *mr = &mvdev->mr; - if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid) - return; - if (!mr->initialized) return; @@ -521,8 +515,10 @@ void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid) mutex_lock(&mr->mkey_mtx); - _mlx5_vdpa_destroy_dvq_mr(mvdev, asid); - _mlx5_vdpa_destroy_cvq_mr(mvdev, asid); + if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] == asid) + _mlx5_vdpa_destroy_dvq_mr(mvdev); + if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] == asid) + _mlx5_vdpa_destroy_cvq_mr(mvdev); mutex_unlock(&mr->mkey_mtx); } @@ -534,25 +530,17 @@ void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev) } static int _mlx5_vdpa_create_cvq_mr(struct mlx5_vdpa_dev *mvdev, - struct vhost_iotlb *iotlb, - unsigned int asid) + struct vhost_iotlb *iotlb) { - if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid) - return 0; - return dup_iotlb(mvdev, iotlb); } static int _mlx5_vdpa_create_dvq_mr(struct mlx5_vdpa_dev *mvdev, - struct vhost_iotlb *iotlb, - unsigned int asid) + struct vhost_iotlb *iotlb) { struct mlx5_vdpa_mr *mr = &mvdev->mr; int err; - if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid) - return 0; - if (mr->initialized) return 0; @@ -574,20 +562,18 @@ static int 
_mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, { int err; - err = _mlx5_vdpa_create_dvq_mr(mvdev, iotlb, asid); - if (err) - return err; - - err = _mlx5_vdpa_create_cvq_mr(mvdev, iotlb, asid); - if (err) - goto out_err; + if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] == asid) { + err = _mlx5_vdpa_create_dvq_mr(mvdev, iotlb, asid); + if (err) + return err; + } + if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] == asid) { + err = _mlx5_vdpa_create_cvq_mr(mvdev, iotlb); + if (err) + return err; I think you still need the goto here, when CVQ and DVQ fall in same asid and there's a CVQ mr creation error, you are left stuck with the DVQ mr. Yes, you are right, I will fix this in v2. Thank you for spotting this! -Siwei + } return 0; - -out_err: - _mlx5_vdpa_destroy_dvq_mr(mvdev, asid); - - return err; } int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, @@ -601,6 +587,28 @@ int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, return err; }
Re: [PATCH RFC 0/3] vdpa: dedicated descriptor table group
On 8/9/2023 7:49 AM, Eugenio Perez Martin wrote:
> On Wed, Aug 9, 2023 at 2:56 PM Si-Wei Liu wrote:
>>
>> The following patchset introduces a dedicated group for the descriptor
>> table to reduce live migration downtime when a passthrough VQ is being
>> switched to a shadow VQ. As this RFC set is to seek early feedback on
>> the uAPI and driver API part, for now there's no associated driver
>> patch consuming the API. As soon as the support is in place on both
>> hardware device and driver, performance data will be shown using a
>> real hardware device. The target goal of this series is to reduce the
>> SVQ switching overhead to less than 300ms on a ~100GB guest with 2
>> non-mq vhost-vdpa devices.
>
> I would expand the cover letter with something along the lines of:
>
> The reduction in the downtime is thanks to avoiding the full remap in
> the switching.

Sure, will add in the next revision.

>> The plan for the intended driver implementation is to use a dedicated
>> group (specifically, 2 in the table below) to host the descriptor
>> table for all data vqs, different from where buffer addresses are
>> contained (in group 0 as below). The cvq does not have to allocate a
>> dedicated group for its descriptor table, so its buffers and
>> descriptor table would always belong to the same group (1).
>>
>>               | data vq | ctrl vq
>> ==============+=========+========
>> vq_group      |    0    |    1
>> vq_desc_group |    2    |    1
>
> Acked-by: Eugenio Pérez

Thanks!
-Siwei

>> ---
>> Si-Wei Liu (3):
>>   vdpa: introduce dedicated descriptor group for virtqueue
>>   vhost-vdpa: introduce descriptor group backend feature
>>   vhost-vdpa: uAPI to get dedicated descriptor group id
>>
>>  drivers/vhost/vdpa.c             | 27 +++
>>  include/linux/vdpa.h             | 11 +++
>>  include/uapi/linux/vhost.h       |  8
>>  include/uapi/linux/vhost_types.h |  5 +
>>  4 files changed, 51 insertions(+)
>> --
>> 1.8.3.1
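The group layout in the cover letter's table can be condensed into the two lookups a parent driver would implement: one for the group holding a vq's buffer mappings, one for the group holding its descriptor table. This is an illustrative sketch only; the helper names and the "cvq is identified by index" convention are assumptions, not the proposed driver API.

```c
#include <assert.h>
#include <stdbool.h>

/* Per the cover letter: data vqs keep buffers in group 0 but place
 * their descriptor tables in a dedicated group 2; the control vq uses
 * group 1 for both. The cvq index convention here is hypothetical. */
static bool is_cvq(unsigned int vq_idx, unsigned int cvq_idx)
{
	return vq_idx == cvq_idx;
}

/* Group for the vq's buffer mappings (~ .get_vq_group). */
static unsigned int get_vq_group(unsigned int vq_idx, unsigned int cvq_idx)
{
	return is_cvq(vq_idx, cvq_idx) ? 1 : 0;
}

/* Group for the vq's descriptor table (~ .get_vq_desc_group). */
static unsigned int get_vq_desc_group(unsigned int vq_idx, unsigned int cvq_idx)
{
	return is_cvq(vq_idx, cvq_idx) ? 1 : 2;
}
```

Binding group 2 to a separate ASID is what lets the shadow-VQ switch remap only the descriptor tables instead of the whole guest memory, which is where the downtime saving comes from.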
Re: [PATCH RFC 2/3] vhost-vdpa: introduce descriptor group backend feature
On 8/9/2023 8:49 PM, Jason Wang wrote:
> On Wed, Aug 9, 2023 at 8:56 PM Si-Wei Liu wrote:
>> Userspace knows if the device has dedicated descriptor group or not
>> by checking this feature bit.
>>
>> It's only exposed if the vdpa driver backend implements the
>> .get_vq_desc_group() operation callback. Userspace trying to negotiate
>> this feature when it or the dependent _F_IOTLB_ASID feature hasn't
>> been exposed will result in an error.
>>
>> Signed-off-by: Si-Wei Liu
>> ---
>>  drivers/vhost/vdpa.c             | 17 +
>>  include/uapi/linux/vhost_types.h |  5 +
>>  2 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index b43e868..f2e5dce 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -389,6 +389,14 @@ static bool vhost_vdpa_can_resume(const struct vhost_vdpa *v)
>>          return ops->resume;
>>  }
>>
>> +static bool vhost_vdpa_has_desc_group(const struct vhost_vdpa *v)
>> +{
>> +        struct vdpa_device *vdpa = v->vdpa;
>> +        const struct vdpa_config_ops *ops = vdpa->config;
>> +
>> +        return ops->get_vq_desc_group;
>> +}
>> +
>>  static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
>>  {
>>          struct vdpa_device *vdpa = v->vdpa;
>> @@ -679,6 +687,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>>          if (copy_from_user(&features, featurep, sizeof(features)))
>>                  return -EFAULT;
>>          if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
>> +                         BIT_ULL(VHOST_BACKEND_F_DESC_ASID) |
>>                           BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
>>                           BIT_ULL(VHOST_BACKEND_F_RESUME)))
>>                  return -EOPNOTSUPP;
>> @@ -688,6 +697,12 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>>          if ((features & BIT_ULL(VHOST_BACKEND_F_RESUME)) &&
>>              !vhost_vdpa_can_resume(v))
>>                  return -EOPNOTSUPP;
>> +        if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) &&
>> +            !(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)))
>> +                return -EINVAL;
>> +        if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) &&
>> +            !vhost_vdpa_has_desc_group(v))
>> +                return -EOPNOTSUPP;
>>          vhost_set_backend_features(&v->vdev, features);
>>          return 0;
>>  }
>> @@ -741,6 +756,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>>                  features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND);
>>          if (vhost_vdpa_can_resume(v))
>>                  features |= BIT_ULL(VHOST_BACKEND_F_RESUME);
>> +        if (vhost_vdpa_has_desc_group(v))
>> +                features |= BIT_ULL(VHOST_BACKEND_F_DESC_ASID);
>>          if (copy_to_user(featurep, &features, sizeof(features)))
>>                  r = -EFAULT;
>>          break;
>> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
>> index d3aad12a..0856f84 100644
>> --- a/include/uapi/linux/vhost_types.h
>> +++ b/include/uapi/linux/vhost_types.h
>> @@ -181,5 +181,10 @@ struct vhost_vdpa_iova_range {
>>  #define VHOST_BACKEND_F_SUSPEND  0x4
>>  /* Device can be resumed */
>>  #define VHOST_BACKEND_F_RESUME  0x5
>> +/* Device may expose the descriptor table, avail and used ring in a
>> + * different group for ASID binding than the buffers it contains.
> Nit:
>
> s/a different group/different groups/?

Yep, I will try to rephrase. Would below work?

"Device may expose virtqueue's descriptor table, avail and used ring in a
different group for ASID binding than where buffers it contains reside."

> Btw, not a native speaker but I think "descriptor" might be confusing
> since as you explained above, it contains more than just a descriptor
> table.

Yep. I chose "descriptor" because packed virtqueue doesn't have
"physical" avail and used ring other than descriptor table, but I think
I am open to a better name. I once thought of "descriptor ring" but that
might be too specific to packed virtqueue. Any suggestion?

Thanks,
-Siwei

> Thanks
>
>> + * Requires VHOST_BACKEND_F_IOTLB_ASID.
>> + */
>> +#define VHOST_BACKEND_F_DESC_ASID  0x6
>>
>>  #endif
>> --
>> 1.8.3.1
>>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
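The negotiation rules in the patch above (DESC_ASID requires IOTLB_ASID in the same feature set, and requires the backend callback) can be sketched as a small userspace model. The struct and helper names below are invented for illustration; only the bit semantics follow the patch (the IOTLB_ASID bit number is taken from vhost_types.h and is an assumption here):

```c
#include <errno.h>
#include <stdint.h>

#define F_IOTLB_ASID (1ULL << 3) /* VHOST_BACKEND_F_IOTLB_ASID (0x3) */
#define F_SUSPEND    (1ULL << 4)
#define F_RESUME     (1ULL << 5)
#define F_DESC_ASID  (1ULL << 6) /* new bit introduced by the patch */

/* stand-in for the vdpa_config_ops callback presence check */
struct fake_ops {
	int (*get_vq_desc_group)(int qidx);
};

/* mirrors the checks VHOST_SET_BACKEND_FEATURES performs for the new bit:
 * dependency first (-EINVAL), then backend capability (-EOPNOTSUPP) */
static int check_set_features(uint64_t features, const struct fake_ops *ops)
{
	if ((features & F_DESC_ASID) && !(features & F_IOTLB_ASID))
		return -EINVAL;      /* _F_DESC_ASID depends on _F_IOTLB_ASID */
	if ((features & F_DESC_ASID) && !ops->get_vq_desc_group)
		return -EOPNOTSUPP;  /* backend has no dedicated desc group */
	return 0;
}

/* dummy callback standing in for a parent driver that has desc groups */
static int dummy_desc_group(int qidx)
{
	(void)qidx;
	return 2;
}
```

In other words, userspace that sets F_DESC_ASID without F_IOTLB_ASID gets -EINVAL even on a capable backend, while a backend without .get_vq_desc_group rejects it with -EOPNOTSUPP.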
Re: [PATCH RFC 0/3] vdpa: dedicated descriptor table group
On 8/9/2023 8:50 PM, Jason Wang wrote:
> On Wed, Aug 9, 2023 at 8:56 PM Si-Wei Liu wrote:
>> Following patchset introduces dedicated group for descriptor table to
>> reduce live migration downtime when passthrough VQ is being switched
>> to shadow VQ. As this RFC set is to seek early feedback on the uAPI
>> and driver API part, for now there's no associated driver patch
>> consuming the API. As soon as the support is in place on both hardware
>> device and driver, performance data will be shown using real hardware
>> device. The target goal of this series is to reduce the SVQ switching
>> overhead to less than 300ms on a ~100GB guest with 2 non-mq vhost-vdpa
>> devices.
>>
>> The plan of the intended driver implementation is to use a dedicated
>> group (specifically, 2 in below table) to host descriptor table for
>> all data vqs, different from where buffer addresses are contained (in
>> group 0 as below). cvq does not have to allocate dedicated group for
>> descriptor table, so its buffers and descriptor table would always
>> belong to a same group (1).
> I'm fine with this, but I think we need an implementation in the
> driver (e.g the simulator).

Yes. FWIW for the sake of time saving and getting this series accepted
promptly in the upcoming v6.6 merge window, the driver we're going to
support along with this series will be mlx5_vdpa in the formal
submission, and simulator support may come up later if I get spare
cycles. Do you foresee any issue without simulator change? We will have
the mlx5_vdpa driver consuming the API for sure, that's the target of
this work and it has to be proved working on real device at first.

Thanks,
-Siwei

> Thanks
>
>>               | data vq | ctrl vq
>> ==============+=========+========
>> vq_group      |    0    |    1
>> vq_desc_group |    2    |    1
>>
>>
>> ---
>>
>> Si-Wei Liu (3):
>>   vdpa: introduce dedicated descriptor group for virtqueue
>>   vhost-vdpa: introduce descriptor group backend feature
>>   vhost-vdpa: uAPI to get dedicated descriptor group id
>>
>>  drivers/vhost/vdpa.c             | 27 +++
>>  include/linux/vdpa.h             | 11 +++
>>  include/uapi/linux/vhost.h       |  8
>>  include/uapi/linux/vhost_types.h |  5 +
>>  4 files changed, 51 insertions(+)
>>
>> --
>> 1.8.3.1
>>
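The group layout table in the cover letter can be read as two lookup callbacks. Below is a hypothetical sketch of what a parent driver following this plan might return; the helper names and the "cvq is the last vq" convention are illustrative assumptions, not mlx5_vdpa's actual code:

```c
#include <stdbool.h>

#define DATA_VQ_GROUP   0U /* buffers of the data vqs */
#define CVQ_GROUP       1U /* cvq buffers and cvq descriptor table */
#define DATA_DESC_GROUP 2U /* descriptor tables of the data vqs */

/* assumption for this sketch: the control vq is the last vq index */
static bool is_cvq(unsigned int idx, unsigned int nvqs)
{
	return idx == nvqs - 1;
}

/* models .get_vq_group: which group the vq's buffers belong to */
static unsigned int get_vq_group(unsigned int idx, unsigned int nvqs)
{
	return is_cvq(idx, nvqs) ? CVQ_GROUP : DATA_VQ_GROUP;
}

/* models .get_vq_desc_group: which group the vq's descriptor table
 * (and, for split virtqueues, avail/used rings) belongs to */
static unsigned int get_vq_desc_group(unsigned int idx, unsigned int nvqs)
{
	return is_cvq(idx, nvqs) ? CVQ_GROUP : DATA_DESC_GROUP;
}
```

With this split, the shadow-VQ switch only has to rebind group 2 to a different ASID, leaving the buffer mappings in group 0 untouched.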
Re: [PATCH vhost v13 05/12] virtio_ring: introduce virtqueue_dma_dev()
On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo wrote:
>
> On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang wrote:
> > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo wrote:
> > >
> > >
> > > Hi, Jason
> > >
> > > Could you skip this patch?
> >
> > I'm fine with either merging or dropping this.
> >
> > >
> > > Let's review the other patches first?
> >
> > I will be on vacation soon, and won't have time to do this until next week.
>
> Have a happy vacation.
>
> >
> > But I spot two possible "issues":
> >
> > 1) the DMA metadata were stored in the headroom of the page, this
> > breaks frags coalescing, we need to benchmark its impact
>
> Not every page, just the first page of the COMP pages.
>
> So I think there is no impact.

Nope, see this:

        if (SKB_FRAG_PAGE_ORDER &&
            !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
                /* Avoid direct reclaim but allow kswapd to wake */
                pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
                                          __GFP_COMP | __GFP_NOWARN |
                                          __GFP_NORETRY,
                                          SKB_FRAG_PAGE_ORDER);
                if (likely(pfrag->page)) {
                        pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
                        return true;
                }
        }

The comp page might be disabled due to the SKB_FRAG_PAGE_ORDER and
net_high_order_alloc_disable_key.

>
> > 2) pre mapped DMA addresses were not reused in the case of
> > XDP_TX/XDP_REDIRECT
>
> Because the tx is not the premapped mode.

Yes, we can optimize this on top.

Thanks

>
> Thanks.
>
> > I see Michael has merged this series so I'm fine to let it go first.
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
Re: [PATCH RFC 4/4] vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
On Wed, Aug 16, 2023 at 6:31 AM Si-Wei Liu wrote:
>
>
> On 8/14/2023 7:25 PM, Jason Wang wrote:
> > On Tue, Aug 15, 2023 at 9:45 AM Si-Wei Liu wrote:
> >> Signed-off-by: Si-Wei Liu
> >> ---
> >>  drivers/vhost/vdpa.c             | 16 +++-
> >>  include/uapi/linux/vhost_types.h |  2 ++
> >>  2 files changed, 17 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> >> index 62b0a01..75092a7 100644
> >> --- a/drivers/vhost/vdpa.c
> >> +++ b/drivers/vhost/vdpa.c
> >> @@ -406,6 +406,14 @@ static bool vhost_vdpa_can_resume(const struct vhost_vdpa *v)
> >>          return ops->resume;
> >>  }
> >>
> >> +static bool vhost_vdpa_has_persistent_map(const struct vhost_vdpa *v)
> >> +{
> >> +        struct vdpa_device *vdpa = v->vdpa;
> >> +        const struct vdpa_config_ops *ops = vdpa->config;
> >> +
> >> +        return (!ops->set_map && !ops->dma_map) || ops->reset_map;
> > So this means the IOTLB/IOMMU mappings have already been decoupled
> > from the vdpa reset.
> Not in the sense of API, it's been coupled since day one from the
> implementations of every on-chip IOMMU parent driver, namely mlx5_vdpa
> and vdpa_sim. Because of that, later on the (improper) support for
> virtio-vdpa, from commit 6f5312f80183 ("vdpa/mlx5: Add support for
> running with virtio_vdpa") and 6c3d329e6486 ("vdpa_sim: get rid of DMA
> ops") misused the .reset() op to realize 1:1 mapping, rendering strong
> coupling between device reset and reset of iotlb mappings. This series
> tries to rectify that implementation deficiency, while keeping
> userspace continuing to work with older kernel behavior.
>
> > So it should have been noticed by the userspace.
> Yes, userspace had noticed this on-chip IOMMU discrepancy since day one
> I suppose. Unfortunately there's already code in userspace with this
> assumption in mind that proactively tears down and sets up iotlb mapping
> around vdpa device reset...
>
> > I guess we can just fix the simulator and mlx5 then we are fine?
> Only IF we don't care about running new QEMU on older kernels with
> flawed on-chip iommu behavior around reset. But that's a big IF...

So what I meant is:

Userspace doesn't know whether the vendor specific mappings (set_map)
are required or not. And in the implementation of vhost_vdpa, if
platform IOMMU is used, the mappings are decoupled from the reset.

So if the Qemu works with parents with platform IOMMU it means Qemu can
work if we just decouple vendor specific mappings from the parents that
use set_map.

Thanks

> Regards,
> -Siwei
>
> > Thanks
>
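The predicate quoted in the hunk above — `(!ops->set_map && !ops->dma_map) || ops->reset_map` — says: mappings persist across reset either because the parent relies on the platform IOMMU (no vendor mapping ops at all), or because it implements `.reset_map`. A minimal standalone model of that check (the struct is a trimmed-down stand-in, not the real `vdpa_config_ops`):

```c
#include <stdbool.h>
#include <stddef.h>

/* trimmed-down stand-in for struct vdpa_config_ops: only the three
 * callbacks the predicate looks at, with simplified signatures */
struct ops_model {
	int (*set_map)(void);
	int (*dma_map)(void);
	int (*reset_map)(void);
};

/* mirrors vhost_vdpa_has_persistent_map() from the patch: either the
 * parent uses the platform IOMMU (no vendor translation ops), or it
 * provides reset_map to restore 1:1 mapping semantics explicitly */
static bool has_persistent_map(const struct ops_model *ops)
{
	return (!ops->set_map && !ops->dma_map) || ops->reset_map;
}

/* no-op callback used to mark "op implemented" in the tests below */
static int stub(void)
{
	return 0;
}
```

A platform-IOMMU parent reports persistent mappings unconditionally; a `set_map`-style parent only does so once it also implements `reset_map`, which is exactly what IOTLB_PERSIST lets userspace detect.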
Re: [PATCH RFC 0/3] vdpa: dedicated descriptor table group
On Wed, Aug 16, 2023 at 8:02 AM Si-Wei Liu wrote:
>
>
> On 8/9/2023 8:50 PM, Jason Wang wrote:
> > On Wed, Aug 9, 2023 at 8:56 PM Si-Wei Liu wrote:
> >> Following patchset introduces dedicated group for descriptor table to
> >> reduce live migration downtime when passthrough VQ is being switched
> >> to shadow VQ. As this RFC set is to seek early feedback on the uAPI
> >> and driver API part, for now there's no associated driver patch
> >> consuming the API. As soon as the support is in place on both
> >> hardware device and driver, performance data will be shown using real
> >> hardware device. The target goal of this series is to reduce the SVQ
> >> switching overhead to less than 300ms on a ~100GB guest with 2 non-mq
> >> vhost-vdpa devices.
> >>
> >> The plan of the intended driver implementation is to use a dedicated
> >> group (specifically, 2 in below table) to host descriptor table for
> >> all data vqs, different from where buffer addresses are contained (in
> >> group 0 as below). cvq does not have to allocate dedicated group for
> >> descriptor table, so its buffers and descriptor table would always
> >> belong to a same group (1).
> > I'm fine with this, but I think we need an implementation in the
> > driver (e.g the simulator).
> Yes. FWIW for the sake of time saving and getting this series accepted
> promptly in the upcoming v6.6 merge window, the driver we're going to
> support along with this series will be mlx5_vdpa in the formal
> submission, and simulator support may come up later if I get spare
> cycles.

That would be fine.

> Do you foresee any issue without simulator change?

No.

> We will have
> mlx5_vdpa driver consuming the API for sure, that's the target of this
> work and it has to be proved working on real device at first.

Yes, I mention the simulator just because it's simple, it would be also
great if it is proved on real vDPA parent.

Thanks

> Thanks,
> -Siwei
>
> >
> > Thanks
> >
> >>               | data vq | ctrl vq
> >> ==============+=========+========
> >> vq_group      |    0    |    1
> >> vq_desc_group |    2    |    1
> >>
> >>
> >> ---
> >>
> >> Si-Wei Liu (3):
> >>   vdpa: introduce dedicated descriptor group for virtqueue
> >>   vhost-vdpa: introduce descriptor group backend feature
> >>   vhost-vdpa: uAPI to get dedicated descriptor group id
> >>
> >>  drivers/vhost/vdpa.c             | 27 +++
> >>  include/linux/vdpa.h             | 11 +++
> >>  include/uapi/linux/vhost.h       |  8 ++++
> >>  include/uapi/linux/vhost_types.h |  5 +
> >>  4 files changed, 51 insertions(+)
> >>
> >> --
> >> 1.8.3.1
> >>
Re: [PATCH RFC 1/4] vdpa: introduce .reset_map operation callback
On Wed, Aug 16, 2023 at 3:49 AM Si-Wei Liu wrote:
>
>
> On 8/14/2023 7:21 PM, Jason Wang wrote:
> > On Tue, Aug 15, 2023 at 9:46 AM Si-Wei Liu wrote:
> >> Signed-off-by: Si-Wei Liu
> >> ---
> >>  include/linux/vdpa.h | 7 +++
> >>  1 file changed, 7 insertions(+)
> >>
> >> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> >> index db1b0ea..3a3878d 100644
> >> --- a/include/linux/vdpa.h
> >> +++ b/include/linux/vdpa.h
> >> @@ -314,6 +314,12 @@ struct vdpa_map_file {
> >>   *                              @iova: iova to be unmapped
> >>   *                              @size: size of the area
> >>   *                              Returns integer: success (0) or error (< 0)
> >> + * @reset_map:                 Reset device memory mapping (optional)
> >> + *                              Needed for device that using device
> >> + *                              specific DMA translation (on-chip IOMMU)
> > This exposes the device internal to the upper layer which is not optimal.
> Not sure what it means by "device internal", but this op callback
> just follows the existing convention to describe what vdpa parent this
> API targets.

I meant the bus tries to hide the differences among vendors. So it
needs to hide on-chip IOMMU stuff from the upper layer. We can expose
two-dimensional IO mapping models but it looks like over engineering
for this issue.

More below.

>
>   * @set_map:                     Set device memory mapping (optional)
>   *                              Needed for device that using device
>   *                              specific DMA translation (on-chip IOMMU)
>   :
>   :
>   * @dma_map:                     Map an area of PA to IOVA (optional)
>   *                              Needed for device that using device
>   *                              specific DMA translation (on-chip IOMMU)
>   *                              and preferring incremental map.
>   :
>   :
>   * @dma_unmap:                   Unmap an area of IOVA (optional but
>   *                              must be implemented with dma_map)
>   *                              Needed for device that using device
>   *                              specific DMA translation (on-chip IOMMU)
>   *                              and preferring incremental unmap.
>
> > Btw, what's the difference between this and a simple
> >
> > set_map(NULL)?
> I don't think parent drivers support this today - they can accept
> non-NULL iotlb containing empty map entry, but not a NULL iotlb. The
> behavior is undefined or it even causes panic when a NULL iotlb is
> passed in.

We can do this simple change if it can work.

> Further this doesn't work with .dma_map parent drivers.

Probably, but I'd remove dma_map as it doesn't have any real users
except for the simulator.

> The reason why a new op is needed or better is because it allows
> userspace to tell apart different reset behavior from the older kernel
> (via the F_IOTLB_PERSIST feature bit in patch 4), while this behavior
> could vary between parent drivers.

I'm ok with a new feature flag, but we need to first seek a way to
reuse the existing API.

Thanks

> Regards,
> -Siwei
>
> >
> > Thanks
> >
> >> + *                              @vdev: vdpa device
> >> + *                              @asid: address space identifier
> >> + *                              Returns integer: success (0) or error (< 0)
> >>   * @get_vq_dma_dev:            Get the dma device for a specific
> >>   *                              virtqueue (optional)
> >>   *                              @vdev: vdpa device
> >> @@ -390,6 +396,7 @@ struct vdpa_config_ops {
> >>                           u64 iova, u64 size, u64 pa, u32 perm, void *opaque);
> >>          int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid,
> >>                           u64 iova, u64 size);
> >> +        int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);
> >>          int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
> >>                                unsigned int asid);
> >>          struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
> >> --
> >> 1.8.3.1
> >>
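The `set_map(NULL)` point above can be made concrete with a sketch: a caller that wants "drop all mappings" cannot safely improvise it through `.set_map` today (a NULL iotlb is undefined or crashes in current parents), so it has to go through an explicit op. This is an illustrative model only — the signatures are simplified from the real `vdpa_config_ops`, and the helper name is made up:

```c
#include <errno.h>
#include <stddef.h>

struct iotlb; /* opaque in this sketch */

/* simplified stand-ins for the two relevant vdpa callbacks */
struct ops_model {
	int (*set_map)(struct iotlb *iotlb);
	int (*reset_map)(unsigned int asid);
};

/* hypothetical helper: wipe all mappings for an ASID. With no
 * reset_map there is no safe generic fallback, so refuse rather
 * than pass a NULL iotlb into set_map. */
static int wipe_mappings(const struct ops_model *ops, unsigned int asid)
{
	if (ops->reset_map)
		return ops->reset_map(asid);
	return -EOPNOTSUPP;
}

/* stubs used by the tests below */
static int stub_set_map(struct iotlb *iotlb)
{
	(void)iotlb;
	return 0;
}

static int stub_reset_map(unsigned int asid)
{
	(void)asid;
	return 0;
}
```

This is the shape of the argument in the thread: either `.set_map` grows well-defined NULL semantics in every parent, or a dedicated `.reset_map` op carries that meaning.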
Re: [PATCH RFC 2/3] vhost-vdpa: introduce descriptor group backend feature
On Wed, Aug 16, 2023 at 7:54 AM Si-Wei Liu wrote:
>
>
> On 8/9/2023 8:49 PM, Jason Wang wrote:
> > On Wed, Aug 9, 2023 at 8:56 PM Si-Wei Liu wrote:
> >> Userspace knows if the device has dedicated descriptor group or not
> >> by checking this feature bit.
> >>
> >> It's only exposed if the vdpa driver backend implements the
> >> .get_vq_desc_group() operation callback. Userspace trying to negotiate
> >> this feature when it or the dependent _F_IOTLB_ASID feature hasn't
> >> been exposed will result in an error.
> >>
> >> Signed-off-by: Si-Wei Liu
> >> ---
> >>  drivers/vhost/vdpa.c             | 17 +
> >>  include/uapi/linux/vhost_types.h |  5 +
> >>  2 files changed, 22 insertions(+)
> >>
> >> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> >> index b43e868..f2e5dce 100644
> >> --- a/drivers/vhost/vdpa.c
> >> +++ b/drivers/vhost/vdpa.c
> >> @@ -389,6 +389,14 @@ static bool vhost_vdpa_can_resume(const struct vhost_vdpa *v)
> >>          return ops->resume;
> >>  }
> >>
> >> +static bool vhost_vdpa_has_desc_group(const struct vhost_vdpa *v)
> >> +{
> >> +        struct vdpa_device *vdpa = v->vdpa;
> >> +        const struct vdpa_config_ops *ops = vdpa->config;
> >> +
> >> +        return ops->get_vq_desc_group;
> >> +}
> >> +
> >>  static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
> >>  {
> >>          struct vdpa_device *vdpa = v->vdpa;
> >> @@ -679,6 +687,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> >>          if (copy_from_user(&features, featurep, sizeof(features)))
> >>                  return -EFAULT;
> >>          if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
> >> +                         BIT_ULL(VHOST_BACKEND_F_DESC_ASID) |
> >>                           BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
> >>                           BIT_ULL(VHOST_BACKEND_F_RESUME)))
> >>                  return -EOPNOTSUPP;
> >> @@ -688,6 +697,12 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> >>          if ((features & BIT_ULL(VHOST_BACKEND_F_RESUME)) &&
> >>              !vhost_vdpa_can_resume(v))
> >>                  return -EOPNOTSUPP;
> >> +        if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) &&
> >> +            !(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)))
> >> +                return -EINVAL;
> >> +        if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) &&
> >> +            !vhost_vdpa_has_desc_group(v))
> >> +                return -EOPNOTSUPP;
> >>          vhost_set_backend_features(&v->vdev, features);
> >>          return 0;
> >>  }
> >> @@ -741,6 +756,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> >>                  features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND);
> >>          if (vhost_vdpa_can_resume(v))
> >>                  features |= BIT_ULL(VHOST_BACKEND_F_RESUME);
> >> +        if (vhost_vdpa_has_desc_group(v))
> >> +                features |= BIT_ULL(VHOST_BACKEND_F_DESC_ASID);
> >>          if (copy_to_user(featurep, &features, sizeof(features)))
> >>                  r = -EFAULT;
> >>          break;
> >> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> >> index d3aad12a..0856f84 100644
> >> --- a/include/uapi/linux/vhost_types.h
> >> +++ b/include/uapi/linux/vhost_types.h
> >> @@ -181,5 +181,10 @@ struct vhost_vdpa_iova_range {
> >>  #define VHOST_BACKEND_F_SUSPEND  0x4
> >>  /* Device can be resumed */
> >>  #define VHOST_BACKEND_F_RESUME  0x5
> >> +/* Device may expose the descriptor table, avail and used ring in a
> >> + * different group for ASID binding than the buffers it contains.
> > Nit:
> >
> > s/a different group/different groups/?
> Yep, I will try to rephrase. Would below work?
>
> "Device may expose virtqueue's descriptor table, avail and used ring in a
> different group for ASID binding than where buffers it contains reside."
>
> > Btw, not a native speaker but I think "descriptor" might be confusing
> > since as you explained above, it contains more than just a descriptor
> > table.
> Yep. I chose "descriptor" because packed virtqueue doesn't have
> "physical" avail and used ring other than descriptor table, but I think
> I am open to a better name. I once thought of "descriptor ring" but that
> might be too specific to packed virtqueue. Any suggestion?

I don't have a better idea, probably "metadata" ?

Thanks

> Thanks,
> -Siwei
>
> >
> > Thanks
> >
> >> + * Requires VHOST_BACKEND_F_IOTLB_ASID.
> >> + */
> >> +#define VHOST_BACKEND_F_DESC_ASID  0x6
> >>
> >>  #endif
> >> --
> >> 1.8.3.1
> >>
Re: [PATCH vhost v13 05/12] virtio_ring: introduce virtqueue_dma_dev()
On Wed, 16 Aug 2023 09:13:48 +0800, Jason Wang wrote:
> On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo wrote:
> >
> > On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang wrote:
> > > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo wrote:
> > > >
> > > >
> > > > Hi, Jason
> > > >
> > > > Could you skip this patch?
> > >
> > > I'm fine with either merging or dropping this.
> > >
> > > >
> > > > Let's review the other patches first?
> > >
> > > I will be on vacation soon, and won't have time to do this until next week.
> >
> > Have a happy vacation.
> >
> > >
> > > But I spot two possible "issues":
> > >
> > > 1) the DMA metadata were stored in the headroom of the page, this
> > > breaks frags coalescing, we need to benchmark its impact
> >
> > Not every page, just the first page of the COMP pages.
> >
> > So I think there is no impact.
>
> Nope, see this:
>
>         if (SKB_FRAG_PAGE_ORDER &&
>             !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
>                 /* Avoid direct reclaim but allow kswapd to wake */
>                 pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
>                                           __GFP_COMP | __GFP_NOWARN |
>                                           __GFP_NORETRY,
>                                           SKB_FRAG_PAGE_ORDER);
>                 if (likely(pfrag->page)) {
>                         pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
>                         return true;
>                 }
>         }
>
> The comp page might be disabled due to the SKB_FRAG_PAGE_ORDER and
> net_high_order_alloc_disable_key.

YES.

But if comp page is disabled, then we only get one page each time. The
pages are not contiguous, so we don't have frags coalescing.

If you mean the two pages got from alloc_page may be contiguous, the
coalescing may then be broken. It's a possibility, but I think the
impact will be small.

Thanks.

> > >
> > > 2) pre mapped DMA addresses were not reused in the case of
> > > XDP_TX/XDP_REDIRECT
> >
> > Because the tx is not the premapped mode.
>
> Yes, we can optimize this on top.
>
> Thanks
>
> > Thanks.
> >
> > > I see Michael has merged this series so I'm fine to let it go first.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > >
Re: [PATCH vhost v13 05/12] virtio_ring: introduce virtqueue_dma_dev()
On Wed, Aug 16, 2023 at 10:16 AM Xuan Zhuo wrote: > > On Wed, 16 Aug 2023 09:13:48 +0800, Jason Wang wrote: > > On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo > > wrote: > > > > > > On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang > > > wrote: > > > > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo > > > > wrote: > > > > > > > > > > > > > > > Hi, Jason > > > > > > > > > > Could you skip this patch? > > > > > > > > I'm fine with either merging or dropping this. > > > > > > > > > > > > > > Let we review other patches firstly? > > > > > > > > I will be on vacation soon, and won't have time to do this until next > > > > week. > > > > > > Have a happly vacation. > > > > > > > > > > > But I spot two possible "issues": > > > > > > > > 1) the DMA metadata were stored in the headroom of the page, this > > > > breaks frags coalescing, we need to benchmark it's impact > > > > > > Not every page, just the first page of the COMP pages. > > > > > > So I think there is no impact. > > > > Nope, see this: > > > > if (SKB_FRAG_PAGE_ORDER && > > !static_branch_unlikely(&net_high_order_alloc_disable_key)) { > > /* Avoid direct reclaim but allow kswapd to wake */ > > pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) | > > __GFP_COMP | __GFP_NOWARN | > > __GFP_NORETRY, > > SKB_FRAG_PAGE_ORDER); > > if (likely(pfrag->page)) { > > pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER; > > return true; > > } > > } > > > > The comp page might be disabled due to the SKB_FRAG_PAGE_ORDER and > > net_high_order_alloc_disable_key. > > > YES. > > But if comp page is disabled. Then we only get one page each time. The pages > are > not contiguous, so we don't have frags coalescing. > > If you mean the two pages got from alloc_page may be contiguous. The > coalescing > may then be broken. It's a possibility, but I think the impact will be small. Let's have a simple benchmark and see? Thanks > > Thanks. 
> > > > > > > > > > > > > > 2) pre mapped DMA addresses were not reused in the case of > > > > XDP_TX/XDP_REDIRECT > > > > > > Because that the tx is not the premapped mode. > > > > Yes, we can optimize this on top. > > > > Thanks > > > > > > > > Thanks. > > > > > > > > > > > I see Michael has merge this series so I'm fine to let it go first. > > > > > > > > Thanks > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v13 05/12] virtio_ring: introduce virtqueue_dma_dev()
On Wed, 16 Aug 2023 10:19:34 +0800, Jason Wang wrote: > On Wed, Aug 16, 2023 at 10:16 AM Xuan Zhuo wrote: > > > > On Wed, 16 Aug 2023 09:13:48 +0800, Jason Wang wrote: > > > On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo > > > wrote: > > > > > > > > On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang > > > > wrote: > > > > > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo > > > > > wrote: > > > > > > > > > > > > > > > > > > Hi, Jason > > > > > > > > > > > > Could you skip this patch? > > > > > > > > > > I'm fine with either merging or dropping this. > > > > > > > > > > > > > > > > > Let we review other patches firstly? > > > > > > > > > > I will be on vacation soon, and won't have time to do this until next > > > > > week. > > > > > > > > Have a happly vacation. > > > > > > > > > > > > > > But I spot two possible "issues": > > > > > > > > > > 1) the DMA metadata were stored in the headroom of the page, this > > > > > breaks frags coalescing, we need to benchmark it's impact > > > > > > > > Not every page, just the first page of the COMP pages. > > > > > > > > So I think there is no impact. > > > > > > Nope, see this: > > > > > > if (SKB_FRAG_PAGE_ORDER && > > > !static_branch_unlikely(&net_high_order_alloc_disable_key)) { > > > /* Avoid direct reclaim but allow kswapd to wake */ > > > pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) | > > > __GFP_COMP | __GFP_NOWARN | > > > __GFP_NORETRY, > > > SKB_FRAG_PAGE_ORDER); > > > if (likely(pfrag->page)) { > > > pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER; > > > return true; > > > } > > > } > > > > > > The comp page might be disabled due to the SKB_FRAG_PAGE_ORDER and > > > net_high_order_alloc_disable_key. > > > > > > YES. > > > > But if comp page is disabled. Then we only get one page each time. The > > pages are > > not contiguous, so we don't have frags coalescing. > > > > If you mean the two pages got from alloc_page may be contiguous. The > > coalescing > > may then be broken. 
It's a possibility, but I think the impact will be > > small. > > Let's have a simple benchmark and see? That is ok. I think you want to know the perf num with big traffic and the comp page disabled. Thanks. > > Thanks > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > 2) pre mapped DMA addresses were not reused in the case of > > > > > XDP_TX/XDP_REDIRECT > > > > > > > > Because that the tx is not the premapped mode. > > > > > > Yes, we can optimize this on top. > > > > > > Thanks > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > I see Michael has merge this series so I'm fine to let it go first. > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v13 05/12] virtio_ring: introduce virtqueue_dma_dev()
On Wed, Aug 16, 2023 at 10:24 AM Xuan Zhuo wrote: > > On Wed, 16 Aug 2023 10:19:34 +0800, Jason Wang wrote: > > On Wed, Aug 16, 2023 at 10:16 AM Xuan Zhuo > > wrote: > > > > > > On Wed, 16 Aug 2023 09:13:48 +0800, Jason Wang > > > wrote: > > > > On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo > > > > wrote: > > > > > > > > > > On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang > > > > > wrote: > > > > > > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi, Jason > > > > > > > > > > > > > > Could you skip this patch? > > > > > > > > > > > > I'm fine with either merging or dropping this. > > > > > > > > > > > > > > > > > > > > Let we review other patches firstly? > > > > > > > > > > > > I will be on vacation soon, and won't have time to do this until > > > > > > next week. > > > > > > > > > > Have a happly vacation. > > > > > > > > > > > > > > > > > But I spot two possible "issues": > > > > > > > > > > > > 1) the DMA metadata were stored in the headroom of the page, this > > > > > > breaks frags coalescing, we need to benchmark it's impact > > > > > > > > > > Not every page, just the first page of the COMP pages. > > > > > > > > > > So I think there is no impact. > > > > > > > > Nope, see this: > > > > > > > > if (SKB_FRAG_PAGE_ORDER && > > > > !static_branch_unlikely(&net_high_order_alloc_disable_key)) > > > > { > > > > /* Avoid direct reclaim but allow kswapd to wake */ > > > > pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) > > > > | > > > > __GFP_COMP | __GFP_NOWARN | > > > > __GFP_NORETRY, > > > > SKB_FRAG_PAGE_ORDER); > > > > if (likely(pfrag->page)) { > > > > pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER; > > > > return true; > > > > } > > > > } > > > > > > > > The comp page might be disabled due to the SKB_FRAG_PAGE_ORDER and > > > > net_high_order_alloc_disable_key. > > > > > > > > > YES. > > > > > > But if comp page is disabled. Then we only get one page each time. 
The > > > pages are > > > not contiguous, so we don't have frags coalescing. > > > > > > If you mean the two pages got from alloc_page may be contiguous. The > > > coalescing > > > may then be broken. It's a possibility, but I think the impact will be > > > small. > > > > Let's have a simple benchmark and see? > > > That is ok. > > I think you want to know the perf num with big traffic and the comp page > disabled. Yes. Thanks > > Thanks. > > > > > > Thanks > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > 2) pre mapped DMA addresses were not reused in the case of > > > > > > XDP_TX/XDP_REDIRECT > > > > > > > > > > Because that the tx is not the premapped mode. > > > > > > > > Yes, we can optimize this on top. > > > > > > > > Thanks > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > I see Michael has merge this series so I'm fine to let it go first. > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
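The coalescing condition debated in this thread follows the shape of `skb_can_coalesce()`: a new chunk merges into the previous frag only when it sits in the same (compound) page and starts exactly where the previous frag ends. A simplified userspace model of that test (the real kernel helper also checks zerocopy state and the frag index):

```c
#include <stdbool.h>
#include <stddef.h>

/* toy stand-in for the tail skb fragment */
struct frag_model {
	int page_id;  /* identifies the (compound) page backing the frag */
	size_t off;   /* offset of the frag inside that page */
	size_t len;   /* length of the frag */
};

/* simplified skb_can_coalesce(): same page, and the new data begins
 * right at the end of the existing tail frag */
static bool can_coalesce(const struct frag_model *tail,
			 int page_id, size_t off)
{
	return tail->page_id == page_id && off == tail->off + tail->len;
}
```

This is why the two sub-cases in the thread differ: with order-0 pages each buffer lands in its own page, so the same-page test already fails and coalescing never happens; within one compound page, the open question is how often successive buffers still satisfy the contiguous-offset test once metadata headroom is reserved — which is exactly what the proposed benchmark is meant to settle.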
Re: [PATCH net v1] virtio_net: Introduce skb_vnet_common_hdr to avoid typecasting
On 2023-08-15 p.m.2:13, Willem de Bruijn wrote: External email: Use caution opening links or attachments On Tue, Aug 15, 2023 at 12:29 PM Simon Horman wrote: On Tue, Aug 15, 2023 at 11:09:02AM -0400, Feng Liu wrote: To clarify: In general new Networking features go via the net-next tree, while bug fixes go via the net tree. I was suggesting this is more appropriate for net-next, and that should be reflected in the subject. Subject: [PATCH net-next] ... Sorry for not being clearer the first time around. Right, this should go to net-next. Will do, thanks diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h index 12c1c9699935..db40f93ae8b3 100644 --- a/include/uapi/linux/virtio_net.h +++ b/include/uapi/linux/virtio_net.h @@ -201,6 +201,13 @@ struct virtio_net_hdr_mrg_rxbuf { struct virtio_net_hdr hdr; __virtio16 num_buffers; /* Number of merged rx buffers */ }; + +struct virtio_net_common_hdr { + union { + struct virtio_net_hdr_mrg_rxbuf mrg_hdr; + struct virtio_net_hdr_v1_hash hash_v1_hdr; + }; +}; Does this belong in the UAPI? I would have assumed it's a Kernel implementation detail. The existing codes, virtio_net.h is in uapi/linux/, I added the new structure and followed existing code. My modification is related to Kernel implementation detail now. The header you have modified forms part of the userspace API (UAPI). Perhaps there is something about virtio_net that makes this correct, but it seems to me that kernel-internal details don't belong there. FWIW, I ran into similar issues before in a draft that added timestamp support [1] If we're going to change this structure, we should do it in a way that is forward proof to future extensions to the virtio spec and with that the fields in this struct. Especially in UAPI. Is virtio_net_hdr_v1_hash the latest virtio-spec compliant header? And do we expect for v1.3 to just add some fields to this? 
The struct comment of virtio_net_hdr_v1 states "This is bitwise-equivalent to the legacy struct virtio_net_hdr_mrg_rxbuf, only flattened.". I don't quite understand what the flattening bought, vs having struct virtio_net_hdr as first member. Another difference may be the endianness between legacy (0.9) and v1.0+. Since legacy virtio will no longer be modified, I don't think there is much value is exposing this new union as UAPI. I do appreciate the benefit to the implementation. [1] https://patches.linaro.org/project/netdev/patch/20210208185558.995292-3-willemdebruijn.ker...@gmail.com/ Hi, William and Simon Thanks for the detailed explanation. I kept virtio_net_hdr_mrg_rxbuf and virtio_net_hdr_v1_hash structures in virtio_net.h, which can be forward compatible with existing user applications which use these structures. After checking kernel code, the virtio_net_hdr_v1_hash structure does only add new members to virtio_net_hdr_mrg_rxbuf, so the spec should only add new members, otherwise there will be compatibility problems in struct virtio_net_hdr_v1_hash structure. struct virtio_net_hdr_v1_hash { struct virtio_net_hdr_v1 hdr; /*same size as virtio_net_hdr*/ [...] __le32 hash_value; /*new member*/ __le16 hash_report; /*new member*/ __le16 padding; /*new member*/ }; virtio_net_hdr_v1_hash cannot use virtio_net_hdr as the first member, because in virtio_net_hdr_v1, csum_start and csum_offset are stored in union as a structure, and virtio_net_hdr cannot be used instead. struct virtio_net_hdr_v1 { [...] union { struct { __virtio16 csum_start; __virtio16 csum_offset; }; [...] }; __virtio16 num_buffers; /* Number of merged rx buffers */ }; struct virtio_net_hdr { [...] __virtio16 csum_start; __virtio16 csum_offset; }; In addition, I put this new structure virtio_net_common_hdr in uapi, hoping it could be used in future user space application to avoid potential risks caused by type coercion (such as the problems mentioned in the patch description ). 
So I think it should be in this header file. What do you think?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] MAINTAINERS: Add myself as mlx5_vdpa driver
On Tue, Aug 15, 2023 at 4:33 PM Dragos Tatulea wrote:
>
> As Eli Cohen moved to other work, I'll be the contact point for
> mlx5_vdpa.
>
> Signed-off-by: Dragos Tatulea

Acked-by: Jason Wang

Thanks

> ---
>  MAINTAINERS | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9a5863f1b016..c9a9259f4d37 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -13555,6 +13555,12 @@ F: Documentation/leds/leds-mlxcpld.rst
>  F: drivers/leds/leds-mlxcpld.c
>  F: drivers/leds/leds-mlxreg.c
>
> +MELLANOX MLX5_VDPA DRIVER
> +M: Dragos Tatulea
> +L: virtualization@lists.linux-foundation.org
> +S: Supported
> +F: drivers/vdpa/mlx5/
> +
>  MELLANOX PLATFORM DRIVER
>  M: Vadim Pasternak
>  L: platform-driver-...@vger.kernel.org
> --
> 2.41.0
Re: [PATCH vhost v13 05/12] virtio_ring: introduce virtqueue_dma_dev()
On Wed, 16 Aug 2023 10:33:34 +0800, Jason Wang wrote:
> On Wed, Aug 16, 2023 at 10:24 AM Xuan Zhuo wrote:
> >
> > On Wed, 16 Aug 2023 10:19:34 +0800, Jason Wang wrote:
> > > On Wed, Aug 16, 2023 at 10:16 AM Xuan Zhuo wrote:
> > > >
> > > > On Wed, 16 Aug 2023 09:13:48 +0800, Jason Wang wrote:
> > > > > On Tue, Aug 15, 2023 at 5:40 PM Xuan Zhuo wrote:
> > > > > >
> > > > > > On Tue, 15 Aug 2023 15:50:23 +0800, Jason Wang wrote:
> > > > > > > On Tue, Aug 15, 2023 at 2:32 PM Xuan Zhuo wrote:
> > > > > > > >
> > > > > > > > Hi, Jason
> > > > > > > >
> > > > > > > > Could you skip this patch?
> > > > > > >
> > > > > > > I'm fine with either merging or dropping this.
> > > > > > >
> > > > > > > > Can we review the other patches first?
> > > > > > >
> > > > > > > I will be on vacation soon, and won't have time to do this until
> > > > > > > next week.
> > > > > >
> > > > > > Have a happy vacation.
> > > > > >
> > > > > > > But I spot two possible "issues":
> > > > > > >
> > > > > > > 1) the DMA metadata is stored in the headroom of the page, which
> > > > > > > breaks frag coalescing; we need to benchmark its impact
> > > > > >
> > > > > > Not every page, just the first page of the COMP pages.
> > > > > >
> > > > > > So I think there is no impact.
> > > > >
> > > > > Nope, see this:
> > > > >
> > > > >         if (SKB_FRAG_PAGE_ORDER &&
> > > > >             !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
> > > > >                 /* Avoid direct reclaim but allow kswapd to wake */
> > > > >                 pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> > > > >                                           __GFP_COMP | __GFP_NOWARN |
> > > > >                                           __GFP_NORETRY,
> > > > >                                           SKB_FRAG_PAGE_ORDER);
> > > > >                 if (likely(pfrag->page)) {
> > > > >                         pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> > > > >                         return true;
> > > > >                 }
> > > > >         }
> > > > >
> > > > > The comp page might be disabled due to SKB_FRAG_PAGE_ORDER and
> > > > > net_high_order_alloc_disable_key.
> > > >
> > > > YES.
> > > >
> > > > But if the comp page is disabled, then we only get one page each time.
> > > > The pages are not contiguous, so we don't have frag coalescing.
> > > >
> > > > If you mean the two pages got from alloc_page may be contiguous, the
> > > > coalescing may then be broken. That's a possibility, but I think the
> > > > impact will be small.
> > >
> > > Let's have a simple benchmark and see?
> >
> > That is ok.
> >
> > I think you want to know the perf numbers with big traffic and the comp
> > page disabled.
>
> Yes.

Hi,

Host:

    for ((i = 0; i < 10; ++i)); do
        sockperf tp -i 192.168.122.100 -t 1000 -m 64000 &
    done

Guest:

    03:23:12 AM     IFACE   rxpck/s  txpck/s      rxkB/s  txkB/s  rxcmp/s  txcmp/s  rxmcst/s  %ifutil
    03:23:13 AM        lo      0.00     0.00        0.00    0.00     0.00     0.00      0.00     0.00
    03:23:13 AM      ens4  61848.00     1.00  3868036.73    0.58     0.00     0.00      0.00     0.00

tcpdump:

    03:25:01.741563 IP 192.168.122.1.29693 > 192.168.122.100.1: UDP, length 64000
    03:25:01.741580 IP 192.168.122.1.22239 > 192.168.122.100.1: UDP, length 64000
    03:25:01.741623 IP 192.168.122.1.22396 > 192.168.122.100.1: UDP, length 64000

The Guest CPU util is low and every packet is 64000 bytes, but the Host vhost process is at 100%. So we cannot judge by the traffic or the CPU of the Guest.
So I used the kernel without my patches (0635819decaf9d60e6cacfecfebfabe3cbdddafb). I want to count the frag coalescing occurrences when the comp page is disabled.

    $ sh -x test.sh
    + sysctl -w net.core.high_order_alloc_disable=1
    net.core.high_order_alloc_disable = 1
    + sysctl net.core.high_order_alloc_disable
    net.core.high_order_alloc_disable = 1
    + sleep 5
    + timeout 5 bpftrace -e 'kprobe:skb_coalesce_rx_frag {@[nsecs/1000/1000/1000] = count()}'
    Attaching 1 probe...
    + sysctl -w net.core.high_order_alloc_disable=0
    net.core.high_order_alloc_disable = 0
    + sysctl net.core.high_order_alloc_disable
    net.core.high_order_alloc_disable = 0
    + sleep 5
    + timeout 5 bpftrace -e 'kprobe:skb_coalesce_rx_frag {@[nsecs/1000/1000/1000] = count()}'
    Attaching 1 probe...
    @[356]: 167020
    @[361]: 673653
    @[359]: 900844
    @[360]: 912657
    @[358]: 915853
    @[357]: 932245

We can see that skb_coalesce_rx_frag is not called when the comp page is disabled. If the comp page is enabled, there is a lot of frag coalescing. So I think that my change will