Re: [PATCH net-next] net: virtio: use eth_hw_addr_set()

2021-10-27 Thread Michael S. Tsirkin
On Tue, Oct 26, 2021 at 10:56:34AM -0700, Jakub Kicinski wrote:
> Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
> of VLANs...") introduced a rbtree for faster Ethernet address look
> up. To maintain netdev->dev_addr in this tree we need to make all
> the writes to it go through appropriate helpers.
> 
> Signed-off-by: Jakub Kicinski 

Acked-by: Michael S. Tsirkin 

> ---
> CC: m...@redhat.com
> CC: jasow...@redhat.com
> CC: virtualization@lists.linux-foundation.org
> ---
>  drivers/net/virtio_net.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c501b5974aee..b7f35aff8e82 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device *vdev)
>   dev->max_mtu = MAX_MTU;
>  
>   /* Configuration may specify what MAC to use.  Otherwise random. */
> - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
> + u8 addr[MAX_ADDR_LEN];
> +
>   virtio_cread_bytes(vdev,
>  offsetof(struct virtio_net_config, mac),
> -dev->dev_addr, dev->addr_len);
> - else
> +addr, dev->addr_len);

Maybe BUG_ON(dev->addr_len > sizeof addr);

here just to make sure we don't overflow addr silently?


> + dev_addr_set(dev, addr);
> + } else {
>   eth_hw_addr_random(dev);
> + }
>  
>   /* Set up our device-specific information */
>   vi = netdev_priv(dev);
> -- 
> 2.31.1
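As a userspace illustration of the overflow concern raised above — hypothetical helper names, with a plain assert standing in for BUG_ON — the bounded read would look like:

```c
#include <assert.h>
#include <string.h>

#define MAX_ADDR_LEN 32  /* same value as include/linux/netdevice.h */

/* Hypothetical model: copy addr_len bytes of config space into a fixed
 * stack buffer. Without the bound check, addr_len > sizeof(addr) would
 * silently overflow the buffer. */
static void read_hw_addr(unsigned char *out, const unsigned char *cfg,
                         unsigned int addr_len)
{
    unsigned char addr[MAX_ADDR_LEN];

    assert(addr_len <= sizeof(addr));  /* stands in for BUG_ON() */
    memcpy(addr, cfg, addr_len);       /* stands in for virtio_cread_bytes() */
    memcpy(out, addr, addr_len);       /* stands in for dev_addr_set() */
}
```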

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next] net: virtio: use eth_hw_addr_set()

2021-10-27 Thread Jason Wang
On Wed, Oct 27, 2021 at 10:45 AM Jason Wang  wrote:
>
> On Wed, Oct 27, 2021 at 1:56 AM Jakub Kicinski  wrote:
> >
> > Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
> > of VLANs...") introduced a rbtree for faster Ethernet address look
> > up. To maintain netdev->dev_addr in this tree we need to make all
> > the writes to it go through appropriate helpers.
>
> I think the title should be "net: virtio: use eth_hw_addr_set()"

I meant "dev_addr_set()" actually.

Thanks

>
> >
> > Signed-off-by: Jakub Kicinski 
> > ---
> > CC: m...@redhat.com
> > CC: jasow...@redhat.com
> > CC: virtualization@lists.linux-foundation.org
> > ---
> >  drivers/net/virtio_net.c | 10 +++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c501b5974aee..b7f35aff8e82 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device *vdev)
> > dev->max_mtu = MAX_MTU;
> >
> > /* Configuration may specify what MAC to use.  Otherwise random. */
> > -   if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
> > +   if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
> > +   u8 addr[MAX_ADDR_LEN];
> > +
> > virtio_cread_bytes(vdev,
> >offsetof(struct virtio_net_config, mac),
> > -  dev->dev_addr, dev->addr_len);
> > -   else
> > +  addr, dev->addr_len);
> > +   dev_addr_set(dev, addr);
> > +   } else {
> > eth_hw_addr_random(dev);
> > +   }
>
> Do we need to change virtnet_set_mac_address() as well?
>
> Thanks
>
> >
> > /* Set up our device-specific information */
> > vi = netdev_priv(dev);
> > --
> > 2.31.1
> >



Re: [PATCH linux-next v7 4/8] vdpa: Enable user to set mac and mtu of vdpa device

2021-10-27 Thread Stefano Garzarella

On Tue, Oct 26, 2021 at 08:55:15PM +0300, Parav Pandit via Virtualization wrote:

$ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu 9000

$ vdpa dev config show
bar: mac 00:11:22:33:44:55 link up link_announce false mtu 9000

$ vdpa dev config show -jp
{
   "config": {
   "bar": {
   "mac": "00:11:22:33:44:55",
   "link ": "up",
   "link_announce ": false,
   "mtu": 9000,
   }
   }
}

Signed-off-by: Parav Pandit 
Reviewed-by: Eli Cohen 
Acked-by: Jason Wang 

---
changelog:
v5->v6:
- fixed typo of device
v4->v5:
- added comment for checking device capabilities
v3->v4:
- provide config attributes during device addition time
---
drivers/vdpa/ifcvf/ifcvf_main.c  |  3 ++-
drivers/vdpa/mlx5/net/mlx5_vnet.c|  3 ++-
drivers/vdpa/vdpa.c  | 38 ++--
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  3 ++-
drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  3 ++-
drivers/vdpa/vdpa_user/vduse_dev.c   |  3 ++-
include/linux/vdpa.h | 17 -
7 files changed, 62 insertions(+), 8 deletions(-)


Reviewed-by: Stefano Garzarella 



Re: [PATCH net-next] net: virtio: use eth_hw_addr_set()

2021-10-27 Thread Michael S. Tsirkin
On Wed, Oct 27, 2021 at 03:24:55PM +0800, Jason Wang wrote:
> On Wed, Oct 27, 2021 at 10:45 AM Jason Wang  wrote:
> >
> > On Wed, Oct 27, 2021 at 1:56 AM Jakub Kicinski  wrote:
> > >
> > > Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
> > > of VLANs...") introduced a rbtree for faster Ethernet address look
> > > up. To maintain netdev->dev_addr in this tree we need to make all
> > > the writes to it go through appropriate helpers.
> >
> > I think the title should be "net: virtio: use eth_hw_addr_set()"
> 
> I meant "dev_addr_set()" actually.
> 
> Thanks


Good point, this confused me too. Could be fixed up when applying?

> >
> > >
> > > Signed-off-by: Jakub Kicinski 
> > > ---
> > > CC: m...@redhat.com
> > > CC: jasow...@redhat.com
> > > CC: virtualization@lists.linux-foundation.org
> > > ---
> > >  drivers/net/virtio_net.c | 10 +++---
> > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index c501b5974aee..b7f35aff8e82 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device *vdev)
> > > dev->max_mtu = MAX_MTU;
> > >
> > > /* Configuration may specify what MAC to use.  Otherwise random. */
> > > -   if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
> > > +   if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
> > > +   u8 addr[MAX_ADDR_LEN];
> > > +
> > > virtio_cread_bytes(vdev,
> > >offsetof(struct virtio_net_config, mac),
> > > -  dev->dev_addr, dev->addr_len);
> > > -   else
> > > +  addr, dev->addr_len);
> > > +   dev_addr_set(dev, addr);
> > > +   } else {
> > > eth_hw_addr_random(dev);
> > > +   }
> >
> > Do we need to change virtnet_set_mac_address() as well?
> >
> > Thanks
> >
> > >
> > > /* Set up our device-specific information */
> > > vi = netdev_priv(dev);
> > > --
> > > 2.31.1
> > >



Re: [PATCH 1/3] virtio: cache indirect desc for split

2021-10-27 Thread Michael S. Tsirkin
On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> In the case of using indirect, indirect desc must be allocated and
> released each time, which increases a lot of cpu overhead.
> 
> Here, a cache is added for indirect. If the number of indirect desc to be
> applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/virtio/virtio.c  |  6 
>  drivers/virtio/virtio_ring.c | 63 ++--
>  include/linux/virtio.h   | 10 ++
>  3 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..04bcb74e5b9a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(is_virtio_device);
>  
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> +{
> + dev->desc_cache = val;
> +}
> +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> +
>  void unregister_virtio_device(struct virtio_device *dev)
>  {
>   int index = dev->index; /* save for after device release */
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index dd95dfd85e98..0b9a8544b0e8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -117,6 +117,10 @@ struct vring_virtqueue {
>   /* Hint for event idx: already triggered no need to disable. */
>   bool event_triggered;
>  
> + /* Is indirect cache used? */
> + bool use_desc_cache;
> + void *desc_cache_chain;
> +
>   union {
>   /* Available for split ring */
>   struct {

Let's use llist_head and friends please (I am guessing we want
single-linked to avoid writing into indirect buffer after use,
invalidating the cache, but please document why in a comment).  Do not
open-code it.

Also hide all casts in inline wrappers.


> @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>   return extra[i].next;
>  }
>  
> -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +#define VIRT_QUEUE_CACHE_DESC_NUM 4

Add an inline wrapper for checking sg versus VIRT_QUEUE_CACHE_DESC_NUM.

> +
> +static void desc_cache_chain_free_split(void *chain)
> +{
> + struct vring_desc *desc;
> +
> + while (chain) {
> + desc = chain;
> + chain = (void *)desc->addr;
> + kfree(desc);
> + }
> +}
> +
> +static void desc_cache_put_split(struct vring_virtqueue *vq,
> +  struct vring_desc *desc, int n)
> +{
> + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> + desc->addr = (u64)vq->desc_cache_chain;
> + vq->desc_cache_chain = desc;
> + } else {
> + kfree(desc);
> + }
> +}


> +
> +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
>  unsigned int total_sg,
>  gfp_t gfp)
>  {
>   struct vring_desc *desc;
> - unsigned int i;
> + unsigned int i, n;
> +
> + if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
> + if (vq->desc_cache_chain) {
> + desc = vq->desc_cache_chain;
> + vq->desc_cache_chain = (void *)desc->addr;
> + goto got;
> + }
> + n = VIRT_QUEUE_CACHE_DESC_NUM;

Hmm. This will allocate more entries than actually used. Why do it?

> + } else {
> + n = total_sg;
> + }
>  
>   /*
>* We require lowmem mappings for the descriptors because
> @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
>*/
>   gfp &= ~__GFP_HIGHMEM;
>  
> - desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> + desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
>   if (!desc)
>   return NULL;
>  
> +got:
>   for (i = 0; i < total_sg; i++)
> - desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> + desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
>   return desc;
>  }
>  
> @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>   head = vq->free_head;
>  
>   if (virtqueue_use_indirect(_vq, total_sg))
> - desc = alloc_indirect_split(_vq, total_sg, gfp);
> + desc = alloc_indirect_split(vq, total_sg, gfp);
>   else {
>   desc = NULL;
>   WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
> @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>   }
>  
>   if (indirect)
> - kfree(desc);
> + desc_cache_put_split(vq, desc, total_sg);

Re: [PATCH V3 11/11] vhost: allow userspace to create workers

2021-10-27 Thread Stefan Hajnoczi
On Wed, Oct 27, 2021 at 10:55:04AM +0800, Jason Wang wrote:
> On Tue, Oct 26, 2021 at 11:45 PM Stefan Hajnoczi  wrote:
> >
> > On Tue, Oct 26, 2021 at 01:37:14PM +0800, Jason Wang wrote:
> > >
> > > On 2021/10/22 1:19 PM, Mike Christie wrote:
> > > > This patch allows userspace to create workers and bind them to vqs. You
> > > > can have N workers per dev and also share N workers with M vqs.
> > > >
> > > > Signed-off-by: Mike Christie 
> > >
> > >
> > > A question, who is the best one to determine the binding? Is it the VMM
> > > (Qemu etc) or the management stack? If the latter, it looks to me it's
> > > better to expose this via sysfs?
> >
> > A few options that let the management stack control vhost worker CPU
> > affinity:
> >
> > 1. The management tool opens the vhost device node, calls
> >ioctl(VHOST_SET_VRING_WORKER), sets up CPU affinity, and then passes
> >the fd to the VMM. In this case the VMM is still able to call the
> >ioctl, which may be undesirable from an attack surface perspective.
> 
> Yes, and we can't do any dynamic configuration after the VM is launched?

Yes, at least it's a little risky for the management stack to keep the
vhost fd open and make ioctl calls while the VMM is using it.

> >
> > 2. The VMM calls ioctl(VHOST_SET_VRING_WORKER) itself and the management
> >tool queries the vq:worker details from the VMM (e.g. a new QEMU QMP
> >query-vhost-workers command similar to query-iothreads). The
> >management tool can then control CPU affinity on the vhost worker
> >threads.
> >
> >(This is how CPU affinity works in QEMU and libvirt today.)
> 
> Then we also need a "bind-vhost-workers" command.

The VMM doesn't but the management tool does.

Stefan
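For option 2, once the management tool has learned a vhost worker's pid (e.g. via the QMP query suggested above), pinning it uses the ordinary affinity syscall — the same mechanism taskset and libvirt rely on. A minimal sketch (the function name is illustrative):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

/* Pin a vhost worker thread, identified by its pid/tid, to one CPU.
 * pid 0 means the calling thread, which is handy for testing. */
static int pin_worker(pid_t worker_pid, int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(worker_pid, sizeof(set), &set);
}
```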



Re: [PATCH V3 11/11] vhost: allow userspace to create workers

2021-10-27 Thread Stefan Hajnoczi
On Tue, Oct 26, 2021 at 11:49:37AM -0500, michael.chris...@oracle.com wrote:
> On 10/26/21 12:37 AM, Jason Wang wrote:
> > Do we need VHOST_VRING_FREE_WORKER? And I wonder if using dedicated ioctls are better:
> > 
> > VHOST_VRING_NEW/FREE_WORKER
> > VHOST_VRING_ATTACH_WORKER
> 
> 
> We didn't need a free worker, because the kernel handles it for userspace. I
> tried to make it easy for userspace because in some cases it may not be able
> to do syscalls like close on the device. For example if qemu crashes or for
> vhost-scsi we don't do an explicit close during VM shutdown.
> 
> So we start off with the default worker thread that's used by all vqs like
> we do today. Userspace can then override it by creating a new worker. That
> also unbinds/detaches the existing worker and does a put on the worker's
> refcount. We also do a put on the worker when we stop using it during
> device shutdown/closure/release. When the worker's refcount goes to zero
> the kernel deletes it.

Please document the worker (p)id lifetime for the ioctl. Otherwise
userspace doesn't know whether a previously created worker is still
alive.

Stefan
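The lifetime Mike describes — a worker created with one reference, put when replaced or when the device shuts down, and deleted at refcount zero — can be modeled in a few lines of userspace C (names are illustrative, not the kernel API):

```c
#include <stdlib.h>

struct worker {
    int refcount;
};

/* A worker starts with one reference, held by whatever attached it. */
static struct worker *worker_new(void)
{
    struct worker *w = calloc(1, sizeof(*w));

    if (w)
        w->refcount = 1;
    return w;
}

/* Another vq binds to the same worker. */
static void worker_get(struct worker *w)
{
    w->refcount++;
}

/* Drop a reference; the worker is deleted when the count hits zero.
 * Returns 1 if the worker was freed, 0 otherwise. */
static int worker_put(struct worker *w)
{
    if (--w->refcount == 0) {
        free(w);
        return 1;
    }
    return 0;
}
```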



Re: [PATCH 1/3] virtio: cache indirect desc for split

2021-10-27 Thread Xuan Zhuo
On Wed, 27 Oct 2021 04:55:16 -0400, Michael S. Tsirkin  wrote:
> On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> > In the case of using indirect, indirect desc must be allocated and
> > released each time, which increases a lot of cpu overhead.
> >
> > Here, a cache is added for indirect. If the number of indirect desc to be
> > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> >
> > Signed-off-by: Xuan Zhuo 
> > ---
> >  drivers/virtio/virtio.c  |  6 
> >  drivers/virtio/virtio_ring.c | 63 ++--
> >  include/linux/virtio.h   | 10 ++
> >  3 files changed, 70 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 0a5b54034d4b..04bcb74e5b9a 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(is_virtio_device);
> >
> > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > +{
> > +   dev->desc_cache = val;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > +
> >  void unregister_virtio_device(struct virtio_device *dev)
> >  {
> > int index = dev->index; /* save for after device release */
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index dd95dfd85e98..0b9a8544b0e8 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> > /* Hint for event idx: already triggered no need to disable. */
> > bool event_triggered;
> >
> > +   /* Is indirect cache used? */
> > +   bool use_desc_cache;
> > +   void *desc_cache_chain;
> > +
> > union {
> > /* Available for split ring */
> > struct {
>
> Let's use llist_head and friends please (I am guessing we want
> single-linked to avoid writing into indirect buffer after use,
> invalidating the cache, but please document why in a comment).  Do not
> open-code it.
>
> Also hide all casts in inline wrappers.
>
>
> > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > return extra[i].next;
> >  }
> >
> > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
>
> Add an inline wrapper for checking sg versus VIRT_QUEUE_CACHE_DESC_NUM.
>
> > +
> > +static void desc_cache_chain_free_split(void *chain)
> > +{
> > +   struct vring_desc *desc;
> > +
> > +   while (chain) {
> > +   desc = chain;
> > +   chain = (void *)desc->addr;
> > +   kfree(desc);
> > +   }
> > +}
> > +
> > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > +struct vring_desc *desc, int n)
> > +{
> > +   if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +   desc->addr = (u64)vq->desc_cache_chain;
> > +   vq->desc_cache_chain = desc;
> > +   } else {
> > +   kfree(desc);
> > +   }
> > +}
>
>
> > +
> > +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
> >unsigned int total_sg,
> >gfp_t gfp)
> >  {
> > struct vring_desc *desc;
> > -   unsigned int i;
> > +   unsigned int i, n;
> > +
> > +   if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +   if (vq->desc_cache_chain) {
> > +   desc = vq->desc_cache_chain;
> > +   vq->desc_cache_chain = (void *)desc->addr;
> > +   goto got;
> > +   }
> > +   n = VIRT_QUEUE_CACHE_DESC_NUM;
>
> Hmm. This will allocate more entries than actually used. Why do it?

If total_sg here is 2 and we allocate only two descs, then a later request
with total_sg of 3 cannot reuse the array allocated this time.

So if total_sg is less than or equal to 4, a desc array of size 4 is allocated,
so that it can be reused by any subsequent request whose total_sg is also less
than or equal to 4. If total_sg is greater than 4, kmalloc_array is used to
allocate directly, without the cache.

Thanks.

>
> > +   } else {
> > +   n = total_sg;
> > +   }
> >
> > /*
> >  * We require lowmem mappings for the descriptors because
> > @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> >  */
> > gfp &= ~__GFP_HIGHMEM;
> >
> > -   desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> > +   desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
> > if (!desc)
> > return NULL;
> >
> > +got:
> > for (i = 0; i < total_sg; i++)
> > -   desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> > +   desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
> > return desc;
> >  }
> >
> > @@ -508,7 +548,
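Xuan's sizing rationale can be sketched as a self-contained userspace model: cacheable requests always get a fixed-size array, so any cached array can satisfy any later request of cacheable size, and the first field doubles as the chain pointer while the array sits in the cache. Names mirror the patch, but the types and allocator are simplified userspace stand-ins:

```c
#include <stdint.h>
#include <stdlib.h>

#define VIRT_QUEUE_CACHE_DESC_NUM 4

struct desc {
    uint64_t addr;  /* reused as the next-chain pointer while cached */
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

static struct desc *desc_cache_chain;

/* Allocate a descriptor array. Cacheable sizes always allocate
 * VIRT_QUEUE_CACHE_DESC_NUM entries so a cached array can satisfy any
 * later request of size <= VIRT_QUEUE_CACHE_DESC_NUM. */
static struct desc *desc_get(unsigned int total_sg)
{
    unsigned int n = total_sg;

    if (total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
        if (desc_cache_chain) {
            struct desc *d = desc_cache_chain;

            desc_cache_chain = (struct desc *)(uintptr_t)d->addr;
            return d;
        }
        n = VIRT_QUEUE_CACHE_DESC_NUM;
    }
    return calloc(n, sizeof(struct desc));
}

/* Return an array: cacheable sizes are chained for reuse; larger
 * one-off arrays are freed. */
static void desc_put(struct desc *d, unsigned int total_sg)
{
    if (total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
        d->addr = (uint64_t)(uintptr_t)desc_cache_chain;
        desc_cache_chain = d;
    } else {
        free(d);
    }
}
```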

Re: drm/virtio: not pin pages on demand

2021-10-27 Thread Gerd Hoffmann
[ Cc'ing Gurchetan Singh ]

> Can we follow up on this issue?
> 
> The main pain point with your suggestion is the fact
> that it will cause VirGL protocol breakage, and we would
> like to avoid this.
> 
> Extending the execbuffer ioctl and create_resource ioctl is
> more convenient than having the protocol broken.

Do you know at resource creation time whether you need cpu access
or not?  IOW can we make that a resource property, so we don't have
to pass around lists of objects on each and every execbuffer ioctl?

> Blob resources is not a solution, too, because QEMU doesn't
> support blob resources right now.
> 
> In the ideal solution, when both QEMU and crosvm support blob resources,
> we can switch to blob resources for textures.

That'll only happen if someone works on it.
I will not be able to do that though.

> I would like to introduce changes gradually and not break the
> protocol.

Well, opengl switching to blob resources is a protocol change anyway.
That doesn't imply things will break though.  We have capsets etc to
extend the protocol while maintaining backward compatibility.

> What do you think about that?

I still think that switching to blob resources would be the better
solution.  Yes, it's a lot of work and not something which helps
short-term.  But adding a new API as an interim solution isn't great
either.  So, not sure.  Chia-I Wu?  Gurchetan Singh?


While being at it:  Chia-I Wu & Gurchetan Singh, could you help
review virtio-gpu kernel patches?  I think you both have a better
view of the big picture (with both mesa and virglrenderer) than I have.
Also for the crosvm side of things.  The procedure for that would be to
send a patch adding yourself to the virtio-gpu section of the
MAINTAINERS file, so scripts/get_maintainer.pl will Cc you on patches
submitted.

thanks,
  Gerd

> 
> Maksym
> 
> 
> On 10/22/21 10:44 AM, Maksym Wezdecki wrote:
> 
> > Once again with all lists and receivers. I'm sorry for that.
> >
> > On 10/21/21 6:42 PM, Chia-I Wu wrote:
> >> On Thu, Oct 21, 2021 at 4:52 AM Gerd Hoffmann  wrote:
> >>> On Thu, Oct 21, 2021 at 11:55:47AM +0200, Maksym Wezdecki wrote:
>  I get your point. However, we need to make resource_create ioctl,
>  in order to create corresponding resource on the host.
> >>> That used to be the case but isn't true any more with the new
> >>> blob resources.  virglrenderer allows to create gpu objects
> >>> via execbuffer.  Those gpu objects can be linked to a
> >>> virtio-gpu resources, but it's optional.  In your case you
> >>> would do that only for your staging buffer.  The other textures
> >>> (which you fill with a host-side copy from the staging buffer)
> >>> do not need a virtio-gpu resource in the first place.
> >> That's however a breaking change to the virgl protocol.  All virgl
> >> commands expect res ids rather than blob ids.
> >>
> >> Using VIRTGPU_BLOB_MEM_HOST3D could in theory work.  But there are a
> >> few occasions where virglrenderer expects there to be guest storage.
> >> There are also readbacks that we need to support.  We might end up
> >> using VIRTGPU_BLOB_MEM_HOST3D_GUEST, where pin-on-demand is still
> >> desirable.
> >>
> >> For this patch, I think the uapi change can be simplified.  It can be
> >> a new param plus a new field in drm_virtgpu_execbuffer
> >>
> >>   __u64 bo_flags; /* pointer to an array of size num_bo_handles, or NULL */
> >>
> >> The other changes do not seem needed.
> My first approach was the same, as you mentioned. However, having "__u64
> bo_flags" needs a for loop, where only a few of the bos need to be pinned -
> this has performance implications. I would rather pass the bo handles that
> should be pinned than have a lot of flags, where only 1-2 bos need to be
> pinned.
> >
> >>> take care,
> >>>   Gerd
> >>>

-- 



vDPA bus driver selection

2021-10-27 Thread Stefano Garzarella
Hi folks,
I was trying to understand if we have a way to specify which vDPA bus
driver (e.g. vhost-vdpa, virtio-vdpa) a device should use.
IIUC we don't have it, and the first registered driver is used when a
new device is registered.

I was thinking if it makes sense to extend the management API to
specify which bus driver to use for a device. A use case could be for
example a single host handling VMs and bare-metal containers, so we
would have both virtio-vdpa and vhost-vdpa loaded and we want to
attach some devices to VMs through vhost-vdpa and others to containers
through virtio-vdpa.

What do you think?

I can prepare an RFC with some code, the idea is to use the .match
callback of "struct bus_type" to use a driver instead of the other,
and extend netlink API to specify the vDPA bus driver name to use.

Thanks,
Stefano
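A userspace model of the proposed .match logic: a device carries a preferred bus-driver name set via the management API, and the bus only lets that driver bind. Field and function names here are purely illustrative, not the kernel's struct bus_type API:

```c
#include <stdbool.h>
#include <string.h>

struct vdpa_dev_model {
    const char *name;
    const char *preferred_driver;  /* NULL or "" keeps today's behavior */
};

/* Model of a bus .match callback: with no preference, any registered
 * bus driver may bind (first one wins, as today); with a preference,
 * only the named driver matches. */
static bool vdpa_match_model(const struct vdpa_dev_model *dev,
                             const char *driver_name)
{
    if (dev->preferred_driver && dev->preferred_driver[0])
        return strcmp(dev->preferred_driver, driver_name) == 0;
    return true;
}
```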



Re: [PATCH net-next] net: virtio: use eth_hw_addr_set()

2021-10-27 Thread Michael S. Tsirkin
On Wed, Oct 27, 2021 at 06:26:40AM -0700, Jakub Kicinski wrote:
> On Wed, 27 Oct 2021 03:23:17 -0400 Michael S. Tsirkin wrote:
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index c501b5974aee..b7f35aff8e82 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >   dev->max_mtu = MAX_MTU;
> > >  
> > >   /* Configuration may specify what MAC to use.  Otherwise random. */
> > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
> > > + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
> > > + u8 addr[MAX_ADDR_LEN];
> > > +
> > >   virtio_cread_bytes(vdev,
> > >  offsetof(struct virtio_net_config, mac),
> > > -dev->dev_addr, dev->addr_len);
> > > - else
> > > +addr, dev->addr_len);  
> > 
> > Maybe BUG_ON(dev->addr_len > sizeof addr);
> > 
> > here just to make sure we don't overflow addr silently?
> 
> Since I need to post a v2 and we're talking... can I just use
> eth_hw_addr_set() instead? AFAICT netdev is always allocated with 
> alloc_etherdev_mq() and ->addr_len never changed. Plus there is 
> a number of Ethernet address helpers used so ETH_ALEN is kind of 
> already assumed.


Right. Sure, just document this in the commit log pls.

> > > + dev_addr_set(dev, addr);
> > > + } else {
> > >   eth_hw_addr_random(dev);
> > > + }
> > >  
> > >   /* Set up our device-specific information */
> > >   vi = netdev_priv(dev);



RE: vDPA bus driver selection

2021-10-27 Thread Parav Pandit via Virtualization
Hi Stefano,

> From: Stefano Garzarella 
> Sent: Wednesday, October 27, 2021 8:04 PM
> 
> Hi folks,
> I was trying to understand if we have a way to specify which vDPA bus driver
> (e.g. vhost-vdpa, virtio-vdpa) a device should use.
> IIUC we don't have it, and the first registered driver is used when a new
> device is registered.
> 
> I was thinking if it makes sense to extend the management API to specify which
> bus driver to use for a device. A use case could be for example a single host
> handling VMs and bare-metal containers, so we would have both virtio-vdpa
> and vhost-vdpa loaded and we want to attach some devices to VMs through
> vhost-vdpa and others to containers through virtio-vdpa.
> 
> What do you think?
> 
One option is that the user keeps drivers_autoprobe disabled for the vdpa bus using:

$ echo 0 > /sys/bus/vdpa/drivers_autoprobe
$ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac 00:11:22:33:44:55

And after vdpa device creation, the user manually binds it to the desired driver, such as:

$ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind
Or
$ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind

In the case of VDUSE, it makes more sense to bind to one of the above drivers
after user space has connected the user space backend to the kernel device.


Re: [PATCH net-next v2] net: virtio: use eth_hw_addr_set()

2021-10-27 Thread Michael S. Tsirkin
On Wed, Oct 27, 2021 at 08:20:12AM -0700, Jakub Kicinski wrote:
> Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
> of VLANs...") introduced a rbtree for faster Ethernet address look
> up. To maintain netdev->dev_addr in this tree we need to make all
> the writes to it go through appropriate helpers.
> 
> Even though the current code uses dev->addr_len, we can switch
> to eth_hw_addr_set() instead of dev_addr_set(). The netdev is
> always allocated by alloc_etherdev_mq() and there are at least two
> places which assume Ethernet address:
>  - the line below calling eth_hw_addr_random()
>  - virtnet_set_mac_address() -> eth_commit_mac_addr_change()
> 
> Signed-off-by: Jakub Kicinski 

Acked-by: Michael S. Tsirkin 

> ---
> v2: - actually switch to eth_hw_addr_set() not dev_addr_set()
> - resize the buffer to ETH_ALEN
> - pass ETH_ALEN instead of dev->dev_addr to virtio_cread_bytes()
> 
> CC: m...@redhat.com
> CC: jasow...@redhat.com
> CC: virtualization@lists.linux-foundation.org
> ---
>  drivers/net/virtio_net.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c501b5974aee..cc79343cd220 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device *vdev)
>   dev->max_mtu = MAX_MTU;
>  
>   /* Configuration may specify what MAC to use.  Otherwise random. */
> - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
> + u8 addr[ETH_ALEN];
> +
>   virtio_cread_bytes(vdev,
>  offsetof(struct virtio_net_config, mac),
> -dev->dev_addr, dev->addr_len);
> - else
> +addr, ETH_ALEN);
> + eth_hw_addr_set(dev, addr);
> + } else {
>   eth_hw_addr_random(dev);
> + }
>  
>   /* Set up our device-specific information */
>   vi = netdev_priv(dev);
> -- 
> 2.31.1



Re: vDPA bus driver selection

2021-10-27 Thread Stefano Garzarella

Hi Parav,

On Wed, Oct 27, 2021 at 03:21:15PM +, Parav Pandit wrote:

Hi Stefano,


From: Stefano Garzarella 
Sent: Wednesday, October 27, 2021 8:04 PM

Hi folks,
I was trying to understand if we have a way to specify which vDPA bus driver
(e.g. vhost-vdpa, virtio-vdpa) a device should use.
IIUC we don't have it, and the first registered driver is used when a new device
is registered.

I was thinking if it makes sense to extend the management API to specify which
bus driver to use for a device. A use case could be for example a single host
handling VMs and bare-metal containers, so we would have both virtio-vdpa
and vhost-vdpa loaded and we want to attach some devices to VMs through
vhost-vdpa and others to containers through virtio-vdpa.

What do you think?


One option is, user keeps the drivers_autoprobe disabled for the vdpa bus using,

$ echo 0 > /sys/bus/vdpa/drivers_autoprobe
$ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac 00:11:22:33:44:55

And after vdpa device creation, it manually binds to the desired driver such as,

$ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind
Or
$ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind


Cool, I didn't know that. This is very useful, but do you think it might 
be better to integrate it with the netlink API and specify at creation 
which bus driver to use?




In the case of VDUSE, it makes more sense to bind to one of the above
drivers after user space has connected the user space backend to the
kernel device.


Yep, make sense.

Thanks,
Stefano



RE: vDPA bus driver selection

2021-10-27 Thread Parav Pandit via Virtualization
Hi Stefano,

> From: Stefano Garzarella 
> Sent: Wednesday, October 27, 2021 9:17 PM
> To: Parav Pandit 
> Cc: Jason Wang ; Michael Tsirkin ;
> Linux Virtualization ; Eli Cohen
> 
> Subject: Re: vDPA bus driver selection
> 
> Hi Parav,
> 
> On Wed, Oct 27, 2021 at 03:21:15PM +, Parav Pandit wrote:
> >Hi Stefano,
> >
> >> From: Stefano Garzarella 
> >> Sent: Wednesday, October 27, 2021 8:04 PM
> >>
> >> Hi folks,
> >> I was trying to understand if we have a way to specify which vDPA bus
> >> driver (e.g. vhost-vdpa, virtio-vdpa) a device should use.
> >> IIUC we don't have it, and the first registered driver is used when a
> >> new device is registered.
> >>
> >> I was thinking if it makes sense to extend the management API to
> >> specify which bus driver to use for a device. A use case could be for
> >> example a single host handling VMs and bare-metal containers, so we
> >> would have both virtio-vdpa and vhost-vdpa loaded and we want to
> >> attach some devices to VMs through vhost-vdpa and others to containers
> through virtio-vdpa.
> >>
> >> What do you think?
> >>
> >One option is, user keeps the drivers_autoprobe disabled for the vdpa
> >bus using,
> >
> >$ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac
> >00:11:22:33:44:55 $ echo 0 > /sys/bus/vdpa/drivers_autoprobe
> >
> >And after vdpa device creation, it manually binds to the desired driver
> >such as,
> >
> >$ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind
> >Or
> >$ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind
> 
> Cool, I didn't know that. This is very useful, but do you think it might be
> better to integrate it with the netlink API and specify at creation which
> bus driver to use?
I think it is useful; for the VDUSE case we need the ability to say "none" as 
well, and when nothing is specified it should be the default driver.

More than netlink, I think we need the ability in the core kernel to make this 
choice.
I haven't explored what is already available to make that happen.
If/once it's available, I am sure it has more users than just vdpa.

> 
> >
> >In the case of VDUSE, it makes more sense to bind to one of the
> >above drivers after user space has connected the user space backend to
> >the kernel device.
> 
> Yep, makes sense.
> 
> Thanks,
> Stefano



Re: [PATCH 1/3] virtio: cache indirect desc for split

2021-10-27 Thread Dongli Zhang



On 10/26/21 11:19 PM, Xuan Zhuo wrote:
> In the case of using indirect, indirect desc must be allocated and
> released each time, which increases a lot of cpu overhead.
> 
> Here, a cache is added for indirect. If the number of indirect desc to be
> applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/virtio/virtio.c  |  6 
>  drivers/virtio/virtio_ring.c | 63 ++--
>  include/linux/virtio.h   | 10 ++
>  3 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..04bcb74e5b9a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(is_virtio_device);
>  
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> +{
> + dev->desc_cache = val;
> +}
> +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> +
>  void unregister_virtio_device(struct virtio_device *dev)
>  {
>   int index = dev->index; /* save for after device release */
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index dd95dfd85e98..0b9a8544b0e8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -117,6 +117,10 @@ struct vring_virtqueue {
>   /* Hint for event idx: already triggered no need to disable. */
>   bool event_triggered;
>  
> + /* Is indirect cache used? */
> + bool use_desc_cache;
> + void *desc_cache_chain;
> +
>   union {
>   /* Available for split ring */
>   struct {
> @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct 
> vring_virtqueue *vq,
>   return extra[i].next;
>  }
>  
> -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> +
> +static void desc_cache_chain_free_split(void *chain)
> +{
> + struct vring_desc *desc;
> +
> + while (chain) {
> + desc = chain;
> + chain = (void *)desc->addr;
> + kfree(desc);
> + }
> +}
> +
> +static void desc_cache_put_split(struct vring_virtqueue *vq,
> +  struct vring_desc *desc, int n)
> +{
> + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> + desc->addr = (u64)vq->desc_cache_chain;
> + vq->desc_cache_chain = desc;
> + } else {
> + kfree(desc);
> + }
> +}
> +
> +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
>  unsigned int total_sg,
>  gfp_t gfp)
>  {
>   struct vring_desc *desc;
> - unsigned int i;
> + unsigned int i, n;
> +
> + if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
> + if (vq->desc_cache_chain) {
> + desc = vq->desc_cache_chain;
> + vq->desc_cache_chain = (void *)desc->addr;
> + goto got;
> + }
> + n = VIRT_QUEUE_CACHE_DESC_NUM;

How about making VIRT_QUEUE_CACHE_DESC_NUM configurable (at least during
driver probing), unless there is a reason the default value is 4?

Thank you very much!

Dongli Zhang



> + } else {
> + n = total_sg;
> + }
>  
>   /*
>* We require lowmem mappings for the descriptors because
> @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct 
> virtqueue *_vq,
>*/
>   gfp &= ~__GFP_HIGHMEM;
>  
> - desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> + desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
>   if (!desc)
>   return NULL;
>  
> +got:
>   for (i = 0; i < total_sg; i++)
> - desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> + desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
>   return desc;
>  }
>  
> @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
>   head = vq->free_head;
>  
>   if (virtqueue_use_indirect(_vq, total_sg))
> - desc = alloc_indirect_split(_vq, total_sg, gfp);
> + desc = alloc_indirect_split(vq, total_sg, gfp);
>   else {
>   desc = NULL;
>   WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
> @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
>   }
>  
>   if (indirect)
> - kfree(desc);
> + desc_cache_put_split(vq, desc, total_sg);
>  
>   END_USE(vq);
>   return -ENOMEM;
> @@ -717,7 +757,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
> unsigned int head,
>   if (vq->indirect) {
>   struct vring_desc *indir_desc =
>   

Re: [PATCH 1/3] virtio: cache indirect desc for split

2021-10-27 Thread Michael S. Tsirkin
On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> In the case of using indirect, indirect desc must be allocated and
> released each time, which increases a lot of cpu overhead.
> 
> Here, a cache is added for indirect. If the number of indirect desc to be
> applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/virtio/virtio.c  |  6 
>  drivers/virtio/virtio_ring.c | 63 ++--
>  include/linux/virtio.h   | 10 ++
>  3 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..04bcb74e5b9a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(is_virtio_device);
>  
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> +{
> + dev->desc_cache = val;
> +}
> +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> +
>  void unregister_virtio_device(struct virtio_device *dev)
>  {
>   int index = dev->index; /* save for after device release */
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index dd95dfd85e98..0b9a8544b0e8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -117,6 +117,10 @@ struct vring_virtqueue {
>   /* Hint for event idx: already triggered no need to disable. */
>   bool event_triggered;
>  
> + /* Is indirect cache used? */
> + bool use_desc_cache;
> + void *desc_cache_chain;
> +
>   union {
>   /* Available for split ring */
>   struct {
> @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct 
> vring_virtqueue *vq,
>   return extra[i].next;
>  }
>  
> -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> +
> +static void desc_cache_chain_free_split(void *chain)
> +{
> + struct vring_desc *desc;
> +
> + while (chain) {
> + desc = chain;
> + chain = (void *)desc->addr;
> + kfree(desc);
> + }
> +}
> +
> +static void desc_cache_put_split(struct vring_virtqueue *vq,
> +  struct vring_desc *desc, int n)
> +{
> + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> + desc->addr = (u64)vq->desc_cache_chain;
> + vq->desc_cache_chain = desc;
> + } else {
> + kfree(desc);
> + }
> +}
> +


So I have a question here. What happens if we just do:

if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
	/* desc_cache: a kmem_cache of VIRT_QUEUE_CACHE_DESC_NUM * sizeof(*desc) objects */
	return kmem_cache_alloc(desc_cache, gfp);
} else {
	return kmalloc_array(n, sizeof(*desc), gfp);
}

A small change and won't we reap most performance benefits?

-- 
MST



Re: vDPA bus driver selection

2021-10-27 Thread Michael S. Tsirkin
On Wed, Oct 27, 2021 at 04:33:50PM +0200, Stefano Garzarella wrote:
> Hi folks,
> I was trying to understand if we have a way to specify which vDPA bus
> driver (e.g. vhost-vdpa, virtio-vdpa) a device should use.
> IIUC we don't have it, and the first registered driver is used when a
> new device is registered.
> 
> I was thinking if it makes sense to extend the management API to
> specify which bus driver to use for a device. A use case could be for
> example a single host handling VMs and bare-metal containers, so we
> would have both virtio-vdpa and vhost-vdpa loaded and we want to
> attach some devices to VMs through vhost-vdpa and others to containers
> through virtio-vdpa.
> 
> What do you think?
> 
> I can prepare an RFC with some code, the idea is to use the .match
> callback of "struct bus_type" to use a driver instead of the other,
> and extend netlink API to specify the vDPA bus driver name to use.
> 
> Thanks,
> Stefano

So I think that doing this at create time is somewhat limited.
For example, a great way to do migration could be to unbind the
device from the VM, then bind virtio on the host to it, then
bind macvtap to that.

Ideas on how to allow that?

-- 
MST



Re: [PATCH 1/3] virtio: cache indirect desc for split

2021-10-27 Thread Michael S. Tsirkin
On Wed, Oct 27, 2021 at 09:33:46AM -0700, Dongli Zhang wrote:
> 
> 
> On 10/26/21 11:19 PM, Xuan Zhuo wrote:
> > In the case of using indirect, indirect desc must be allocated and
> > released each time, which increases a lot of cpu overhead.
> > 
> > Here, a cache is added for indirect. If the number of indirect desc to be
> > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> > 
> > Signed-off-by: Xuan Zhuo 
> > ---
> >  drivers/virtio/virtio.c  |  6 
> >  drivers/virtio/virtio_ring.c | 63 ++--
> >  include/linux/virtio.h   | 10 ++
> >  3 files changed, 70 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 0a5b54034d4b..04bcb74e5b9a 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(is_virtio_device);
> >  
> > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > +{
> > +   dev->desc_cache = val;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > +
> >  void unregister_virtio_device(struct virtio_device *dev)
> >  {
> > int index = dev->index; /* save for after device release */
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index dd95dfd85e98..0b9a8544b0e8 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> > /* Hint for event idx: already triggered no need to disable. */
> > bool event_triggered;
> >  
> > +   /* Is indirect cache used? */
> > +   bool use_desc_cache;
> > +   void *desc_cache_chain;
> > +
> > union {
> > /* Available for split ring */
> > struct {
> > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const 
> > struct vring_virtqueue *vq,
> > return extra[i].next;
> >  }
> >  
> > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > +
> > +static void desc_cache_chain_free_split(void *chain)
> > +{
> > +   struct vring_desc *desc;
> > +
> > +   while (chain) {
> > +   desc = chain;
> > +   chain = (void *)desc->addr;
> > +   kfree(desc);
> > +   }
> > +}
> > +
> > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > +struct vring_desc *desc, int n)
> > +{
> > +   if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +   desc->addr = (u64)vq->desc_cache_chain;
> > +   vq->desc_cache_chain = desc;
> > +   } else {
> > +   kfree(desc);
> > +   }
> > +}
> > +
> > +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
> >unsigned int total_sg,
> >gfp_t gfp)
> >  {
> > struct vring_desc *desc;
> > -   unsigned int i;
> > +   unsigned int i, n;
> > +
> > +   if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +   if (vq->desc_cache_chain) {
> > +   desc = vq->desc_cache_chain;
> > +   vq->desc_cache_chain = (void *)desc->addr;
> > +   goto got;
> > +   }
> > +   n = VIRT_QUEUE_CACHE_DESC_NUM;
> 
> How about making VIRT_QUEUE_CACHE_DESC_NUM configurable (at least during
> driver probing), unless there is a reason the default value is 4?
> 
> Thank you very much!
> 
> Dongli Zhang


I would start with some experimentation showing that it actually makes a
difference in performance.

> 
> 
> > +   } else {
> > +   n = total_sg;
> > +   }
> >  
> > /*
> >  * We require lowmem mappings for the descriptors because
> > @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct 
> > virtqueue *_vq,
> >  */
> > gfp &= ~__GFP_HIGHMEM;
> >  
> > -   desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> > +   desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
> > if (!desc)
> > return NULL;
> >  
> > +got:
> > for (i = 0; i < total_sg; i++)
> > -   desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> > +   desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
> > return desc;
> >  }
> >  
> > @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> > *_vq,
> > head = vq->free_head;
> >  
> > if (virtqueue_use_indirect(_vq, total_sg))
> > -   desc = alloc_indirect_split(_vq, total_sg, gfp);
> > +   desc = alloc_indirect_split(vq, total_sg, gfp);
> > else {
> > desc = NULL;
> > WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
> > @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> > *_vq,
> > }
> >  
> > if (in

Re: [PATCH v3 4/4] virtio-blk: Use blk_validate_block_size() to validate block size

2021-10-27 Thread Michael S. Tsirkin
On Tue, Oct 26, 2021 at 10:40:15PM +0800, Xie Yongji wrote:
> The block layer can't support a block size larger than
> page size yet. And a block size that's too small or
> not a power of two won't work either. If a misconfigured
> device presents an invalid block size in configuration space,
> it will result in the kernel crash something like below:
> 
> [  506.154324] BUG: kernel NULL pointer dereference, address: 0008
> [  506.160416] RIP: 0010:create_empty_buffers+0x24/0x100
> [  506.174302] Call Trace:
> [  506.174651]  create_page_buffers+0x4d/0x60
> [  506.175207]  block_read_full_page+0x50/0x380
> [  506.175798]  ? __mod_lruvec_page_state+0x60/0xa0
> [  506.176412]  ? __add_to_page_cache_locked+0x1b2/0x390
> [  506.177085]  ? blkdev_direct_IO+0x4a0/0x4a0
> [  506.177644]  ? scan_shadow_nodes+0x30/0x30
> [  506.178206]  ? lru_cache_add+0x42/0x60
> [  506.178716]  do_read_cache_page+0x695/0x740
> [  506.179278]  ? read_part_sector+0xe0/0xe0
> [  506.179821]  read_part_sector+0x36/0xe0
> [  506.180337]  adfspart_check_ICS+0x32/0x320
> [  506.180890]  ? snprintf+0x45/0x70
> [  506.181350]  ? read_part_sector+0xe0/0xe0
> [  506.181906]  bdev_disk_changed+0x229/0x5c0
> [  506.182483]  blkdev_get_whole+0x6d/0x90
> [  506.183013]  blkdev_get_by_dev+0x122/0x2d0
> [  506.183562]  device_add_disk+0x39e/0x3c0
> [  506.184472]  virtblk_probe+0x3f8/0x79b [virtio_blk]
> [  506.185461]  virtio_dev_probe+0x15e/0x1d0 [virtio]
> 
> So let's use a block layer helper to validate the block size.
> 
> Signed-off-by: Xie Yongji 

Acked-by: Michael S. Tsirkin 


Please merge through the block tree because of the
dependency.

Jens can you pick this up?

> ---
>  drivers/block/virtio_blk.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 303caf2d17d0..fd086179f980 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -815,9 +815,17 @@ static int virtblk_probe(struct virtio_device *vdev)
>   err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
>  struct virtio_blk_config, blk_size,
>  &blk_size);
> - if (!err)
> + if (!err) {
> + err = blk_validate_block_size(blk_size);
> + if (err) {
> + dev_err(&vdev->dev,
> + "virtio_blk: invalid block size: 0x%x\n",
> + blk_size);
> + goto out_cleanup_disk;
> + }
> +
>   blk_queue_logical_block_size(q, blk_size);
> - else
> + } else
>   blk_size = queue_logical_block_size(q);
>  
>   /* Use topology information if available */
> -- 
> 2.11.0



Re: [PATCH] vhost: Make use of the helper macro kthread_run()

2021-10-27 Thread Michael S. Tsirkin
On Thu, Oct 21, 2021 at 04:44:06PM +0800, Cai Huoqing wrote:
> Repalce kthread_create/wake_up_process() with kthread_run()
> to simplify the code.
> 
> Signed-off-by: Cai Huoqing 

Pls check how this interacts with Mike Christie's patches.
Pls fix up the typo in the commit log.

> ---
>  drivers/vhost/vhost.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 59edb5a1ffe2..e67bd5603b5f 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -595,15 +595,15 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>  
>   dev->kcov_handle = kcov_common_handle();
>   if (dev->use_worker) {
> - worker = kthread_create(vhost_worker, dev,
> - "vhost-%d", current->pid);
> + /* avoid contributing to loadavg */

doesn't this comment have any value anymore?

> + worker = kthread_run(vhost_worker, dev,
> +  "vhost-%d", current->pid);
>   if (IS_ERR(worker)) {
>   err = PTR_ERR(worker);
>   goto err_worker;
>   }
>  
>   dev->worker = worker;
> - wake_up_process(worker); /* avoid contributing to loadavg */
>  
>   err = vhost_attach_cgroups(dev);
>   if (err)
> -- 
> 2.25.1



[GIT PULL] virtio: last minute fixes

2021-10-27 Thread Michael S. Tsirkin
The following changes since commit 64222515138e43da1fcf288f0289ef1020427b87:

  Merge tag 'drm-fixes-2021-10-22' of git://anongit.freedesktop.org/drm/drm 
(2021-10-21 19:06:08 -1000)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 890d33561337ffeba0d8ba42517e71288cfee2b6:

  virtio-ring: fix DMA metadata flags (2021-10-27 15:54:34 -0400)


virtio: last minute fixes

A couple of fixes that seem important enough to pick at the last moment.

Signed-off-by: Michael S. Tsirkin 


Vincent Whitchurch (1):
  virtio-ring: fix DMA metadata flags

Xie Yongji (2):
  vduse: Disallow injecting interrupt before DRIVER_OK is set
  vduse: Fix race condition between resetting and irq injecting

 drivers/vdpa/vdpa_user/vduse_dev.c | 29 +
 drivers/virtio/virtio_ring.c   |  2 +-
 2 files changed, 26 insertions(+), 5 deletions(-)



Re: [PATCH] vhost: Make use of the helper macro kthread_run()

2021-10-27 Thread Mike Christie
On 10/27/21 3:02 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 21, 2021 at 04:44:06PM +0800, Cai Huoqing wrote:
>> Repalce kthread_create/wake_up_process() with kthread_run()
>> to simplify the code.
>>
>> Signed-off-by: Cai Huoqing 
> 
> Pls check how this interacts with Mike Christie's patches.
> Pls fix up the typo in the commit log.
> 

Hi Cai,

We probably don't need this patch since it's an API cleanup and
not fixing a bug. I'm replacing this code with the kernel_worker API
with this patch

https://lore.kernel.org/all/20211007214448.6282-9-michael.chris...@oracle.com/

in this patchset;

https://lore.kernel.org/all/20211007214448.6282-1-michael.chris...@oracle.com/

so the issue of using kthread_create + wake_up_process will be
gone shortly either way.


Re: [PATCH v3 0/4] Add blk_validate_block_size() helper for drivers to validate block size

2021-10-27 Thread Jens Axboe
On Tue, 26 Oct 2021 22:40:11 +0800, Xie Yongji wrote:
> The block layer can't support the block size larger than
> page size yet, so driver needs to validate the block size
> before setting it. Now this validation is done in device drivers
> with some duplicated codes. This series tries to add a block
> layer helper for it and makes loop driver, nbd driver and
> virtio-blk driver use it.
> 
> [...]

Applied, thanks!

[1/4] block: Add a helper to validate the block size
  commit: 570b1cac477643cbf01a45fa5d018430a1fddbce
[2/4] nbd: Use blk_validate_block_size() to validate block size
  commit: c4318d6cd038472d13e08a54a9035863c8c160bd
[3/4] loop: Use blk_validate_block_size() to validate block size
  commit: af3c570fb0df422b4906ebd11c1bf363d89961d5
[4/4] virtio-blk: Use blk_validate_block_size() to validate block size
  commit: 57a13a5b8157d9a8606490aaa1b805bafe6c37e1

Best regards,
-- 
Jens Axboe




Re: vDPA bus driver selection

2021-10-27 Thread Michael S. Tsirkin
On Wed, Oct 27, 2021 at 03:21:15PM +, Parav Pandit wrote:
> Hi Stefano,
> 
> > From: Stefano Garzarella 
> > Sent: Wednesday, October 27, 2021 8:04 PM
> > 
> > Hi folks,
> > I was trying to understand if we have a way to specify which vDPA bus driver
> > (e.g. vhost-vdpa, virtio-vdpa) a device should use.
> > IIUC we don't have it, and the first registered driver is used when a new
> > device is registered.
> > 
> > I was thinking if it makes sense to extend the management API to specify
> > which bus driver to use for a device. A use case could be for example a
> > single host handling VMs and bare-metal containers, so we would have both
> > virtio-vdpa and vhost-vdpa loaded and we want to attach some devices to
> > VMs through vhost-vdpa and others to containers through virtio-vdpa.
> > 
> > What do you think?
> > 
> One option is, user keeps the drivers_autoprobe disabled for the vdpa bus
> using,
> 
> $ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac 00:11:22:33:44:55
> $ echo 0 > /sys/bus/vdpa/drivers_autoprobe
> 
> And after vdpa device creation, it manually binds to the desired driver
> such as,
> 
> $ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind
> Or
> $ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind
> 
> In the case of VDUSE, it makes more sense to bind to one of the above
> drivers after user space has connected the user space backend to the
> kernel device.

The only annoying thing is that manual bind is not validated.
E.g. if one makes a mistake and binds an incorrect device,
it just tends to crash IIRC.
Another is that it all needs to be root.

-- 
MST



Re: futher decouple DAX from block devices

2021-10-27 Thread Dan Williams
[ add sfr ]

On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig  wrote:
>
> Hi Dan,
>
> this series cleans up and simplifies the association between DAX and block
> devices in preparation of allowing to mount file systems directly on DAX
> devices without a detour through block devices.

So I notice that this is based on linux-next while libnvdimm-for-next
is based on v5.15-rc4. Since I'm not Andrew I went ahead and rebased
these onto v5.15-rc4, tested that, and then merged with linux-next to
resolve the conflicts and tested again.

My merge resolution is here [1]. Christoph, please have a look. The
rebase and the merge result are both passing my test and I'm now going
to review the individual patches. However, while I do that and collect
acks from DM and EROFS folks, I want to give Stephen a heads up that
this is coming. Primarily I want to see if someone sees a better
strategy to merge this, please let me know, but if not I plan to walk
Stephen and Linus through the resolution.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/commit/?id=c3894cf6eb8f


>
> Diffstat:
>  drivers/dax/Kconfig  |4
>  drivers/dax/bus.c|2
>  drivers/dax/super.c  |  220 +--
>  drivers/md/dm-linear.c   |   51 +++--
>  drivers/md/dm-log-writes.c   |   44 +++-
>  drivers/md/dm-stripe.c   |   65 +++-
>  drivers/md/dm-table.c|   22 ++--
>  drivers/md/dm-writecache.c   |2
>  drivers/md/dm.c  |   29 -
>  drivers/md/dm.h  |4
>  drivers/nvdimm/Kconfig   |2
>  drivers/nvdimm/pmem.c|9 -
>  drivers/s390/block/Kconfig   |2
>  drivers/s390/block/dcssblk.c |   12 +-
>  fs/dax.c |   13 ++
>  fs/erofs/super.c |   11 +-
>  fs/ext2/super.c  |6 -
>  fs/ext4/super.c  |9 +
>  fs/fuse/Kconfig  |2
>  fs/fuse/virtio_fs.c  |2
>  fs/xfs/xfs_super.c   |   54 +-
>  include/linux/dax.h  |   30 ++---
>  22 files changed, 185 insertions(+), 410 deletions(-)


Re: [PATCH 01/11] dm: make the DAX support dependend on CONFIG_FS_DAX

2021-10-27 Thread Dan Williams
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig  wrote:
>
> The device mapper DAX support is all hanging off a block device and thus
> can't be used with device dax.  Make it depend on CONFIG_FS_DAX instead
> of CONFIG_DAX_DRIVER.  This also means that bdev_dax_pgoff only needs to
> be built under CONFIG_FS_DAX now.

Looks good.

Mike, can I get an ack to take this through nvdimm.git? (you'll likely
see me repeat this question on subsequent patches in this series).

>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/dax/super.c| 6 ++
>  drivers/md/dm-linear.c | 2 +-
>  drivers/md/dm-log-writes.c | 2 +-
>  drivers/md/dm-stripe.c | 2 +-
>  drivers/md/dm-writecache.c | 2 +-
>  drivers/md/dm.c| 2 +-
>  6 files changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index b882cf8106ea3..e20d0cef10a18 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -63,7 +63,7 @@ static int dax_host_hash(const char *host)
> return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
>  }
>
> -#ifdef CONFIG_BLOCK
> +#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
>  #include 
>
>  int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
> @@ -80,7 +80,6 @@ int bdev_dax_pgoff(struct block_device *bdev, sector_t 
> sector, size_t size,
>  }
>  EXPORT_SYMBOL(bdev_dax_pgoff);
>
> -#if IS_ENABLED(CONFIG_FS_DAX)
>  /**
>   * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
>   * @host: alternate name for the device registered by a dax driver
> @@ -219,8 +218,7 @@ bool dax_supported(struct dax_device *dax_dev, struct 
> block_device *bdev,
> return ret;
>  }
>  EXPORT_SYMBOL_GPL(dax_supported);
> -#endif /* CONFIG_FS_DAX */
> -#endif /* CONFIG_BLOCK */
> +#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
>
>  enum dax_device_flags {
> /* !alive + rcu grace period == no new operations / mappings */
> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> index 679b4c0a2eea1..32fbab11bf90c 100644
> --- a/drivers/md/dm-linear.c
> +++ b/drivers/md/dm-linear.c
> @@ -163,7 +163,7 @@ static int linear_iterate_devices(struct dm_target *ti,
> return fn(ti, lc->dev, lc->start, ti->len, data);
>  }
>
> -#if IS_ENABLED(CONFIG_DAX_DRIVER)
> +#if IS_ENABLED(CONFIG_FS_DAX)
>  static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
> long nr_pages, void **kaddr, pfn_t *pfn)
>  {
> diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
> index d93a4db235124..6d694526881d0 100644
> --- a/drivers/md/dm-log-writes.c
> +++ b/drivers/md/dm-log-writes.c
> @@ -903,7 +903,7 @@ static void log_writes_io_hints(struct dm_target *ti, 
> struct queue_limits *limit
> limits->io_min = limits->physical_block_size;
>  }
>
> -#if IS_ENABLED(CONFIG_DAX_DRIVER)
> +#if IS_ENABLED(CONFIG_FS_DAX)
>  static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
>struct iov_iter *i)
>  {
> diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
> index 6660b6b53d5bf..f084607220293 100644
> --- a/drivers/md/dm-stripe.c
> +++ b/drivers/md/dm-stripe.c
> @@ -300,7 +300,7 @@ static int stripe_map(struct dm_target *ti, struct bio 
> *bio)
> return DM_MAPIO_REMAPPED;
>  }
>
> -#if IS_ENABLED(CONFIG_DAX_DRIVER)
> +#if IS_ENABLED(CONFIG_FS_DAX)
>  static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
> long nr_pages, void **kaddr, pfn_t *pfn)
>  {
> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> index 18320444fb0a9..4c3a6e33604d3 100644
> --- a/drivers/md/dm-writecache.c
> +++ b/drivers/md/dm-writecache.c
> @@ -38,7 +38,7 @@
>  #define BITMAP_GRANULARITY PAGE_SIZE
>  #endif
>
> -#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER)
> +#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_FS_DAX)
>  #define DM_WRITECACHE_HAS_PMEM
>  #endif
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 7870e6460633f..79737aee516b1 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1783,7 +1783,7 @@ static struct mapped_device *alloc_dev(int minor)
> md->disk->private_data = md;
> sprintf(md->disk->disk_name, "dm-%d", minor);
>
> -   if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
> +   if (IS_ENABLED(CONFIG_FS_DAX)) {
> md->dax_dev = alloc_dax(md, md->disk->disk_name,
> &dm_dax_ops, 0);
> if (IS_ERR(md->dax_dev))
> --
> 2.30.2
>


Re: [PATCH 02/11] dax: remove CONFIG_DAX_DRIVER

2021-10-27 Thread Dan Williams
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig  wrote:
>
> CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it.

Looks good, I don't think an s390 ack is needed for this one.

> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/dax/Kconfig| 4 
>  drivers/nvdimm/Kconfig | 2 +-
>  drivers/s390/block/Kconfig | 2 +-
>  fs/fuse/Kconfig| 2 +-
>  4 files changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
> index d2834c2cfa10d..954ab14ba7778 100644
> --- a/drivers/dax/Kconfig
> +++ b/drivers/dax/Kconfig
> @@ -1,8 +1,4 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -config DAX_DRIVER
> -   select DAX
> -   bool
> -
>  menuconfig DAX
> tristate "DAX: direct access to differentiated memory"
> select SRCU
> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> index b7d1eb38b27d4..347fe7afa5830 100644
> --- a/drivers/nvdimm/Kconfig
> +++ b/drivers/nvdimm/Kconfig
> @@ -22,7 +22,7 @@ if LIBNVDIMM
>  config BLK_DEV_PMEM
> tristate "PMEM: Persistent memory block device support"
> default LIBNVDIMM
> -   select DAX_DRIVER
> +   select DAX
> select ND_BTT if BTT
> select ND_PFN if NVDIMM_PFN
> help
> diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
> index d0416dbd0cd81..e3710a762abae 100644
> --- a/drivers/s390/block/Kconfig
> +++ b/drivers/s390/block/Kconfig
> @@ -5,7 +5,7 @@ comment "S/390 block device drivers"
>  config DCSSBLK
> def_tristate m
> select FS_DAX_LIMITED
> -   select DAX_DRIVER
> +   select DAX
> prompt "DCSSBLK support"
> depends on S390 && BLOCK
> help
> diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
> index 40ce9a1c12e5d..038ed0b9aaa5d 100644
> --- a/fs/fuse/Kconfig
> +++ b/fs/fuse/Kconfig
> @@ -45,7 +45,7 @@ config FUSE_DAX
> select INTERVAL_TREE
> depends on VIRTIO_FS
> depends on FS_DAX
> -   depends on DAX_DRIVER
> +   depends on DAX
> help
>   This allows bypassing guest page cache and allows mapping host page
>   cache directly in guest address space.
> --
> 2.30.2
>


Re: [PATCH 03/11] dax: simplify the dax_device <-> gendisk association

2021-10-27 Thread Dan Williams
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig  wrote:
>
> Replace the dax_host_hash with an xarray indexed by the pointer value
> of the gendisk, and require explicitl calls from the block drivers that

s/explicitl/explicit/

I've fixed that up locally.

> want to associate their gendisk with a dax_device.

This looks good. 0day-robot is now chewing on it via the test merge
with linux-next (libnvdimm-pending).


Re: [PATCH 04/11] dax: remove the pgmap sanity checks in generic_fsdax_supported

2021-10-27 Thread Dan Williams
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig  wrote:
>
> Drivers that register a dax_dev should make sure it works, no need
> to double check from the file system.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/dax/super.c | 49 +
>  1 file changed, 1 insertion(+), 48 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 9383c11b21853..04fc680542e8d 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -107,13 +107,9 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
> struct block_device *bdev, int blocksize, sector_t start,
> sector_t sectors)
>  {
> -   bool dax_enabled = false;
> pgoff_t pgoff, pgoff_end;
> -   void *kaddr, *end_kaddr;
> -   pfn_t pfn, end_pfn;
> sector_t last_page;
> -   long len, len2;
> -   int err, id;
> +   int err;
>
> if (blocksize != PAGE_SIZE) {
> pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
> @@ -138,49 +134,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
> return false;
> }
>
> -   id = dax_read_lock();
> -   len = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
> -   len2 = dax_direct_access(dax_dev, pgoff_end, 1, &end_kaddr, &end_pfn);
> -
> -   if (len < 1 || len2 < 1) {
> -   pr_info("%pg: error: dax access failed (%ld)\n",
> -   bdev, len < 1 ? len : len2);
> -   dax_read_unlock(id);
> -   return false;
> -   }
> -
> -   if (IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn)) {
> -   /*
> -* An arch that has enabled the pmem api should also
> -* have its drivers support pfn_t_devmap()
> -*
> -* This is a developer warning and should not trigger in
> -* production. dax_flush() will crash since it depends
> -* on being able to do (page_address(pfn_to_page())).
> -*/
> -   WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API));
> -   dax_enabled = true;
> -   } else if (pfn_t_devmap(pfn) && pfn_t_devmap(end_pfn)) {
> -   struct dev_pagemap *pgmap, *end_pgmap;
> -
> -   pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL);
> -   end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
> -   if (pgmap && pgmap == end_pgmap && pgmap->type == 
> MEMORY_DEVICE_FS_DAX
> -   && pfn_t_to_page(pfn)->pgmap == pgmap
> -   && pfn_t_to_page(end_pfn)->pgmap == pgmap
> -   && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr))
> -   && pfn_t_to_pfn(end_pfn) == 
> PHYS_PFN(__pa(end_kaddr)))

This is effectively a self-test for a regression that was found while
manipulating the 'struct page' memmap metadata reservation for PMEM
namespaces.

I guess it's just serving as a security-blanket now and I should find
a way to move it out to a self-test. I'll take a look.

> -   dax_enabled = true;
> -   put_dev_pagemap(pgmap);
> -   put_dev_pagemap(end_pgmap);
> -
> -   }
> -   dax_read_unlock(id);
> -
> -   if (!dax_enabled) {
> -   pr_info("%pg: error: dax support not enabled\n", bdev);
> -   return false;
> -   }
> return true;
>  }
>  EXPORT_SYMBOL_GPL(generic_fsdax_supported);
> --
> 2.30.2
>


Re: [PATCH 1/3] virtio: cache indirect desc for split

2021-10-27 Thread kernel test robot
Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7 next-20211027]
[cannot apply to mst-vhost/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: arm-defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/8935e116c155fb7d484bad35a42b2ca98f75e384
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
git checkout 8935e116c155fb7d484bad35a42b2ca98f75e384
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross 
ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   drivers/virtio/virtio_ring.c: In function 'desc_cache_chain_free_split':
>> drivers/virtio/virtio_ring.c:438:25: warning: cast to pointer from integer 
>> of different size [-Wint-to-pointer-cast]
 438 | chain = (void *)desc->addr;
 | ^
   drivers/virtio/virtio_ring.c: In function 'desc_cache_put_split':
>> drivers/virtio/virtio_ring.c:447:30: warning: cast from pointer to integer 
>> of different size [-Wpointer-to-int-cast]
 447 | desc->addr = (u64)vq->desc_cache_chain;
 |  ^
   drivers/virtio/virtio_ring.c: In function 'alloc_indirect_split':
   drivers/virtio/virtio_ring.c:464:48: warning: cast to pointer from integer 
of different size [-Wint-to-pointer-cast]
 464 | vq->desc_cache_chain = (void *)desc->addr;
 |^


vim +438 drivers/virtio/virtio_ring.c

   431  
   432  static void desc_cache_chain_free_split(void *chain)
   433  {
   434  struct vring_desc *desc;
   435  
   436  while (chain) {
   437  desc = chain;
 > 438  chain = (void *)desc->addr;
   439  kfree(desc);
   440  }
   441  }
   442  
   443  static void desc_cache_put_split(struct vring_virtqueue *vq,
   444   struct vring_desc *desc, int n)
   445  {
   446  if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
 > 447  desc->addr = (u64)vq->desc_cache_chain;
   448  vq->desc_cache_chain = desc;
   449  } else {
   450  kfree(desc);
   451  }
   452  }
   453  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



Re: [PATCH 05/11] dax: move the partition alignment check into fs_dax_get_by_bdev

2021-10-27 Thread Dan Williams
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig  wrote:
>
> fs_dax_get_by_bdev is the primary interface to find a dax device for a
> block device, so move the partition alignment check there instead of
> wiring it up through ->dax_supported.

Looks good.

>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/dax/super.c | 23 ++-
>  1 file changed, 6 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 04fc680542e8d..482fe775324a4 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -93,6 +93,12 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device 
> *bdev)
> if (!blk_queue_dax(bdev->bd_disk->queue))
> return NULL;
>
> +   if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
> +   (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
> +   pr_info("%pg: error: unaligned partition for dax\n", bdev);
> +   return NULL;
> +   }
> +
> id = dax_read_lock();
> dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
> if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
> @@ -107,10 +113,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
> struct block_device *bdev, int blocksize, sector_t start,
> sector_t sectors)
>  {
> -   pgoff_t pgoff, pgoff_end;
> -   sector_t last_page;
> -   int err;
> -
> if (blocksize != PAGE_SIZE) {
> pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
> return false;
> @@ -121,19 +123,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
> return false;
> }
>
> -   err = bdev_dax_pgoff(bdev, start, PAGE_SIZE, &pgoff);
> -   if (err) {
> -   pr_info("%pg: error: unaligned partition for dax\n", bdev);
> -   return false;
> -   }
> -
> -   last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SIZE / 512;
> -   err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end);
> -   if (err) {
> -   pr_info("%pg: error: unaligned partition for dax\n", bdev);
> -   return false;
> -   }
> -
> return true;
>  }
>  EXPORT_SYMBOL_GPL(generic_fsdax_supported);
> --
> 2.30.2
>


Re: [PATCH 06/11] xfs: factor out a xfs_setup_dax helper

2021-10-27 Thread Dan Williams
On Tue, Oct 19, 2021 at 12:24 AM Christoph Hellwig  wrote:
>
> On Mon, Oct 18, 2021 at 09:43:51AM -0700, Darrick J. Wong wrote:
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -339,6 +339,32 @@ xfs_buftarg_is_dax(
> > > bdev_nr_sectors(bt->bt_bdev));
> > >  }
> > >
> > > +static int
> > > +xfs_setup_dax(
> >
> > /me wonders if this should be named xfs_setup_dax_always, since this
> > doesn't handle the dax=inode mode?
>
> Sure, why not.

I went ahead and made that change locally.

>
> > The only reason I bring that up is that Eric reminded me a while ago
> > that we don't actually print any kind of EXPERIMENTAL warning for the
> > auto-detection behavior.
>
> Yes, I actually noticed that as well when preparing this series.

The rest looks good to me.


Re: [PATCH 07/11] dax: remove dax_capable

2021-10-27 Thread Dan Williams
I am going to change the subject of this patch to:

dax: remove ->dax_supported()

On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig  wrote:
>

I'll add a bit more background to help others review this.

The ->dax_supported() operation arranges for a stack of devices to
each answer the question "is dax operational". That request routes to
generic_fsdax_supported() at the last-level device, which attempted an
actual dax_direct_access() call and did some sanity checks. However,
those sanity checks can be validated in other ways and with those
removed the only question to answer is "has each block device driver
in the stack performed dax_add_host()". That can be validated without
a dax_operation. So, just open code the block size and dax_dev == NULL
checks in the callers, and delete ->dax_supported().

Mike, let me know if you have any concerns.

> Just open code the block size and dax_dev == NULL checks in the callers.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/dax/super.c  | 36 
>  drivers/md/dm-table.c| 22 +++---
>  drivers/md/dm.c  | 21 -
>  drivers/md/dm.h  |  4 
>  drivers/nvdimm/pmem.c|  1 -
>  drivers/s390/block/dcssblk.c |  1 -
>  fs/erofs/super.c | 11 +++
>  fs/ext2/super.c  |  6 --
>  fs/ext4/super.c  |  9 ++---
>  fs/xfs/xfs_super.c   | 21 -
>  include/linux/dax.h  | 14 --
>  11 files changed, 36 insertions(+), 110 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 482fe775324a4..803942586d1b6 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -108,42 +108,6 @@ struct dax_device *fs_dax_get_by_bdev(struct 
> block_device *bdev)
> return dax_dev;
>  }
>  EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
> -
> -bool generic_fsdax_supported(struct dax_device *dax_dev,
> -   struct block_device *bdev, int blocksize, sector_t start,
> -   sector_t sectors)
> -{
> -   if (blocksize != PAGE_SIZE) {
> -   pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
> -   return false;
> -   }
> -
> -   if (!dax_dev) {
> -   pr_debug("%pg: error: dax unsupported by block device\n", 
> bdev);
> -   return false;
> -   }
> -
> -   return true;
> -}
> -EXPORT_SYMBOL_GPL(generic_fsdax_supported);
> -
> -bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> -   int blocksize, sector_t start, sector_t len)
> -{
> -   bool ret = false;
> -   int id;
> -
> -   if (!dax_dev)
> -   return false;
> -
> -   id = dax_read_lock();
> -   if (dax_alive(dax_dev) && dax_dev->ops->dax_supported)
> -   ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize,
> - start, len);
> -   dax_read_unlock(id);
> -   return ret;
> -}
> -EXPORT_SYMBOL_GPL(dax_supported);
>  #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
>
>  enum dax_device_flags {
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 1fa4d5582dca5..4ae671c2168ea 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -807,12 +807,14 @@ void dm_table_set_type(struct dm_table *t, enum 
> dm_queue_mode type)
>  EXPORT_SYMBOL_GPL(dm_table_set_type);
>
>  /* validate the dax capability of the target device span */
> -int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
> +static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
> sector_t start, sector_t len, void *data)
>  {
> -   int blocksize = *(int *) data;
> +   if (dev->dax_dev)
> +   return false;
>
> -   return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
> +   pr_debug("%pg: error: dax unsupported by block device\n", dev->bdev);
> +   return true;
>  }
>
>  /* Check devices support synchronous DAX */
> @@ -822,8 +824,8 @@ static int device_not_dax_synchronous_capable(struct 
> dm_target *ti, struct dm_de
> return !dev->dax_dev || !dax_synchronous(dev->dax_dev);
>  }
>
> -bool dm_table_supports_dax(struct dm_table *t,
> -  iterate_devices_callout_fn iterate_fn, int 
> *blocksize)
> +static bool dm_table_supports_dax(struct dm_table *t,
> +  iterate_devices_callout_fn iterate_fn)
>  {
> struct dm_target *ti;
> unsigned i;
> @@ -836,7 +838,7 @@ bool dm_table_supports_dax(struct dm_table *t,
> return false;
>
> if (!ti->type->iterate_devices ||
> -   ti->type->iterate_devices(ti, iterate_fn, blocksize))
> +   ti->type->iterate_devices(ti, iterate_fn, NULL))
> return false;
> }
>
> @@ -863,7 +865,6 @@ static int dm

Re: [dm-devel] [PATCH 07/11] dax: remove dax_capable

2021-10-27 Thread Dan Williams
On Tue, Oct 19, 2021 at 8:45 AM Darrick J. Wong  wrote:
>
> On Mon, Oct 18, 2021 at 06:40:50AM +0200, Christoph Hellwig wrote:
> > Just open code the block size and dax_dev == NULL checks in the callers.
> >
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  drivers/dax/super.c  | 36 
> >  drivers/md/dm-table.c| 22 +++---
> >  drivers/md/dm.c  | 21 -
> >  drivers/md/dm.h  |  4 
> >  drivers/nvdimm/pmem.c|  1 -
> >  drivers/s390/block/dcssblk.c |  1 -
> >  fs/erofs/super.c | 11 +++
> >  fs/ext2/super.c  |  6 --
> >  fs/ext4/super.c  |  9 ++---
> >  fs/xfs/xfs_super.c   | 21 -
> >  include/linux/dax.h  | 14 --
> >  11 files changed, 36 insertions(+), 110 deletions(-)
> >
[..]
> >   if (ext4_has_feature_inline_data(sb)) {
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index d07020a8eb9e3..163ceafbd8fd2 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
[..]
> > + if (mp->m_super->s_blocksize != PAGE_SIZE) {
> > + xfs_alert(mp,
> > + "DAX not supported for blocksize. Turning off 
> > DAX.\n");
>
> Newlines aren't needed at the end of extX_msg/xfs_alert format strings.

Thanks Darrick, I fixed those up.


Re: [PATCH 2/3] virtio: cache indirect desc for packed

2021-10-27 Thread kernel test robot
Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7]
[cannot apply to mst-vhost/linux-next next-20211027]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: hexagon-randconfig-r045-20211027 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 
5db7568a6a1fcb408eb8988abdaff2a225a8eb72)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/virtio/virtio_ring.c:1467:7: warning: variable 'len' is 
>> uninitialized when used here [-Wuninitialized]
   n = len / sizeof(struct vring_packed_desc);
   ^~~
   drivers/virtio/virtio_ring.c:1460:10: note: initialize the variable 'len' to 
silence this warning
   u32 len, n;
  ^
   = 0
   1 warning generated.


vim +/len +1467 drivers/virtio/virtio_ring.c

  1433  
  1434  static void detach_buf_packed(struct vring_virtqueue *vq,
  1435unsigned int id, void **ctx)
  1436  {
  1437  struct vring_desc_state_packed *state = NULL;
  1438  struct vring_packed_desc *desc;
  1439  unsigned int i, curr;
  1440  
  1441  state = &vq->packed.desc_state[id];
  1442  
  1443  /* Clear data ptr. */
  1444  state->data = NULL;
  1445  
  1446  vq->packed.desc_extra[state->last].next = vq->free_head;
  1447  vq->free_head = id;
  1448  vq->vq.num_free += state->num;
  1449  
  1450  if (unlikely(vq->use_dma_api)) {
  1451  curr = id;
  1452  for (i = 0; i < state->num; i++) {
  1453  vring_unmap_state_packed(vq,
  1454  &vq->packed.desc_extra[curr]);
  1455  curr = vq->packed.desc_extra[curr].next;
  1456  }
  1457  }
  1458  
  1459  if (vq->indirect) {
  1460  u32 len, n;
  1461  
  1462  /* Free the indirect table, if any, now that it's 
unmapped. */
  1463  desc = state->indir_desc;
  1464  if (!desc)
  1465  return;
  1466  
> 1467  n = len / sizeof(struct vring_packed_desc);
  1468  
  1469  if (vq->use_dma_api) {
  1470  len = vq->packed.desc_extra[id].len;
  1471  for (i = 0; i < n; i++)
  1472  vring_unmap_desc_packed(vq, &desc[i]);
  1473  }
  1474  
  1475  desc_cache_put_packed(vq, desc, n);
  1476  state->indir_desc = NULL;
  1477  } else if (ctx) {
  1478  *ctx = state->indir_desc;
  1479  }
  1480  }
  1481  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



Re: [PATCH 1/3] virtio: cache indirect desc for split

2021-10-27 Thread kernel test robot
Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7 next-20211027]
[cannot apply to mst-vhost/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: i386-randconfig-s002-20211027 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# 
https://github.com/0day-ci/linux/commit/8935e116c155fb7d484bad35a42b2ca98f75e384
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
git checkout 8935e116c155fb7d484bad35a42b2ca98f75e384
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir 
ARCH=i386 SHELL=/bin/bash drivers/virtio/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)
>> drivers/virtio/virtio_ring.c:438:26: sparse: sparse: cast from restricted 
>> __virtio64
>> drivers/virtio/virtio_ring.c:447:28: sparse: sparse: incorrect type in 
>> assignment (different base types) @@ expected restricted __virtio64 
>> [usertype] addr @@ got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:447:28: sparse: expected restricted 
__virtio64 [usertype] addr
   drivers/virtio/virtio_ring.c:447:28: sparse: got unsigned long long 
[usertype]
   drivers/virtio/virtio_ring.c:464:49: sparse: sparse: cast from restricted 
__virtio64

vim +438 drivers/virtio/virtio_ring.c

   431  
   432  static void desc_cache_chain_free_split(void *chain)
   433  {
   434  struct vring_desc *desc;
   435  
   436  while (chain) {
   437  desc = chain;
 > 438  chain = (void *)desc->addr;
   439  kfree(desc);
   440  }
   441  }
   442  
   443  static void desc_cache_put_split(struct vring_virtqueue *vq,
   444   struct vring_desc *desc, int n)
   445  {
   446  if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
 > 447  desc->addr = (u64)vq->desc_cache_chain;
   448  vq->desc_cache_chain = desc;
   449  } else {
   450  kfree(desc);
   451  }
   452  }
   453  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



Re: [PATCH 08/11] dm-linear: add a linear_dax_pgoff helper

2021-10-27 Thread Dan Williams
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig  wrote:
>
> Add a helper to perform the entire remapping for DAX accesses.  This
> helper open codes bdev_dax_pgoff given that the alignment checks have
> already been done by the submitting file system and don't need to be
> repeated.

Looks good.

Mike, ack?

>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/md/dm-linear.c | 49 +-
>  1 file changed, 15 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> index 32fbab11bf90c..bf03f73fd0f36 100644
> --- a/drivers/md/dm-linear.c
> +++ b/drivers/md/dm-linear.c
> @@ -164,63 +164,44 @@ static int linear_iterate_devices(struct dm_target *ti,
>  }
>
>  #if IS_ENABLED(CONFIG_FS_DAX)
> +static struct dax_device *linear_dax_pgoff(struct dm_target *ti, pgoff_t 
> *pgoff)
> +{
> +   struct linear_c *lc = ti->private;
> +   sector_t sector = linear_map_sector(ti, *pgoff << PAGE_SECTORS_SHIFT);
> +
> +   *pgoff = (get_start_sect(lc->dev->bdev) + sector) >> 
> PAGE_SECTORS_SHIFT;
> +   return lc->dev->dax_dev;
> +}
> +
>  static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
> long nr_pages, void **kaddr, pfn_t *pfn)
>  {
> -   long ret;
> -   struct linear_c *lc = ti->private;
> -   struct block_device *bdev = lc->dev->bdev;
> -   struct dax_device *dax_dev = lc->dev->dax_dev;
> -   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> -
> -   dev_sector = linear_map_sector(ti, sector);
> -   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
> -   if (ret)
> -   return ret;
> +   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
> +
> return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
>  }
>
>  static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
> void *addr, size_t bytes, struct iov_iter *i)
>  {
> -   struct linear_c *lc = ti->private;
> -   struct block_device *bdev = lc->dev->bdev;
> -   struct dax_device *dax_dev = lc->dev->dax_dev;
> -   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> +   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
>
> -   dev_sector = linear_map_sector(ti, sector);
> -   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
> -   return 0;
> return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
>  }
>
>  static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
> void *addr, size_t bytes, struct iov_iter *i)
>  {
> -   struct linear_c *lc = ti->private;
> -   struct block_device *bdev = lc->dev->bdev;
> -   struct dax_device *dax_dev = lc->dev->dax_dev;
> -   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> +   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
>
> -   dev_sector = linear_map_sector(ti, sector);
> -   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
> -   return 0;
> return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
>  }
>
>  static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
>   size_t nr_pages)
>  {
> -   int ret;
> -   struct linear_c *lc = ti->private;
> -   struct block_device *bdev = lc->dev->bdev;
> -   struct dax_device *dax_dev = lc->dev->dax_dev;
> -   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> -
> -   dev_sector = linear_map_sector(ti, sector);
> -   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, 
> &pgoff);
> -   if (ret)
> -   return ret;
> +   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
> +
> return dax_zero_page_range(dax_dev, pgoff, nr_pages);
>  }
>
> --
> 2.30.2
>


Re: [PATCH 09/11] dm-log-writes: add a log_writes_dax_pgoff helper

2021-10-27 Thread Dan Williams
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig  wrote:
>
> Add a helper to perform the entire remapping for DAX accesses.  This
> helper open codes bdev_dax_pgoff given that the alignment checks have
> already been done by the submitting file system and don't need to be
> repeated.

Looks good.

Mike, ack?


Re: [PATCH 10/11] dm-stripe: add a stripe_dax_pgoff helper

2021-10-27 Thread Dan Williams
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig  wrote:
>
> Add a helper to perform the entire remapping for DAX accesses.  This
> helper open codes bdev_dax_pgoff given that the alignment checks have
> already been done by the submitting file system and don't need to be
> repeated.

Again, looks good. Kind of embarrassing when the open-coded version is
less LOC than using the helper.

Mike, ack?

>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/md/dm-stripe.c | 63 ++
>  1 file changed, 15 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
> index f084607220293..50dba3f39274c 100644
> --- a/drivers/md/dm-stripe.c
> +++ b/drivers/md/dm-stripe.c
> @@ -301,83 +301,50 @@ static int stripe_map(struct dm_target *ti, struct bio 
> *bio)
>  }
>
>  #if IS_ENABLED(CONFIG_FS_DAX)
> -static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
> -   long nr_pages, void **kaddr, pfn_t *pfn)
> +static struct dax_device *stripe_dax_pgoff(struct dm_target *ti, pgoff_t 
> *pgoff)
>  {
> -   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> struct stripe_c *sc = ti->private;
> -   struct dax_device *dax_dev;
> struct block_device *bdev;
> +   sector_t dev_sector;
> uint32_t stripe;
> -   long ret;
>
> -   stripe_map_sector(sc, sector, &stripe, &dev_sector);
> +   stripe_map_sector(sc, *pgoff * PAGE_SECTORS, &stripe, &dev_sector);
> dev_sector += sc->stripe[stripe].physical_start;
> -   dax_dev = sc->stripe[stripe].dev->dax_dev;
> bdev = sc->stripe[stripe].dev->bdev;
>
> -   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
> -   if (ret)
> -   return ret;
> +   *pgoff = (get_start_sect(bdev) + dev_sector) >> PAGE_SECTORS_SHIFT;
> +   return sc->stripe[stripe].dev->dax_dev;
> +}
> +
> +static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
> +   long nr_pages, void **kaddr, pfn_t *pfn)
> +{
> +   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
> +
> return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
>  }
>
>  static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
> void *addr, size_t bytes, struct iov_iter *i)
>  {
> -   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> -   struct stripe_c *sc = ti->private;
> -   struct dax_device *dax_dev;
> -   struct block_device *bdev;
> -   uint32_t stripe;
> -
> -   stripe_map_sector(sc, sector, &stripe, &dev_sector);
> -   dev_sector += sc->stripe[stripe].physical_start;
> -   dax_dev = sc->stripe[stripe].dev->dax_dev;
> -   bdev = sc->stripe[stripe].dev->bdev;
> +   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
>
> -   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
> -   return 0;
> return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
>  }
>
>  static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
> void *addr, size_t bytes, struct iov_iter *i)
>  {
> -   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> -   struct stripe_c *sc = ti->private;
> -   struct dax_device *dax_dev;
> -   struct block_device *bdev;
> -   uint32_t stripe;
> -
> -   stripe_map_sector(sc, sector, &stripe, &dev_sector);
> -   dev_sector += sc->stripe[stripe].physical_start;
> -   dax_dev = sc->stripe[stripe].dev->dax_dev;
> -   bdev = sc->stripe[stripe].dev->bdev;
> +   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
>
> -   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
> -   return 0;
> return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
>  }
>
>  static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
>   size_t nr_pages)
>  {
> -   int ret;
> -   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> -   struct stripe_c *sc = ti->private;
> -   struct dax_device *dax_dev;
> -   struct block_device *bdev;
> -   uint32_t stripe;
> +   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
>
> -   stripe_map_sector(sc, sector, &stripe, &dev_sector);
> -   dev_sector += sc->stripe[stripe].physical_start;
> -   dax_dev = sc->stripe[stripe].dev->dax_dev;
> -   bdev = sc->stripe[stripe].dev->bdev;
> -
> -   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
> -   if (ret)
> -   return ret;
> return dax_zero_page_range(dax_dev, pgoff, nr_pages);
>  }
>
> --
> 2.30.2
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 11/11] dax: move bdev_dax_pgoff to fs/dax.c

2021-10-27 Thread Dan Williams
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig  wrote:
>
> No functional changet, but this will allow for a tighter integration

s/changet/changes/

> with the iomap code, including possible passing the partition offset

s/possible/possibly/

> in the iomap in the future.  For now it mostly avoids growing more

s/now/now,/

...all of the above fixed up locally.

Other than that, it looks good to me.

> callers outside of fs/dax.c.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/dax/super.c | 14 --
>  fs/dax.c| 13 +
>  include/linux/dax.h |  1 -
>  3 files changed, 13 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 803942586d1b6..c0910687fbcb2 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -67,20 +67,6 @@ void dax_remove_host(struct gendisk *disk)
>  }
>  EXPORT_SYMBOL_GPL(dax_remove_host);
>
> -int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
> -   pgoff_t *pgoff)
> -{
> -   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
> -   phys_addr_t phys_off = (start_sect + sector) * 512;
> -
> -   if (pgoff)
> -   *pgoff = PHYS_PFN(phys_off);
> -   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
> -   return -EINVAL;
> -   return 0;
> -}
> -EXPORT_SYMBOL(bdev_dax_pgoff);
> -
>  /**
>   * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
>   * @bdev: block device to find a dax_device for
> diff --git a/fs/dax.c b/fs/dax.c
> index 4e3e5a283a916..eb715363fd667 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -709,6 +709,19 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
> return __dax_invalidate_entry(mapping, index, false);
>  }
>
> +static int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
> +   pgoff_t *pgoff)
> +{
> +   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
> +   phys_addr_t phys_off = (start_sect + sector) * 512;
> +
> +   if (pgoff)
> +   *pgoff = PHYS_PFN(phys_off);
> +   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
> +   return -EINVAL;
> +   return 0;
> +}
> +
>  static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_dev,
>  sector_t sector, struct page *to, unsigned long vaddr)
>  {
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 439c3c70e347b..324363b798ecd 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -107,7 +107,6 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
>  #endif
>
>  struct writeback_control;
> -int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff);
>  #if IS_ENABLED(CONFIG_FS_DAX)
>  int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
>  void dax_remove_host(struct gendisk *disk);
> --
> 2.30.2
>


Re: [PATCH 3/3] virtio-net: enable virtio indirect cache

2021-10-27 Thread Xuan Zhuo
On Wed, 27 Oct 2021 08:55:28 -0700, Jakub Kicinski  wrote:
> On Wed, 27 Oct 2021 14:19:13 +0800 Xuan Zhuo wrote:
> > +static bool virtio_desc_cache = true;
> >  module_param(csum, bool, 0444);
> >  module_param(gso, bool, 0444);
> >  module_param(napi_tx, bool, 0644);
> > +module_param(virtio_desc_cache, bool, 0644);
>
> Can this be an ethtool priv flag? module params are discouraged because
> they can't be controlled per-netdev.


In the current design this can only be set when the device is initialized,
so modifying it via ethtool will not work.

Thanks.



Re: [PATCH 1/3] virtio: cache indirect desc for split

2021-10-27 Thread Jason Wang
On Thu, Oct 28, 2021 at 1:07 AM Michael S. Tsirkin  wrote:
>
> On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> > In the case of using indirect, indirect desc must be allocated and
> > released each time, which increases a lot of cpu overhead.
> >
> > Here, a cache is added for indirect. If the number of indirect desc to be
> > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> >
> > Signed-off-by: Xuan Zhuo 
> > ---
> >  drivers/virtio/virtio.c  |  6 
> >  drivers/virtio/virtio_ring.c | 63 ++--
> >  include/linux/virtio.h   | 10 ++
> >  3 files changed, 70 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 0a5b54034d4b..04bcb74e5b9a 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(is_virtio_device);
> >
> > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > +{
> > + dev->desc_cache = val;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > +
> >  void unregister_virtio_device(struct virtio_device *dev)
> >  {
> >   int index = dev->index; /* save for after device release */
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index dd95dfd85e98..0b9a8544b0e8 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> >   /* Hint for event idx: already triggered no need to disable. */
> >   bool event_triggered;
> >
> > + /* Is indirect cache used? */
> > + bool use_desc_cache;
> > + void *desc_cache_chain;
> > +
> >   union {
> >   /* Available for split ring */
> >   struct {
> > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> >   return extra[i].next;
> >  }
> >
> > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > +
> > +static void desc_cache_chain_free_split(void *chain)
> > +{
> > + struct vring_desc *desc;
> > +
> > + while (chain) {
> > + desc = chain;
> > + chain = (void *)desc->addr;
> > + kfree(desc);
> > + }
> > +}
> > +
> > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > +  struct vring_desc *desc, int n)
> > +{
> > + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > + desc->addr = (u64)vq->desc_cache_chain;
> > + vq->desc_cache_chain = desc;
> > + } else {
> > + kfree(desc);
> > + }
> > +}
> > +
>
>
> So I have a question here. What happens if we just do:
>
> if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp)
> } else {
> return kmalloc_array(n, sizeof desc, gfp)
> }
>
> A small change and won't we reap most performance benefits?

Yes, I think we need a benchmark to use private cache to see how much
it can help.

Thanks

>
> --
> MST
>



Re: vDPA bus driver selection

2021-10-27 Thread Jason Wang
On Thu, Oct 28, 2021 at 4:16 AM Michael S. Tsirkin  wrote:
>
> On Wed, Oct 27, 2021 at 03:21:15PM +, Parav Pandit wrote:
> > Hi Stefano,
> >
> > > From: Stefano Garzarella 
> > > Sent: Wednesday, October 27, 2021 8:04 PM
> > >
> > > Hi folks,
> > > I was trying to understand if we have a way to specify which vDPA bus 
> > > driver
> > > (e.g. vhost-vdpa, virtio-vdpa) a device should use.
> > > IIUC we don't have it, and the first registered driver is used when a new 
> > > device
> > > is registered.
> > >
> > > I was thinking if it makes sense to extend the management API to specify 
> > > which
> > > bus driver to use for a device.

Actually, we wanted to support this in the first version of the vDPA
bus, but for some reason it was dropped. The idea was to specify the
device type, 'virtio' or 'vhost'. A concern is that it may encourage
vendors to implement e.g. virtio-specific devices (without DMA
isolation).

> > > A use case could be for example a single host
> > > handling VMs and bare-metal containers, so we would have both virtio-vdpa
> > > and vhost-vdpa loaded and we want to attach some devices to VMs through
> > > vhost-vdpa and others to containers through virtio-vdpa.
> > >
> > > What do you think?
> > >
> > One option is, user keeps the drivers_autoprobe disabled for the vdpa bus 
> > using,
> >
> > $ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac 00:11:22:33:44:55
> > $ echo 0 > /sys/bus/vdpa/drivers_autoprobe
> >
> > And after vdpa device creation, it manually binds to the desired driver 
> > such as,
> >
> > $ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind
> > Or
> > $ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind
> >
> > In the case of VDUSE, it makes more sense to bind to one of the above
> > drivers after user space has connected the user-space backend to the kernel
> > device.
>
> The only annoying thing is that manual bind is not validated.
> E.g. if one makes a mistake and binds an incorrect device,
> it just tends to crash IIRC.
> Another is that it all needs to be root.

I'm not sure it's worth bothering with. As discussed, switching
between vendor drivers and vfio requires manually bind/unbind as well.

Thanks

>
> --
> MST
>



Re: [PATCH 3/3] virtio-net: enable virtio indirect cache

2021-10-27 Thread Jason Wang
On Thu, Oct 28, 2021 at 9:59 AM Xuan Zhuo  wrote:
>
> On Wed, 27 Oct 2021 08:55:28 -0700, Jakub Kicinski  wrote:
> > On Wed, 27 Oct 2021 14:19:13 +0800 Xuan Zhuo wrote:
> > > +static bool virtio_desc_cache = true;
> > >  module_param(csum, bool, 0444);
> > >  module_param(gso, bool, 0444);
> > >  module_param(napi_tx, bool, 0644);
> > > +module_param(virtio_desc_cache, bool, 0644);
> >
> > Can this be an ethtool priv flag? module params are discouraged because
> > they can't be controlled per-netdev.
>
>
> In the current design this can only be set when the device is initialized,
> so modifying it via ethtool will not work.

Anyhow, you can add things like synchronization to make it work. But I
think what we want is to make it work unconditionally, so having a
module parameter seems useless. Unless you want to use it for
benchmarking?

Thanks

>
> Thanks.
>



Re: [PATCH net-next v2] net: virtio: use eth_hw_addr_set()

2021-10-27 Thread Jason Wang
On Wed, Oct 27, 2021 at 11:31 PM Michael S. Tsirkin  wrote:
>
> On Wed, Oct 27, 2021 at 08:20:12AM -0700, Jakub Kicinski wrote:
> > Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
> > of VLANs...") introduced a rbtree for faster Ethernet address look
> > up. To maintain netdev->dev_addr in this tree we need to make all
> > the writes to it go through appropriate helpers.
> >
> > Even though the current code uses dev->addr_len, we can switch
> > to eth_hw_addr_set() instead of dev_addr_set(). The netdev is
> > always allocated by alloc_etherdev_mq() and there are at least two
> > places which assume Ethernet address:
> >  - the line below calling eth_hw_addr_random()
> >  - virtnet_set_mac_address() -> eth_commit_mac_addr_change()
> >
> > Signed-off-by: Jakub Kicinski 
>
> Acked-by: Michael S. Tsirkin 

Acked-by: Jason Wang 

>
> > ---
> > v2: - actually switch to eth_hw_addr_set() not dev_addr_set()
> > - resize the buffer to ETH_ALEN
> > - pass ETH_ALEN instead of dev->dev_addr to virtio_cread_bytes()
> >
> > CC: m...@redhat.com
> > CC: jasow...@redhat.com
> > CC: virtualization@lists.linux-foundation.org
> > ---
> >  drivers/net/virtio_net.c | 10 +++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c501b5974aee..cc79343cd220 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device *vdev)
> >   dev->max_mtu = MAX_MTU;
> >
> >   /* Configuration may specify what MAC to use.  Otherwise random. */
> > - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
> > + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
> > + u8 addr[ETH_ALEN];
> > +
> >   virtio_cread_bytes(vdev,
> >  offsetof(struct virtio_net_config, mac),
> > -dev->dev_addr, dev->addr_len);
> > - else
> > +addr, ETH_ALEN);
> > + eth_hw_addr_set(dev, addr);
> > + } else {
> >   eth_hw_addr_random(dev);
> > + }
> >
> >   /* Set up our device-specific information */
> >   vi = netdev_priv(dev);
> > --
> > 2.31.1
>



RE: vDPA bus driver selection

2021-10-27 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Thursday, October 28, 2021 1:46 AM
> 
> On Wed, Oct 27, 2021 at 03:21:15PM +, Parav Pandit wrote:
> > Hi Stefano,
> >
> > > From: Stefano Garzarella 
> > > Sent: Wednesday, October 27, 2021 8:04 PM
> > >
> > > Hi folks,
> > > I was trying to understand if we have a way to specify which vDPA
> > > bus driver (e.g. vhost-vdpa, virtio-vdpa) a device should use.
> > > IIUC we don't have it, and the first registered driver is used when
> > > a new device is registered.
> > >
> > > I was thinking if it makes sense to extend the management API to
> > > specify which bus driver to use for a device. A use case could be
> > > for example a single host handling VMs and bare-metal containers, so
> > > we would have both virtio-vdpa and vhost-vdpa loaded and we want to
> > > attach some devices to VMs through vhost-vdpa and others to containers
> through virtio-vdpa.
> > >
> > > What do you think?
> > >
> > One option is, user keeps the drivers_autoprobe disabled for the vdpa
> > bus using,
> >
> > $ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac
> > 00:11:22:33:44:55 $ echo 0 > /sys/bus/vdpa/drivers_autoprobe
> >
> > And after vdpa device creation, it manually binds to the desired
> > driver such as,
> >
> > $ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind
> > Or
> > $ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind
> >
> > In the case of VDUSE, it makes more sense to bind to one of the above
> > drivers after user space has connected the user-space backend to the kernel
> > device.
> 
> The only annoying thing is that manual bind is not validated.
> E.g. if one makes a mistake and binds an incorrect device, it just tends
> to crash IIRC.
Only a vdpa device can be bound/unbound to a vdpa bus driver.
Such checks are done by the kernel core.
I didn't follow when it can crash. Can you please share an example of when it
can crash?


Re: [PATCH 2/3] virtio: cache indirect desc for packed

2021-10-27 Thread kernel test robot
Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7]
[cannot apply to mst-vhost/linux-next next-20211027]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: i386-randconfig-s002-20211027 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# 
https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash drivers/virtio/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)
   drivers/virtio/virtio_ring.c:438:26: sparse: sparse: cast from restricted 
__virtio64
   drivers/virtio/virtio_ring.c:447:28: sparse: sparse: incorrect type in 
assignment (different base types) @@ expected restricted __virtio64 
[usertype] addr @@ got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:447:28: sparse: expected restricted 
__virtio64 [usertype] addr
   drivers/virtio/virtio_ring.c:447:28: sparse: got unsigned long long 
[usertype]
   drivers/virtio/virtio_ring.c:464:49: sparse: sparse: cast from restricted 
__virtio64
>> drivers/virtio/virtio_ring.c:1083:26: sparse: sparse: cast from restricted 
>> __le64
>> drivers/virtio/virtio_ring.c:1092:28: sparse: sparse: incorrect type in 
>> assignment (different base types) @@ expected restricted __le64 
>> [usertype] addr @@ got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:1092:28: sparse: expected restricted __le64 
[usertype] addr
   drivers/virtio/virtio_ring.c:1092:28: sparse: got unsigned long long 
[usertype]
   drivers/virtio/virtio_ring.c:1109:49: sparse: sparse: cast from restricted 
__le64

vim +1083 drivers/virtio/virtio_ring.c

  1076  
  1077  static void desc_cache_chain_free_packed(void *chain)
  1078  {
  1079  struct vring_packed_desc *desc;
  1080  
  1081  while (chain) {
  1082  desc = chain;
> 1083  chain = (void *)desc->addr;
  1084  kfree(desc);
  1085  }
  1086  }
  1087  
  1088  static void desc_cache_put_packed(struct vring_virtqueue *vq,
  1089struct vring_packed_desc *desc, int n)
  1090  {
  1091  if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> 1092  desc->addr = (u64)vq->desc_cache_chain;
  1093  vq->desc_cache_chain = desc;
  1094  } else {
  1095  kfree(desc);
  1096  }
  1097  }
  1098  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org



Re: [RFC] Add DMA_API support for Virtio devices earlier than VirtIO 1.0

2021-10-27 Thread Michael S. Tsirkin
On Wed, Oct 27, 2021 at 04:28:28PM -0700, Erdem Aktas wrote:
> Enable DMA_API for any VirtIO device earlier than Virtio 1.0 which
> is the only way for those devices to be configured correctly when
> memory access is restricted.
> 
> Virtio devices can use DMA_API to translate guest physical addresses to
> device physical addresses if VIRTIO_F_ACCESS_PLATFORM feature is set
> while the device is being initialized. VIRTIO_F_ACCESS_PLATFORM
> feature is only supported in VirtIO 1.0 and later devices. This prevents
> any device using an earlier VirtIO version than Virtio 1.0 to be
> attached when memory access is restricted, i.e. memory encryption features
> (AMD SEV [ES/SNP], Intel TDX, etc..) are enabled.
> 
> Signed-off-by: Erdem Aktas 


Sorry .. NACK.

Virtio before 1.0 is on life support. No new features. Just use 1.0
please.


> ---
> I have tested this patch using linux-stable.git head, 5.15.0-rc6
> kernel and scsi disk with virtio 0.95 version with legacy VM and
> Confidential VM (AMD SEV). I want to get feedback if
> there is any risk or downside of enabling DMA_API on older virtio
> drivers when memory encryption is enabled.
> 
>  drivers/virtio/virtio.c   |  7 ++-
>  include/linux/virtio_config.h | 22 ++
>  2 files changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 236081afe9a2..71115ba85d07 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -179,11 +179,8 @@ int virtio_finalize_features(struct virtio_device *dev)
>   if (ret) {
>   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
>   dev_warn(&dev->dev,
> -  "device must provide VIRTIO_F_VERSION_1\n");
> - return -ENODEV;
> - }
> -
> - if (!virtio_has_feature(dev, VIRTIO_F_ACCESS_PLATFORM)) {
> +  "device does not provide VIRTIO_F_VERSION_1 while restricted memory access is enabled!\n");
> + } else if (!virtio_has_feature(dev, VIRTIO_F_ACCESS_PLATFORM)) {
>   dev_warn(&dev->dev,
>"device must provide 
> VIRTIO_F_ACCESS_PLATFORM\n");
>   return -ENODEV;
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 8519b3ae5d52..6eacb4d43318 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -170,6 +170,15 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
>   return __virtio_test_bit(vdev, fbit);
>  }
>  
> +#ifdef CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
> +int arch_has_restricted_virtio_memory_access(void);
> +#else
> +static inline int arch_has_restricted_virtio_memory_access(void)
> +{
> + return 0;
> +}
> +#endif /* CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS */
> +
>  /**
>   * virtio_has_dma_quirk - determine whether this device has the DMA quirk
>   * @vdev: the device
> @@ -180,6 +189,11 @@ static inline bool virtio_has_dma_quirk(const struct virtio_device *vdev)
>* Note the reverse polarity of the quirk feature (compared to most
>* other features), this is for compatibility with legacy systems.
>*/
> + if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
> +arch_has_restricted_virtio_memory_access())
> + return false;
> +
> +
>   return !virtio_has_feature(vdev, VIRTIO_F_ACCESS_PLATFORM);
>  }
>  
> @@ -558,13 +572,5 @@ static inline void virtio_cwrite64(struct virtio_device *vdev,
>   _r; \
>   })
>  
> -#ifdef CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
> -int arch_has_restricted_virtio_memory_access(void);
> -#else
> -static inline int arch_has_restricted_virtio_memory_access(void)
> -{
> - return 0;
> -}
> -#endif /* CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS */
>  
>  #endif /* _LINUX_VIRTIO_CONFIG_H */
> -- 
> 2.30.2



Re: [PATCH 1/3] virtio: cache indirect desc for split

2021-10-27 Thread Xuan Zhuo
On Thu, 28 Oct 2021 10:16:10 +0800, Jason Wang  wrote:
> On Thu, Oct 28, 2021 at 1:07 AM Michael S. Tsirkin  wrote:
> >
> > On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> > > In the case of using indirect, indirect desc must be allocated and
> > > released each time, which increases a lot of cpu overhead.
> > >
> > > Here, a cache is added for indirect. If the number of indirect desc to be
> > > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> > >
> > > Signed-off-by: Xuan Zhuo 
> > > ---
> > >  drivers/virtio/virtio.c  |  6 
> > >  drivers/virtio/virtio_ring.c | 63 ++--
> > >  include/linux/virtio.h   | 10 ++
> > >  3 files changed, 70 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > index 0a5b54034d4b..04bcb74e5b9a 100644
> > > --- a/drivers/virtio/virtio.c
> > > +++ b/drivers/virtio/virtio.c
> > > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> > >  }
> > >  EXPORT_SYMBOL_GPL(is_virtio_device);
> > >
> > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > > +{
> > > + dev->desc_cache = val;
> > > +}
> > > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > > +
> > >  void unregister_virtio_device(struct virtio_device *dev)
> > >  {
> > >   int index = dev->index; /* save for after device release */
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index dd95dfd85e98..0b9a8544b0e8 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> > >   /* Hint for event idx: already triggered no need to disable. */
> > >   bool event_triggered;
> > >
> > > + /* Is indirect cache used? */
> > > + bool use_desc_cache;
> > > + void *desc_cache_chain;
> > > +
> > >   union {
> > >   /* Available for split ring */
> > >   struct {
> > > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > >   return extra[i].next;
> > >  }
> > >
> > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > > +
> > > +static void desc_cache_chain_free_split(void *chain)
> > > +{
> > > + struct vring_desc *desc;
> > > +
> > > + while (chain) {
> > > + desc = chain;
> > > + chain = (void *)desc->addr;
> > > + kfree(desc);
> > > + }
> > > +}
> > > +
> > > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > > +  struct vring_desc *desc, int n)
> > > +{
> > > + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > > + desc->addr = (u64)vq->desc_cache_chain;
> > > + vq->desc_cache_chain = desc;
> > > + } else {
> > > + kfree(desc);
> > > + }
> > > +}
> > > +
> >
> >
> > So I have a question here. What happens if we just do:
> >
> > if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, 
> > gfp)
> > } else {
> > return kmalloc_array(n, sizeof desc, gfp)
> > }
> >
> > A small change and won't we reap most performance benefits?
>
> Yes, I think we need a benchmark to use private cache to see how much
> it can help.

I did a test, the code is as follows:

+static void desc_cache_put_packed(struct vring_virtqueue *vq,
+  struct vring_packed_desc *desc, int n)
+ {
+   if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
+   kmem_cache_free(vq->desc_cache, desc);
+   } else {
+   kfree(desc);
+   }


@@ -476,11 +452,14 @@ static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
 */
gfp &= ~__GFP_HIGHMEM;

-   desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
+   if (total_sg <= VIRT_QUEUE_CACHE_DESC_NUM)
+   desc = kmem_cache_alloc(vq->desc_cache, gfp);
+   else
+   desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
+

...

+   vq->desc_cache = kmem_cache_create("virio_desc",
+  4 * sizeof(struct vring_desc),
+  0, 0, NULL);

The effect is not good; there is basically no improvement. Using perf top,
we can see that the overhead of kmem_cache_alloc/kmem_cache_free is also
relatively large:

 26.91%  [kernel]  [k] virtqueue_add
 15.35%  [kernel]  [k] detach_buf_split
 14.15%  [kernel]  [k] virtnet_xsk_xmit
 13.24%  [kernel]  [k] virtqueue_add_outbuf
   >  9.30%  [kernel]  [k] __slab_free
   >  3.91%  [kernel]  [k] kmem_cache_alloc
  2.85%  [kernel]  [k] virtqueue_get_buf_ctx
   >  2.82%  [kernel]  [k] kmem_cache_