Re: [PATCH net-next] net: virtio: use eth_hw_addr_set()
On Tue, Oct 26, 2021 at 10:56:34AM -0700, Jakub Kicinski wrote: > Commit 406f42fa0d3c ("net-next: When a bond have a massive amount > of VLANs...") introduced a rbtree for faster Ethernet address look > up. To maintain netdev->dev_addr in this tree we need to make all > the writes to it go through appropriate helpers. > > Signed-off-by: Jakub Kicinski Acked-by: Michael S. Tsirkin > --- > CC: m...@redhat.com > CC: jasow...@redhat.com > CC: virtualization@lists.linux-foundation.org > --- > drivers/net/virtio_net.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index c501b5974aee..b7f35aff8e82 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device *vdev) > dev->max_mtu = MAX_MTU; > > /* Configuration may specify what MAC to use. Otherwise random. */ > - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) > + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) { > + u8 addr[MAX_ADDR_LEN]; > + > virtio_cread_bytes(vdev, > offsetof(struct virtio_net_config, mac), > -dev->dev_addr, dev->addr_len); > - else > +addr, dev->addr_len); Maybe BUG_ON(dev->addr_len > sizeof addr); here just to make sure we don't overflow addr silently? > + dev_addr_set(dev, addr); > + } else { > eth_hw_addr_random(dev); > + } > > /* Set up our device-specific information */ > vi = netdev_priv(dev); > -- > 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
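For reference, a minimal sketch of how the suggested guard would sit in the v1 hunk quoted above, assuming the MAX_ADDR_LEN staging buffer from that version (an illustration only, not the applied patch):

    if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) {
            u8 addr[MAX_ADDR_LEN];

            /* catch a device reporting addr_len larger than the buffer */
            BUG_ON(dev->addr_len > sizeof(addr));
            virtio_cread_bytes(vdev,
                               offsetof(struct virtio_net_config, mac),
                               addr, dev->addr_len);
            dev_addr_set(dev, addr);
    } else {
            eth_hw_addr_random(dev);
    }

As the v2 discussion further down shows, the check becomes unnecessary once the buffer is sized ETH_ALEN and eth_hw_addr_set() is used.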
Re: [PATCH net-next] net: virtio: use eth_hw_addr_set()
On Wed, Oct 27, 2021 at 10:45 AM Jason Wang wrote: > > On Wed, Oct 27, 2021 at 1:56 AM Jakub Kicinski wrote: > > > > Commit 406f42fa0d3c ("net-next: When a bond have a massive amount > > of VLANs...") introduced a rbtree for faster Ethernet address look > > up. To maintain netdev->dev_addr in this tree we need to make all > > the writes to it go through appropriate helpers. > > I think the title should be "net: virtio: use eth_hw_addr_set()" I meant "dev_addr_set()" actually. Thanks > > > > > Signed-off-by: Jakub Kicinski > > --- > > CC: m...@redhat.com > > CC: jasow...@redhat.com > > CC: virtualization@lists.linux-foundation.org > > --- > > drivers/net/virtio_net.c | 10 +++--- > > 1 file changed, 7 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index c501b5974aee..b7f35aff8e82 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device *vdev) > > dev->max_mtu = MAX_MTU; > > > > /* Configuration may specify what MAC to use. Otherwise random. */ > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) > > + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) { > > + u8 addr[MAX_ADDR_LEN]; > > + > > virtio_cread_bytes(vdev, > >offsetof(struct virtio_net_config, mac), > > - dev->dev_addr, dev->addr_len); > > - else > > + addr, dev->addr_len); > > + dev_addr_set(dev, addr); > > + } else { > > eth_hw_addr_random(dev); > > + } > > Do we need to change virtnet_set_mac_address() as well? > > Thanks > > > > > /* Set up our device-specific information */ > > vi = netdev_priv(dev); > > -- > > 2.31.1 > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH linux-next v7 4/8] vdpa: Enable user to set mac and mtu of vdpa device
On Tue, Oct 26, 2021 at 08:55:15PM +0300, Parav Pandit via Virtualization wrote: $ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu 9000 $ vdpa dev config show bar: mac 00:11:22:33:44:55 link up link_announce false mtu 9000 $ vdpa dev config show -jp { "config": { "bar": { "mac": "00:11:22:33:44:55", "link ": "up", "link_announce ": false, "mtu": 9000, } } } Signed-off-by: Parav Pandit Reviewed-by: Eli Cohen Acked-by: Jason Wang --- changelog: v5->v6: - fixed typo of device v4->v5: - added comment for checking device capabilities v3->v4: - provide config attributes during device addition time --- drivers/vdpa/ifcvf/ifcvf_main.c | 3 ++- drivers/vdpa/mlx5/net/mlx5_vnet.c| 3 ++- drivers/vdpa/vdpa.c | 38 ++-- drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 3 ++- drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 3 ++- drivers/vdpa/vdpa_user/vduse_dev.c | 3 ++- include/linux/vdpa.h | 17 - 7 files changed, 62 insertions(+), 8 deletions(-) Reviewed-by: Stefano Garzarella ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next] net: virtio: use eth_hw_addr_set()
On Wed, Oct 27, 2021 at 03:24:55PM +0800, Jason Wang wrote: > On Wed, Oct 27, 2021 at 10:45 AM Jason Wang wrote: > > > > On Wed, Oct 27, 2021 at 1:56 AM Jakub Kicinski wrote: > > > > > > Commit 406f42fa0d3c ("net-next: When a bond have a massive amount > > > of VLANs...") introduced a rbtree for faster Ethernet address look > > > up. To maintain netdev->dev_addr in this tree we need to make all > > > the writes to it go through appropriate helpers. > > > > I think the title should be "net: virtio: use eth_hw_addr_set()" > > I meant "dev_addr_set()" actually. > > Thanks Good point, this confused me too. Could be fixed up when applying? > > > > > > > > Signed-off-by: Jakub Kicinski > > > --- > > > CC: m...@redhat.com > > > CC: jasow...@redhat.com > > > CC: virtualization@lists.linux-foundation.org > > > --- > > > drivers/net/virtio_net.c | 10 +++--- > > > 1 file changed, 7 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > index c501b5974aee..b7f35aff8e82 100644 > > > --- a/drivers/net/virtio_net.c > > > +++ b/drivers/net/virtio_net.c > > > @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device > > > *vdev) > > > dev->max_mtu = MAX_MTU; > > > > > > /* Configuration may specify what MAC to use. Otherwise random. > > > */ > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) > > > + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) { > > > + u8 addr[MAX_ADDR_LEN]; > > > + > > > virtio_cread_bytes(vdev, > > >offsetof(struct virtio_net_config, > > > mac), > > > - dev->dev_addr, dev->addr_len); > > > - else > > > + addr, dev->addr_len); > > > + dev_addr_set(dev, addr); > > > + } else { > > > eth_hw_addr_random(dev); > > > + } > > > > Do we need to change virtnet_set_mac_address() as well? > > > > Thanks > > > > > > > > /* Set up our device-specific information */ > > > vi = netdev_priv(dev); > > > -- > > > 2.31.1 > > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/3] virtio: cache indirect desc for split
On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote: > In the case of using indirect, indirect desc must be allocated and > released each time, which increases a lot of cpu overhead. > > Here, a cache is added for indirect. If the number of indirect desc to be > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse. > > Signed-off-by: Xuan Zhuo > --- > drivers/virtio/virtio.c | 6 > drivers/virtio/virtio_ring.c | 63 ++-- > include/linux/virtio.h | 10 ++ > 3 files changed, 70 insertions(+), 9 deletions(-) > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > index 0a5b54034d4b..04bcb74e5b9a 100644 > --- a/drivers/virtio/virtio.c > +++ b/drivers/virtio/virtio.c > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev) > } > EXPORT_SYMBOL_GPL(is_virtio_device); > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val) > +{ > + dev->desc_cache = val; > +} > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache); > + > void unregister_virtio_device(struct virtio_device *dev) > { > int index = dev->index; /* save for after device release */ > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > index dd95dfd85e98..0b9a8544b0e8 100644 > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -117,6 +117,10 @@ struct vring_virtqueue { > /* Hint for event idx: already triggered no need to disable. */ > bool event_triggered; > > + /* Is indirect cache used? */ > + bool use_desc_cache; > + void *desc_cache_chain; > + > union { > /* Available for split ring */ > struct { Let's use llist_head and friends please (I am guessing we want single-linked to avoid writing into indirect buffer after use, invalidating the cache, but please document why in a comment). Do not open-code it. Also hide all casts in inline wrappers. > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct > vring_virtqueue *vq, > return extra[i].next; > } > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq, > +#define VIRT_QUEUE_CACHE_DESC_NUM 4 Add an inline wrapper for checking sg versus VIRT_QUEUE_CACHE_DESC_NUM. > + > +static void desc_cache_chain_free_split(void *chain) > +{ > + struct vring_desc *desc; > + > + while (chain) { > + desc = chain; > + chain = (void *)desc->addr; > + kfree(desc); > + } > +} > + > +static void desc_cache_put_split(struct vring_virtqueue *vq, > + struct vring_desc *desc, int n) > +{ > + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) { > + desc->addr = (u64)vq->desc_cache_chain; > + vq->desc_cache_chain = desc; > + } else { > + kfree(desc); > + } > +} > + > +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq, > unsigned int total_sg, > gfp_t gfp) > { > struct vring_desc *desc; > - unsigned int i; > + unsigned int i, n; > + > + if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) { > + if (vq->desc_cache_chain) { > + desc = vq->desc_cache_chain; > + vq->desc_cache_chain = (void *)desc->addr; > + goto got; > + } > + n = VIRT_QUEUE_CACHE_DESC_NUM; Hmm. This will allocate more entries than actually used. Why do it? 
> + } else { > + n = total_sg; > + } > > /* >* We require lowmem mappings for the descriptors because > @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct > virtqueue *_vq, >*/ > gfp &= ~__GFP_HIGHMEM; > > - desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp); > + desc = kmalloc_array(n, sizeof(struct vring_desc), gfp); > if (!desc) > return NULL; > > +got: > for (i = 0; i < total_sg; i++) > - desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1); > + desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1); > return desc; > } > > @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue > *_vq, > head = vq->free_head; > > if (virtqueue_use_indirect(_vq, total_sg)) > - desc = alloc_indirect_split(_vq, total_sg, gfp); > + desc = alloc_indirect_split(vq, total_sg, gfp); > else { > desc = NULL; > WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect); > @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue > *_vq, > } > > if (indirect) > - kfree(desc); > + desc_cache_put_split(vq, desc, total_sg);
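As a rough illustration of the wrappers being asked for in this review (helper names are invented; this is not part of the posted patch), the threshold test and the chain casts could be hidden behind small inlines along these lines:

    static inline bool desc_cache_fits(const struct vring_virtqueue *vq,
                                       unsigned int total_sg)
    {
            return vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM;
    }

    static inline void desc_cache_push(struct vring_virtqueue *vq,
                                       struct vring_desc *desc)
    {
            /* reuse ->addr as the next pointer, as the posted patch does;
             * the array is already off the ring so the device never sees it */
            desc->addr = (u64)vq->desc_cache_chain;
            vq->desc_cache_chain = desc;
    }

    static inline struct vring_desc *desc_cache_pop(struct vring_virtqueue *vq)
    {
            struct vring_desc *desc = vq->desc_cache_chain;

            if (desc)
                    vq->desc_cache_chain = (void *)desc->addr;
            return desc;
    }

The call sites in alloc_indirect_split() and detach_buf_split() would then shrink to a single helper call each.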
Re: [PATCH V3 11/11] vhost: allow userspace to create workers
On Wed, Oct 27, 2021 at 10:55:04AM +0800, Jason Wang wrote: > On Tue, Oct 26, 2021 at 11:45 PM Stefan Hajnoczi wrote: > > > > On Tue, Oct 26, 2021 at 01:37:14PM +0800, Jason Wang wrote: > > > > > > On 2021/10/22 at 1:19 PM, Mike Christie wrote: > > > > This patch allows userspace to create workers and bind them to vqs. You > > > > can have N workers per dev and also share N workers with M vqs. > > > > > > > > Signed-off-by: Mike Christie > > > > > > > > > A question, who is the best one to determine the binding? Is it the VMM > > > (Qemu etc) or the management stack? If the latter, it looks to me it's > > > better to expose this via sysfs? > > > > A few options that let the management stack control vhost worker CPU > > affinity: > > > > 1. The management tool opens the vhost device node, calls > >ioctl(VHOST_SET_VRING_WORKER), sets up CPU affinity, and then passes > >the fd to the VMM. In this case the VMM is still able to call the > >ioctl, which may be undesirable from an attack surface perspective. > > Yes, and we can't do post or dynamic configuration afterwards after > the VM is launched? Yes, at least it's a little risky for the management stack to keep the vhost fd open and make ioctl calls while the VMM is using it. > > > > 2. The VMM calls ioctl(VHOST_SET_VRING_WORKER) itself and the management > >tool queries the vq:worker details from the VMM (e.g. a new QEMU QMP > >query-vhost-workers command similar to query-iothreads). The > >management tool can then control CPU affinity on the vhost worker > >threads. > > > >(This is how CPU affinity works in QEMU and libvirt today.) > > Then we also need a "bind-vhost-workers" command. The VMM doesn't but the management tool does. Stefan
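For context on option 2: the vhost worker kthreads are named "vhost-<owner pid>" (see the vhost_dev_set_owner() code later in this digest), so a management tool can already pin them from userspace once it knows which worker serves which vq. A simplified sketch of what such a tool might run:

    # pin the vhost workers serving QEMU process $QEMU_PID to CPU 2
    for tid in $(pgrep "vhost-$QEMU_PID"); do
            taskset -pc 2 "$tid"
    done

With multiple workers per device, as this series allows, the missing piece is exactly the vq-to-worker mapping that the proposed QMP query would expose.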
Re: [PATCH V3 11/11] vhost: allow userspace to create workers
On Tue, Oct 26, 2021 at 11:49:37AM -0500, michael.chris...@oracle.com wrote: > On 10/26/21 12:37 AM, Jason Wang wrote: > > Do we need VHOST_VRING_FREE_WORKER? And I wonder if using dedicated ioctls > > are better: > > > > VHOST_VRING_NEW/FREE_WORKER > > VHOST_VRING_ATTACH_WORKER > > > We didn't need a free worker, because the kernel handles it for userspace. I > tried to make it easy for userspace because in some cases it may not be able > to do syscalls like close on the device. For example if qemu crashes or for > vhost-scsi we don't do an explicit close during VM shutdown. > > So we start off with the default worker thread that's used by all vqs like we > do > today. Userspace can then override it by creating a new worker. That also > unbinds/ > detaches the existing worker and does a put on the workers refcount. We also > do a > put on the worker when we stop using it during device > shutdown/closure/release. > When the worker's refcount goes to zero the kernel deletes it. Please document the worker (p)id lifetime for the ioctl. Otherwise userspace doesn't know whether a previously created worker is still alive. Stefan
Re: [PATCH 1/3] virtio: cache indirect desc for split
On Wed, 27 Oct 2021 04:55:16 -0400, Michael S. Tsirkin wrote: > On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote: > > In the case of using indirect, indirect desc must be allocated and > > released each time, which increases a lot of cpu overhead. > > > > Here, a cache is added for indirect. If the number of indirect desc to be > > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with > > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse. > > > > Signed-off-by: Xuan Zhuo > > --- > > drivers/virtio/virtio.c | 6 > > drivers/virtio/virtio_ring.c | 63 ++-- > > include/linux/virtio.h | 10 ++ > > 3 files changed, 70 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > > index 0a5b54034d4b..04bcb74e5b9a 100644 > > --- a/drivers/virtio/virtio.c > > +++ b/drivers/virtio/virtio.c > > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev) > > } > > EXPORT_SYMBOL_GPL(is_virtio_device); > > > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val) > > +{ > > + dev->desc_cache = val; > > +} > > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache); > > + > > void unregister_virtio_device(struct virtio_device *dev) > > { > > int index = dev->index; /* save for after device release */ > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > index dd95dfd85e98..0b9a8544b0e8 100644 > > --- a/drivers/virtio/virtio_ring.c > > +++ b/drivers/virtio/virtio_ring.c > > @@ -117,6 +117,10 @@ struct vring_virtqueue { > > /* Hint for event idx: already triggered no need to disable. */ > > bool event_triggered; > > > > + /* Is indirect cache used? */ > > + bool use_desc_cache; > > + void *desc_cache_chain; > > + > > union { > > /* Available for split ring */ > > struct { > > Let's use llist_head and friends please (I am guessing we want > single-linked to avoid writing into indirect buffer after use, > invalidating the cache, but please document why in a comment). Do not > open-code it. > > Also hide all casts in inline wrappers. > > > > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const > > struct vring_virtqueue *vq, > > return extra[i].next; > > } > > > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq, > > +#define VIRT_QUEUE_CACHE_DESC_NUM 4 > > Add an inline wrapper for checking sg versus VIRT_QUEUE_CACHE_DESC_NUM. > > > + > > +static void desc_cache_chain_free_split(void *chain) > > +{ > > + struct vring_desc *desc; > > + > > + while (chain) { > > + desc = chain; > > + chain = (void *)desc->addr; > > + kfree(desc); > > + } > > +} > > + > > +static void desc_cache_put_split(struct vring_virtqueue *vq, > > +struct vring_desc *desc, int n) > > +{ > > + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) { > > + desc->addr = (u64)vq->desc_cache_chain; > > + vq->desc_cache_chain = desc; > > + } else { > > + kfree(desc); > > + } > > +} > > > > + > > +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq, > >unsigned int total_sg, > >gfp_t gfp) > > { > > struct vring_desc *desc; > > - unsigned int i; > > + unsigned int i, n; > > + > > + if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) { > > + if (vq->desc_cache_chain) { > > + desc = vq->desc_cache_chain; > > + vq->desc_cache_chain = (void *)desc->addr; > > + goto got; > > + } > > + n = VIRT_QUEUE_CACHE_DESC_NUM; > > Hmm. This will allocate more entries than actually used. Why do it? 
If total_sg here is 2, we only allocate two desc here, but if total_sg is 3 later, it will be difficult to use the desc allocated this time. So if total_sg is less than or equal to 4, a desc array of size 4 is allocated, so that subsequent desc arrays can be reused when total_sg is less than or equal to 4. If total_sg is greater than 4, use kmalloc_array to allocate directly without using the cache. Thanks. > > > + } else { > > + n = total_sg; > > + } > > > > /* > > * We require lowmem mappings for the descriptors because > > @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct > > virtqueue *_vq, > > */ > > gfp &= ~__GFP_HIGHMEM; > > > > - desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp); > > + desc = kmalloc_array(n, sizeof(struct vring_desc), gfp); > > if (!desc) > > return NULL; > > > > +got: > > for (i = 0; i < total_sg; i++) > > - desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1); > > + desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1); > > return desc; > > } > > > > @@ -508,7 +548,
Re: drm/virtio: not pin pages on demand
[ Cc'ing Gurchetan Singh ] > Can we follow up on this issue? > > The main pain point with your suggestion is the fact, > that it will cause VirGL protocol breakage and we would > like to avoid this. > > Extending execbuffer ioctl and create_resource ioctl is > more convenient than having the protocol broken. Do you know at resource creation time whenever you need cpu access or not? IOW can we make that a resource property, so we don't have pass around lists of objects on each and every execbuffer ioctl? > Blob resources is not a solution, too, because QEMU doesn't > support blob resources right now. > > In ideal solution when both QEMU and crosvm support blob resources > we can switch to blob resources for textures. That'll only happen if someone works on it. I will not be able to do that though. > I would like to introduce changes gradually and not make a protocol > breakage. Well, opengl switching to blob resources is a protocol change anyway. That doesn't imply things will break though. We have capsets etc to extend the protocol while maintaining backward compatibility. > What do you think about that? I still think that switching to blob resources would be the better solution. Yes, it's alot of work and not something which helps short-term. But adding a new API as interim solution isn't great either. So, not sure. Chia-I Wu? Gurchetan Singh? While being at it: Chia-I Wu & Gurchetan Singh, could you help reviewing virtio-gpu kernel patches? I think you both have a better view on the big picture (with both mesa and virglrenderer) than I have. Also for the crosvm side of things. The procedure for that would be to send a patch adding yourself to the virtio-gpu section of the MAINTAINERS file, so scripts/get_maintainer.pl will Cc you on patches submitted. thanks, Gerd > > Maksym > > > On 10/22/21 10:44 AM, Maksym Wezdecki wrote: > > > Once again with all lists and receivers. I'm sorry for that. > > > > On 10/21/21 6:42 PM, Chia-I Wu wrote: > >> On Thu, Oct 21, 2021 at 4:52 AM Gerd Hoffmann wrote: > >>> On Thu, Oct 21, 2021 at 11:55:47AM +0200, Maksym Wezdecki wrote: > I get your point. However, we need to make resource_create ioctl, > in order to create corresponding resource on the host. > >>> That used to be the case but isn't true any more with the new > >>> blob resources. virglrenderer allows to create gpu objects > >>> via execbuffer. Those gpu objects can be linked to a > >>> virtio-gpu resources, but it's optional. In your case you > >>> would do that only for your staging buffer. The other textures > >>> (which you fill with a host-side copy from the staging buffer) > >>> do not need a virtio-gpu resource in the first place. > >> That's however a breaking change to the virgl protocol. All virgl > >> commands expect res ids rather than blob ids. > >> > >> Using VIRTGPU_BLOB_MEM_HOST3D could in theory work. But there are a > >> few occasions where virglrenderer expects there to be guest storage. > >> There are also readbacks that we need to support. We might end up > >> using VIRTGPU_BLOB_MEM_HOST3D_GUEST, where pin-on-demand is still > >> desirable. > >> > >> For this patch, I think the uapi change can be simplified. It can be > >> a new param plus a new field in drm_virtgpu_execbuffer > >> > >> __u64 bo_flags; /* pointer to an array of size num_bo_handles, or NULL */ > >> > >> The other changes do not seem needed. > > My first approach was the same, as you mentioned. 
However, having "__u64 > > bo_flags" > > needs a for loop, where only few of the bo-s needs to be pinned - this has > > performance implications. I would rather pass bo handles that should be > > pinned than > > having a lot of flags, where only 1-2 bos needs to be pinned. > > > >>> take care, > >>> Gerd > >>> -- ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
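To make the two shapes under discussion concrete, the flags-array variant would look roughly like the sketch below; the existing fields of drm_virtgpu_execbuffer are elided and nothing here is merged uapi:

    struct drm_virtgpu_execbuffer {
            __u32 flags;
            __u64 bo_handles;       /* __u32 array of GEM handles */
            __u32 num_bo_handles;
            /* ... other existing fields elided ... */
            __u64 bo_flags;         /* proposed: __u64 array of length
                                     * num_bo_handles (e.g. a "pin" bit
                                     * per BO), or NULL */
    };

Maksym's counter-proposal is to pass only the handles that actually need pinning, so the kernel does not have to walk a flags entry for every BO on each submission.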
vDPA bus driver selection
Hi folks, I was trying to understand if we have a way to specify which vDPA bus driver (e.g. vhost-vdpa, virtio-vdpa) a device should use. IIUC we don't have it, and the first registered driver is used when a new device is registered. I was thinking if it makes sense to extend the management API to specify which bus driver to use for a device. A use case could be, for example, a single host handling VMs and bare-metal containers, so we would have both virtio-vdpa and vhost-vdpa loaded and we want to attach some devices to VMs through vhost-vdpa and others to containers through virtio-vdpa. What do you think? I can prepare an RFC with some code; the idea is to use the .match callback of "struct bus_type" to prefer one driver over the other, and to extend the netlink API to specify the vDPA bus driver name to use. Thanks, Stefano
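A rough sketch of the .match idea (the per-device driver_override-style field and its wiring to netlink are invented here to illustrate the proposal; none of this is existing code):

    static int vdpa_dev_match(struct device *dev, struct device_driver *drv)
    {
            struct vdpa_device *vdev = container_of(dev, struct vdpa_device, dev);

            /* hypothetical field filled in from the netlink "dev add" request;
             * an empty string means "no preference" */
            if (vdev->driver_override[0])
                    return strcmp(vdev->driver_override, drv->name) == 0;

            return 1;       /* default: any registered vDPA bus driver may bind */
    }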
Re: [PATCH net-next] net: virtio: use eth_hw_addr_set()
On Wed, Oct 27, 2021 at 06:26:40AM -0700, Jakub Kicinski wrote: > On Wed, 27 Oct 2021 03:23:17 -0400 Michael S. Tsirkin wrote: > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > index c501b5974aee..b7f35aff8e82 100644 > > > --- a/drivers/net/virtio_net.c > > > +++ b/drivers/net/virtio_net.c > > > @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device > > > *vdev) > > > dev->max_mtu = MAX_MTU; > > > > > > /* Configuration may specify what MAC to use. Otherwise random. */ > > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) > > > + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) { > > > + u8 addr[MAX_ADDR_LEN]; > > > + > > > virtio_cread_bytes(vdev, > > > offsetof(struct virtio_net_config, mac), > > > -dev->dev_addr, dev->addr_len); > > > - else > > > +addr, dev->addr_len); > > > > Maybe BUG_ON(dev->addr_len > sizeof addr); > > > > here just to make sure we don't overflow addr silently? > > Since I need to post a v2 and we're talking... can I just use > eth_hw_addr_set() instead? AFAICT netdev is always allocated with > alloc_etherdev_mq() and ->addr_len never changed. Plus there is > a number of Ethernet address helpers used so ETH_ALEN is kind of > already assumed. Right. Sure, just document this in the commit log pls. > > > + dev_addr_set(dev, addr); > > > + } else { > > > eth_hw_addr_random(dev); > > > + } > > > > > > /* Set up our device-specific information */ > > > vi = netdev_priv(dev); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: vDPA bus driver selection
Hi Stefano, > From: Stefano Garzarella > Sent: Wednesday, October 27, 2021 8:04 PM > > Hi folks, > I was trying to understand if we have a way to specify which vDPA bus driver > (e.g. vhost-vdpa, virtio-vdpa) a device should use. > IIUC we don't have it, and the first registered driver is used when a new > device > is registered. > > I was thinking if it makes sense to extend the management API to specify which > bus driver to use for a device. A use case could be for example a single host > handling VMs and bare-metal containers, so we would have both virtio-vdpa > and vhost-vdpa loaded and we want to attach some devices to VMs through > vhost-vdpa and others to containers through virtio-vdpa. > > What do you think? > One option is, user keeps the drivers_autoprobe disabled for the vdpa bus using, $ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac 00:11:22:33:44:55 $ echo 0 > /sys/bus/vdpa/drivers_autoprobe And after vdpa device creation, it manually binds to the desired driver such as, $ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind Or $ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind In an case of VDUSE, it makes more sense to bind to the one of the above driver after user space has connected the use space backend to the kernel device. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v2] net: virtio: use eth_hw_addr_set()
On Wed, Oct 27, 2021 at 08:20:12AM -0700, Jakub Kicinski wrote: > Commit 406f42fa0d3c ("net-next: When a bond have a massive amount > of VLANs...") introduced a rbtree for faster Ethernet address look > up. To maintain netdev->dev_addr in this tree we need to make all > the writes to it go through appropriate helpers. > > Even though the current code uses dev->addr_len the we can switch > to eth_hw_addr_set() instead of dev_addr_set(). The netdev is > always allocated by alloc_etherdev_mq() and there are at least two > places which assume Ethernet address: > - the line below calling eth_hw_addr_random() > - virtnet_set_mac_address() -> eth_commit_mac_addr_change() > > Signed-off-by: Jakub Kicinski Acked-by: Michael S. Tsirkin > --- > v2: - actually switch to eth_hw_addr_set() not dev_addr_set() > - resize the buffer to ETH_ALEN > - pass ETH_ALEN instead of dev->dev_addr to virtio_cread_bytes() > > CC: m...@redhat.com > CC: jasow...@redhat.com > CC: virtualization@lists.linux-foundation.org > --- > drivers/net/virtio_net.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index c501b5974aee..cc79343cd220 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device *vdev) > dev->max_mtu = MAX_MTU; > > /* Configuration may specify what MAC to use. Otherwise random. */ > - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) > + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) { > + u8 addr[ETH_ALEN]; > + > virtio_cread_bytes(vdev, > offsetof(struct virtio_net_config, mac), > -dev->dev_addr, dev->addr_len); > - else > +addr, ETH_ALEN); > + eth_hw_addr_set(dev, addr); > + } else { > eth_hw_addr_random(dev); > + } > > /* Set up our device-specific information */ > vi = netdev_priv(dev); > -- > 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: vDPA bus driver selection
Hi Parav, On Wed, Oct 27, 2021 at 03:21:15PM +, Parav Pandit wrote: Hi Stefano, From: Stefano Garzarella Sent: Wednesday, October 27, 2021 8:04 PM Hi folks, I was trying to understand if we have a way to specify which vDPA bus driver (e.g. vhost-vdpa, virtio-vdpa) a device should use. IIUC we don't have it, and the first registered driver is used when a new device is registered. I was thinking if it makes sense to extend the management API to specify which bus driver to use for a device. A use case could be for example a single host handling VMs and bare-metal containers, so we would have both virtio-vdpa and vhost-vdpa loaded and we want to attach some devices to VMs through vhost-vdpa and others to containers through virtio-vdpa. What do you think? One option is, user keeps the drivers_autoprobe disabled for the vdpa bus using, $ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac 00:11:22:33:44:55 $ echo 0 > /sys/bus/vdpa/drivers_autoprobe And after vdpa device creation, it manually binds to the desired driver such as, $ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind Or $ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind Cool, I didn't know that. This is very useful, but do you think it might be better to integrate it with the netlink API and specify at creation which bus driver to use? In an case of VDUSE, it makes more sense to bind to the one of the above driver after user space has connected the use space backend to the kernel device. Yep, make sense. Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
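On the user-visible side this could be a single extra attribute at creation time, e.g. (hypothetical syntax, not an existing vdpa(8) option):

    $ vdpa dev add name vdpa0 mgmtdev vdpasim_net driver vhost_vdpa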
RE: vDPA bus driver selection
Hi Stefano, > From: Stefano Garzarella > Sent: Wednesday, October 27, 2021 9:17 PM > To: Parav Pandit > Cc: Jason Wang ; Michael Tsirkin ; > Linux Virtualization ; Eli Cohen > > Subject: Re: vDPA bus driver selection > > Hi Parav, > > On Wed, Oct 27, 2021 at 03:21:15PM +, Parav Pandit wrote: > >Hi Stefano, > > > >> From: Stefano Garzarella > >> Sent: Wednesday, October 27, 2021 8:04 PM > >> > >> Hi folks, > >> I was trying to understand if we have a way to specify which vDPA bus > >> driver (e.g. vhost-vdpa, virtio-vdpa) a device should use. > >> IIUC we don't have it, and the first registered driver is used when a > >> new device is registered. > >> > >> I was thinking if it makes sense to extend the management API to > >> specify which bus driver to use for a device. A use case could be for > >> example a single host handling VMs and bare-metal containers, so we > >> would have both virtio-vdpa and vhost-vdpa loaded and we want to > >> attach some devices to VMs through vhost-vdpa and others to containers > through virtio-vdpa. > >> > >> What do you think? > >> > >One option is, user keeps the drivers_autoprobe disabled for the vdpa > >bus using, > > > >$ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac > >00:11:22:33:44:55 $ echo 0 > /sys/bus/vdpa/drivers_autoprobe > > > >And after vdpa device creation, it manually binds to the desired driver > >such as, > > > >$ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind > >Or > >$ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind > > Cool, I didn't know that. This is very useful, but do you think it might be > better > to integrate it with the netlink API and specify at creation which bus driver > to > use? I think it is useful; for vduse case we need the ability to say "none" as well and when nothing specified it should be default driver. More than netlink, I think we need the ability in the core kernel to make this choice. I haven't explored what is already available to make that happen. If/once its available, I am sure it has more users than just vdpa. > > > > >In an case of VDUSE, it makes more sense to bind to the one of the > >above driver after user space has connected the use space backend to > >the kernel device. > > Yep, make sense. > > Thanks, > Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/3] virtio: cache indirect desc for split
On 10/26/21 11:19 PM, Xuan Zhuo wrote: > In the case of using indirect, indirect desc must be allocated and > released each time, which increases a lot of cpu overhead. > > Here, a cache is added for indirect. If the number of indirect desc to be > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse. > > Signed-off-by: Xuan Zhuo > --- > drivers/virtio/virtio.c | 6 > drivers/virtio/virtio_ring.c | 63 ++-- > include/linux/virtio.h | 10 ++ > 3 files changed, 70 insertions(+), 9 deletions(-) > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > index 0a5b54034d4b..04bcb74e5b9a 100644 > --- a/drivers/virtio/virtio.c > +++ b/drivers/virtio/virtio.c > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev) > } > EXPORT_SYMBOL_GPL(is_virtio_device); > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val) > +{ > + dev->desc_cache = val; > +} > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache); > + > void unregister_virtio_device(struct virtio_device *dev) > { > int index = dev->index; /* save for after device release */ > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > index dd95dfd85e98..0b9a8544b0e8 100644 > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -117,6 +117,10 @@ struct vring_virtqueue { > /* Hint for event idx: already triggered no need to disable. */ > bool event_triggered; > > + /* Is indirect cache used? */ > + bool use_desc_cache; > + void *desc_cache_chain; > + > union { > /* Available for split ring */ > struct { > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct > vring_virtqueue *vq, > return extra[i].next; > } > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq, > +#define VIRT_QUEUE_CACHE_DESC_NUM 4 > + > +static void desc_cache_chain_free_split(void *chain) > +{ > + struct vring_desc *desc; > + > + while (chain) { > + desc = chain; > + chain = (void *)desc->addr; > + kfree(desc); > + } > +} > + > +static void desc_cache_put_split(struct vring_virtqueue *vq, > + struct vring_desc *desc, int n) > +{ > + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) { > + desc->addr = (u64)vq->desc_cache_chain; > + vq->desc_cache_chain = desc; > + } else { > + kfree(desc); > + } > +} > + > +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq, > unsigned int total_sg, > gfp_t gfp) > { > struct vring_desc *desc; > - unsigned int i; > + unsigned int i, n; > + > + if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) { > + if (vq->desc_cache_chain) { > + desc = vq->desc_cache_chain; > + vq->desc_cache_chain = (void *)desc->addr; > + goto got; > + } > + n = VIRT_QUEUE_CACHE_DESC_NUM; How about to make the VIRT_QUEUE_CACHE_DESC_NUM configurable (at least during driver probing) unless there is a reason that the default value is 4. Thank you very much! 
Dongli Zhang > + } else { > + n = total_sg; > + } > > /* >* We require lowmem mappings for the descriptors because > @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct > virtqueue *_vq, >*/ > gfp &= ~__GFP_HIGHMEM; > > - desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp); > + desc = kmalloc_array(n, sizeof(struct vring_desc), gfp); > if (!desc) > return NULL; > > +got: > for (i = 0; i < total_sg; i++) > - desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1); > + desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1); > return desc; > } > > @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue > *_vq, > head = vq->free_head; > > if (virtqueue_use_indirect(_vq, total_sg)) > - desc = alloc_indirect_split(_vq, total_sg, gfp); > + desc = alloc_indirect_split(vq, total_sg, gfp); > else { > desc = NULL; > WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect); > @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue > *_vq, > } > > if (indirect) > - kfree(desc); > + desc_cache_put_split(vq, desc, total_sg); > > END_USE(vq); > return -ENOMEM; > @@ -717,7 +757,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, > unsigned int head, > if (vq->indirect) { > struct vring_desc *indir_desc = >
Re: [PATCH 1/3] virtio: cache indirect desc for split
On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote: > In the case of using indirect, indirect desc must be allocated and > released each time, which increases a lot of cpu overhead. > > Here, a cache is added for indirect. If the number of indirect desc to be > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse. > > Signed-off-by: Xuan Zhuo > --- > drivers/virtio/virtio.c | 6 > drivers/virtio/virtio_ring.c | 63 ++-- > include/linux/virtio.h | 10 ++ > 3 files changed, 70 insertions(+), 9 deletions(-) > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > index 0a5b54034d4b..04bcb74e5b9a 100644 > --- a/drivers/virtio/virtio.c > +++ b/drivers/virtio/virtio.c > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev) > } > EXPORT_SYMBOL_GPL(is_virtio_device); > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val) > +{ > + dev->desc_cache = val; > +} > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache); > + > void unregister_virtio_device(struct virtio_device *dev) > { > int index = dev->index; /* save for after device release */ > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > index dd95dfd85e98..0b9a8544b0e8 100644 > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -117,6 +117,10 @@ struct vring_virtqueue { > /* Hint for event idx: already triggered no need to disable. */ > bool event_triggered; > > + /* Is indirect cache used? */ > + bool use_desc_cache; > + void *desc_cache_chain; > + > union { > /* Available for split ring */ > struct { > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct > vring_virtqueue *vq, > return extra[i].next; > } > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq, > +#define VIRT_QUEUE_CACHE_DESC_NUM 4 > + > +static void desc_cache_chain_free_split(void *chain) > +{ > + struct vring_desc *desc; > + > + while (chain) { > + desc = chain; > + chain = (void *)desc->addr; > + kfree(desc); > + } > +} > + > +static void desc_cache_put_split(struct vring_virtqueue *vq, > + struct vring_desc *desc, int n) > +{ > + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) { > + desc->addr = (u64)vq->desc_cache_chain; > + vq->desc_cache_chain = desc; > + } else { > + kfree(desc); > + } > +} > + So I have a question here. What happens if we just do: if (n <= VIRT_QUEUE_CACHE_DESC_NUM) { return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp) } else { return kmalloc_arrat(n, sizeof desc, gfp) } A small change and won't we reap most performance benefits? -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
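Spelled out a little further, the alternative raised here is a fixed-size slab cache for the small case. A sketch under the assumption of one module-global cache (the cache pointer and helper names are invented):

    static struct kmem_cache *indirect_cache;

    static int __init vring_indirect_cache_init(void)
    {
            indirect_cache = kmem_cache_create("vring_indirect",
                            VIRT_QUEUE_CACHE_DESC_NUM * sizeof(struct vring_desc),
                            0, 0, NULL);
            return indirect_cache ? 0 : -ENOMEM;
    }

    static struct vring_desc *indirect_alloc(unsigned int total_sg, gfp_t gfp)
    {
            if (total_sg <= VIRT_QUEUE_CACHE_DESC_NUM)
                    return kmem_cache_alloc(indirect_cache, gfp);
            return kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
    }

    static void indirect_free(struct vring_desc *desc, unsigned int total_sg)
    {
            if (total_sg <= VIRT_QUEUE_CACHE_DESC_NUM)
                    kmem_cache_free(indirect_cache, desc);
            else
                    kfree(desc);
    }

The slab allocator already keeps per-CPU caches of recently freed objects, which is where most of the benefit of the hand-rolled chain would come from.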
Re: vDPA bus driver selection
On Wed, Oct 27, 2021 at 04:33:50PM +0200, Stefano Garzarella wrote: > Hi folks, > I was trying to understand if we have a way to specify which vDPA bus > driver (e.g. vhost-vdpa, virtio-vdpa) a device should use. > IIUC we don't have it, and the first registered driver is used when a > new device is registered. > > I was thinking if it makes sense to extend the management API to > specify which bus driver to use for a device. A use case could be for > example a single host handling VMs and bare-metal containers, so we > would have both virtio-vdpa and vhost-vdpa loaded and we want to > attach some devices to VMs through vhost-vdpa and others to containers > through virtio-vdpa. > > What do you think? > > I can prepare an RFC with some code, the idea is to use the .match > callback of "struct bus_type" to use a driver instead of the other, > and extend netlink API to specify the vDPA bus driver name to use. > > Thanks, > Stefano So I think that doing this at create time is somewhat limited. For example a great way to do migration could be to unbind device from VM then bind virtio on the host to it, then bind macvtap to that. Ideas on how to allow that? -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
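The flow described here would look something like the following on the host, assuming a device named vdpa0 whose virtio-net interface comes up as eth0 (illustrative only):

    $ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/unbind
    $ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind
    $ ip link add link eth0 name macvtap0 type macvtap mode passthru

A selection that is fixed at device-creation time would rule this kind of later rebinding out, which is the concern being raised.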
Re: [PATCH 1/3] virtio: cache indirect desc for split
On Wed, Oct 27, 2021 at 09:33:46AM -0700, Dongli Zhang wrote: > > > On 10/26/21 11:19 PM, Xuan Zhuo wrote: > > In the case of using indirect, indirect desc must be allocated and > > released each time, which increases a lot of cpu overhead. > > > > Here, a cache is added for indirect. If the number of indirect desc to be > > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with > > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse. > > > > Signed-off-by: Xuan Zhuo > > --- > > drivers/virtio/virtio.c | 6 > > drivers/virtio/virtio_ring.c | 63 ++-- > > include/linux/virtio.h | 10 ++ > > 3 files changed, 70 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > > index 0a5b54034d4b..04bcb74e5b9a 100644 > > --- a/drivers/virtio/virtio.c > > +++ b/drivers/virtio/virtio.c > > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev) > > } > > EXPORT_SYMBOL_GPL(is_virtio_device); > > > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val) > > +{ > > + dev->desc_cache = val; > > +} > > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache); > > + > > void unregister_virtio_device(struct virtio_device *dev) > > { > > int index = dev->index; /* save for after device release */ > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > index dd95dfd85e98..0b9a8544b0e8 100644 > > --- a/drivers/virtio/virtio_ring.c > > +++ b/drivers/virtio/virtio_ring.c > > @@ -117,6 +117,10 @@ struct vring_virtqueue { > > /* Hint for event idx: already triggered no need to disable. */ > > bool event_triggered; > > > > + /* Is indirect cache used? */ > > + bool use_desc_cache; > > + void *desc_cache_chain; > > + > > union { > > /* Available for split ring */ > > struct { > > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const > > struct vring_virtqueue *vq, > > return extra[i].next; > > } > > > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq, > > +#define VIRT_QUEUE_CACHE_DESC_NUM 4 > > + > > +static void desc_cache_chain_free_split(void *chain) > > +{ > > + struct vring_desc *desc; > > + > > + while (chain) { > > + desc = chain; > > + chain = (void *)desc->addr; > > + kfree(desc); > > + } > > +} > > + > > +static void desc_cache_put_split(struct vring_virtqueue *vq, > > +struct vring_desc *desc, int n) > > +{ > > + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) { > > + desc->addr = (u64)vq->desc_cache_chain; > > + vq->desc_cache_chain = desc; > > + } else { > > + kfree(desc); > > + } > > +} > > + > > +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq, > >unsigned int total_sg, > >gfp_t gfp) > > { > > struct vring_desc *desc; > > - unsigned int i; > > + unsigned int i, n; > > + > > + if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) { > > + if (vq->desc_cache_chain) { > > + desc = vq->desc_cache_chain; > > + vq->desc_cache_chain = (void *)desc->addr; > > + goto got; > > + } > > + n = VIRT_QUEUE_CACHE_DESC_NUM; > > How about to make the VIRT_QUEUE_CACHE_DESC_NUM configurable (at least during > driver probing) unless there is a reason that the default value is 4. > > Thank you very much! > > Dongli Zhang I would start with some experimentation showing that it actually makes a difference in performance. 
> > > > + } else { > > + n = total_sg; > > + } > > > > /* > > * We require lowmem mappings for the descriptors because > > @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct > > virtqueue *_vq, > > */ > > gfp &= ~__GFP_HIGHMEM; > > > > - desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp); > > + desc = kmalloc_array(n, sizeof(struct vring_desc), gfp); > > if (!desc) > > return NULL; > > > > +got: > > for (i = 0; i < total_sg; i++) > > - desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1); > > + desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1); > > return desc; > > } > > > > @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue > > *_vq, > > head = vq->free_head; > > > > if (virtqueue_use_indirect(_vq, total_sg)) > > - desc = alloc_indirect_split(_vq, total_sg, gfp); > > + desc = alloc_indirect_split(vq, total_sg, gfp); > > else { > > desc = NULL; > > WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect); > > @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue > > *_vq, > > } > > > > if (in
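If the threshold is made tunable for such experiments, the simplest form would be a module parameter (a sketch, not part of the posted series; the parameter name is made up):

    static unsigned int desc_cache_num = 4;
    module_param(desc_cache_num, uint, 0444);
    MODULE_PARM_DESC(desc_cache_num,
                     "cache indirect descriptor arrays up to this many entries");

Keeping it read-only after load (0444) avoids racing with arrays already sitting in the cache when the value changes.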
Re: [PATCH v3 4/4] virtio-blk: Use blk_validate_block_size() to validate block size
On Tue, Oct 26, 2021 at 10:40:15PM +0800, Xie Yongji wrote: > The block layer can't support a block size larger than > page size yet. And a block size that's too small or > not a power of two won't work either. If a misconfigured > device presents an invalid block size in configuration space, > it will result in the kernel crash something like below: > > [ 506.154324] BUG: kernel NULL pointer dereference, address: 0008 > [ 506.160416] RIP: 0010:create_empty_buffers+0x24/0x100 > [ 506.174302] Call Trace: > [ 506.174651] create_page_buffers+0x4d/0x60 > [ 506.175207] block_read_full_page+0x50/0x380 > [ 506.175798] ? __mod_lruvec_page_state+0x60/0xa0 > [ 506.176412] ? __add_to_page_cache_locked+0x1b2/0x390 > [ 506.177085] ? blkdev_direct_IO+0x4a0/0x4a0 > [ 506.177644] ? scan_shadow_nodes+0x30/0x30 > [ 506.178206] ? lru_cache_add+0x42/0x60 > [ 506.178716] do_read_cache_page+0x695/0x740 > [ 506.179278] ? read_part_sector+0xe0/0xe0 > [ 506.179821] read_part_sector+0x36/0xe0 > [ 506.180337] adfspart_check_ICS+0x32/0x320 > [ 506.180890] ? snprintf+0x45/0x70 > [ 506.181350] ? read_part_sector+0xe0/0xe0 > [ 506.181906] bdev_disk_changed+0x229/0x5c0 > [ 506.182483] blkdev_get_whole+0x6d/0x90 > [ 506.183013] blkdev_get_by_dev+0x122/0x2d0 > [ 506.183562] device_add_disk+0x39e/0x3c0 > [ 506.184472] virtblk_probe+0x3f8/0x79b [virtio_blk] > [ 506.185461] virtio_dev_probe+0x15e/0x1d0 [virtio] > > So let's use a block layer helper to validate the block size. > > Signed-off-by: Xie Yongji Acked-by: Michael S. Tsirkin Please merge through the block tree because of the dependency. Jens can you pick this up? > --- > drivers/block/virtio_blk.c | 12 ++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index 303caf2d17d0..fd086179f980 100644 > --- a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -815,9 +815,17 @@ static int virtblk_probe(struct virtio_device *vdev) > err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE, > struct virtio_blk_config, blk_size, > &blk_size); > - if (!err) > + if (!err) { > + err = blk_validate_block_size(blk_size); > + if (err) { > + dev_err(&vdev->dev, > + "virtio_blk: invalid block size: 0x%x\n", > + blk_size); > + goto out_cleanup_disk; > + } > + > blk_queue_logical_block_size(q, blk_size); > - else > + } else > blk_size = queue_logical_block_size(q); > > /* Use topology information if available */ > -- > 2.11.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
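From the series description, the checks behind the new helper boil down to roughly the following (a paraphrase; see patch 1/4 of the series for the actual definition):

    /* a block size must be at least 512 bytes, at most PAGE_SIZE,
     * and a power of two */
    static inline int blk_validate_block_size(unsigned int bsize)
    {
            if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
                    return -EINVAL;

            return 0;
    }

With that in place, a device advertising a bogus blk_size makes virtblk_probe() fail cleanly instead of crashing later in the block layer.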
Re: [PATCH] vhost: Make use of the helper macro kthread_run()
On Thu, Oct 21, 2021 at 04:44:06PM +0800, Cai Huoqing wrote: > Repalce kthread_create/wake_up_process() with kthread_run() > to simplify the code. > > Signed-off-by: Cai Huoqing Pls check how this interacts with Mike Christie's patches. Pls fix up the typo in the commit log. > --- > drivers/vhost/vhost.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index 59edb5a1ffe2..e67bd5603b5f 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -595,15 +595,15 @@ long vhost_dev_set_owner(struct vhost_dev *dev) > > dev->kcov_handle = kcov_common_handle(); > if (dev->use_worker) { > - worker = kthread_create(vhost_worker, dev, > - "vhost-%d", current->pid); > + /* avoid contributing to loadavg */ doesn't this comment have any value anymore? > + worker = kthread_run(vhost_worker, dev, > + "vhost-%d", current->pid); > if (IS_ERR(worker)) { > err = PTR_ERR(worker); > goto err_worker; > } > > dev->worker = worker; > - wake_up_process(worker); /* avoid contributing to loadavg */ > > err = vhost_attach_cgroups(dev); > if (err) > -- > 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
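For context, kthread_run() is essentially kthread_create() followed by an immediate wake_up_process(), folded into one macro, roughly:

    #define kthread_run(threadfn, data, namefmt, ...)                          \
    ({                                                                         \
            struct task_struct *__k                                            \
                    = kthread_create(threadfn, data, namefmt, ## __VA_ARGS__); \
            if (!IS_ERR(__k))                                                  \
                    wake_up_process(__k);                                      \
            __k;                                                               \
    })

So the conversion itself is behaviour-preserving; the open questions above are only the interaction with Mike Christie's series and whether the loadavg note should survive.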
[GIT PULL] virtio: last minute fixes
The following changes since commit 64222515138e43da1fcf288f0289ef1020427b87: Merge tag 'drm-fixes-2021-10-22' of git://anongit.freedesktop.org/drm/drm (2021-10-21 19:06:08 -1000) are available in the Git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus for you to fetch changes up to 890d33561337ffeba0d8ba42517e71288cfee2b6: virtio-ring: fix DMA metadata flags (2021-10-27 15:54:34 -0400) virtio: last minute fixes A couple of fixes that seem important enough to pick at the last moment. Signed-off-by: Michael S. Tsirkin Vincent Whitchurch (1): virtio-ring: fix DMA metadata flags Xie Yongji (2): vduse: Disallow injecting interrupt before DRIVER_OK is set vduse: Fix race condition between resetting and irq injecting drivers/vdpa/vdpa_user/vduse_dev.c | 29 + drivers/virtio/virtio_ring.c | 2 +- 2 files changed, 26 insertions(+), 5 deletions(-)
Re: [PATCH] vhost: Make use of the helper macro kthread_run()
On 10/27/21 3:02 PM, Michael S. Tsirkin wrote: > On Thu, Oct 21, 2021 at 04:44:06PM +0800, Cai Huoqing wrote: >> Repalce kthread_create/wake_up_process() with kthread_run() >> to simplify the code. >> >> Signed-off-by: Cai Huoqing > > Pls check how this interacts with Mike Christie's patches. > Pls fix up the typo in the commit log. > Hi Cai, We probably don't need this patch since it's an API cleanup and not fixing a bug. I'm replacing this code with the kernel_worker API with this patch https://lore.kernel.org/all/20211007214448.6282-9-michael.chris...@oracle.com/ in this patchset; https://lore.kernel.org/all/20211007214448.6282-1-michael.chris...@oracle.com/ so the issue of using kthread_create + wake_up_process will be gone shortly either way.
Re: [PATCH v3 0/4] Add blk_validate_block_size() helper for drivers to validate block size
On Tue, 26 Oct 2021 22:40:11 +0800, Xie Yongji wrote: > The block layer can't support the block size larger than > page size yet, so driver needs to validate the block size > before setting it. Now this validation is done in device drivers > with some duplicated codes. This series tries to add a block > layer helper for it and makes loop driver, nbd driver and > virtio-blk driver use it. > > [...] Applied, thanks! [1/4] block: Add a helper to validate the block size commit: 570b1cac477643cbf01a45fa5d018430a1fddbce [2/4] nbd: Use blk_validate_block_size() to validate block size commit: c4318d6cd038472d13e08a54a9035863c8c160bd [3/4] loop: Use blk_validate_block_size() to validate block size commit: af3c570fb0df422b4906ebd11c1bf363d89961d5 [4/4] virtio-blk: Use blk_validate_block_size() to validate block size commit: 57a13a5b8157d9a8606490aaa1b805bafe6c37e1 Best regards, -- Jens Axboe
Re: vDPA bus driver selection
On Wed, Oct 27, 2021 at 03:21:15PM +, Parav Pandit wrote: > Hi Stefano, > > > From: Stefano Garzarella > > Sent: Wednesday, October 27, 2021 8:04 PM > > > > Hi folks, > > I was trying to understand if we have a way to specify which vDPA bus driver > > (e.g. vhost-vdpa, virtio-vdpa) a device should use. > > IIUC we don't have it, and the first registered driver is used when a new > > device > > is registered. > > > > I was thinking if it makes sense to extend the management API to specify > > which > > bus driver to use for a device. A use case could be for example a single > > host > > handling VMs and bare-metal containers, so we would have both virtio-vdpa > > and vhost-vdpa loaded and we want to attach some devices to VMs through > > vhost-vdpa and others to containers through virtio-vdpa. > > > > What do you think? > > > One option is, user keeps the drivers_autoprobe disabled for the vdpa bus > using, > > $ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac 00:11:22:33:44:55 > $ echo 0 > /sys/bus/vdpa/drivers_autoprobe > > And after vdpa device creation, it manually binds to the desired driver such > as, > > $ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind > Or > $ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind > > In an case of VDUSE, it makes more sense to bind to the one of the above > driver after user space has connected the use space backend to the kernel > device. The only annoying thing is that manual bind is not validated. E.g. if one makes a mistake and binds an incorrect device, it just tends to crash IIRC. Another is that it all needs to be root. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: futher decouple DAX from block devices
[ add sfr ] On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig wrote: > > Hi Dan, > > this series cleans up and simplifies the association between DAX and block > devices in preparation of allowing to mount file systems directly on DAX > devices without a detour through block devices. So I notice that this is based on linux-next while libnvdimm-for-next is based on v5.15-rc4. Since I'm not Andrew I went ahead and rebased these onto v5.15-rc4, tested that, and then merged with linux-next to resolve the conflicts and tested again. My merge resolution is here [1]. Christoph, please have a look. The rebase and the merge result are both passing my test and I'm now going to review the individual patches. However, while I do that and collect acks from DM and EROFS folks, I want to give Stephen a heads up that this is coming. Primarily I want to see if someone sees a better strategy to merge this, please let me know, but if not I plan to walk Stephen and Linus through the resolution. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/commit/?id=c3894cf6eb8f > > Diffstat: > drivers/dax/Kconfig |4 > drivers/dax/bus.c|2 > drivers/dax/super.c | 220 > +-- > drivers/md/dm-linear.c | 51 +++-- > drivers/md/dm-log-writes.c | 44 +++- > drivers/md/dm-stripe.c | 65 +++- > drivers/md/dm-table.c| 22 ++-- > drivers/md/dm-writecache.c |2 > drivers/md/dm.c | 29 - > drivers/md/dm.h |4 > drivers/nvdimm/Kconfig |2 > drivers/nvdimm/pmem.c|9 - > drivers/s390/block/Kconfig |2 > drivers/s390/block/dcssblk.c | 12 +- > fs/dax.c | 13 ++ > fs/erofs/super.c | 11 +- > fs/ext2/super.c |6 - > fs/ext4/super.c |9 + > fs/fuse/Kconfig |2 > fs/fuse/virtio_fs.c |2 > fs/xfs/xfs_super.c | 54 +- > include/linux/dax.h | 30 ++--- > 22 files changed, 185 insertions(+), 410 deletions(-) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 01/11] dm: make the DAX support dependent on CONFIG_FS_DAX
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig wrote: > > The device mapper DAX support is all hanging off a block device and thus > can't be used with device dax. Make it depend on CONFIG_FS_DAX instead > of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to > be built under CONFIG_FS_DAX now. Looks good. Mike, can I get an ack to take this through nvdimm.git? (you'll likely see me repeat this question on subsequent patches in this series). > > Signed-off-by: Christoph Hellwig > --- > drivers/dax/super.c| 6 ++ > drivers/md/dm-linear.c | 2 +- > drivers/md/dm-log-writes.c | 2 +- > drivers/md/dm-stripe.c | 2 +- > drivers/md/dm-writecache.c | 2 +- > drivers/md/dm.c| 2 +- > 6 files changed, 7 insertions(+), 9 deletions(-) > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c > index b882cf8106ea3..e20d0cef10a18 100644 > --- a/drivers/dax/super.c > +++ b/drivers/dax/super.c > @@ -63,7 +63,7 @@ static int dax_host_hash(const char *host) > return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE; > } > > -#ifdef CONFIG_BLOCK > +#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX) > #include > > int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size, > @@ -80,7 +80,6 @@ int bdev_dax_pgoff(struct block_device *bdev, sector_t > sector, size_t size, > } > EXPORT_SYMBOL(bdev_dax_pgoff); > > -#if IS_ENABLED(CONFIG_FS_DAX) > /** > * dax_get_by_host() - temporary lookup mechanism for filesystem-dax > * @host: alternate name for the device registered by a dax driver > @@ -219,8 +218,7 @@ bool dax_supported(struct dax_device *dax_dev, struct > block_device *bdev, > return ret; > } > EXPORT_SYMBOL_GPL(dax_supported); > -#endif /* CONFIG_FS_DAX */ > -#endif /* CONFIG_BLOCK */ > +#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */ > > enum dax_device_flags { > /* !alive + rcu grace period == no new operations / mappings */ > diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c > index 679b4c0a2eea1..32fbab11bf90c 100644 > --- a/drivers/md/dm-linear.c > +++ b/drivers/md/dm-linear.c > @@ -163,7 +163,7 @@ static int linear_iterate_devices(struct dm_target *ti, > return fn(ti, lc->dev, lc->start, ti->len, data); > } > > -#if IS_ENABLED(CONFIG_DAX_DRIVER) > +#if IS_ENABLED(CONFIG_FS_DAX) > static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff, > long nr_pages, void **kaddr, pfn_t *pfn) > { > diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c > index d93a4db235124..6d694526881d0 100644 > --- a/drivers/md/dm-log-writes.c > +++ b/drivers/md/dm-log-writes.c > @@ -903,7 +903,7 @@ static void log_writes_io_hints(struct dm_target *ti, > struct queue_limits *limit > limits->io_min = limits->physical_block_size; > } > > -#if IS_ENABLED(CONFIG_DAX_DRIVER) > +#if IS_ENABLED(CONFIG_FS_DAX) > static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes, >struct iov_iter *i) > { > diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c > index 6660b6b53d5bf..f084607220293 100644 > --- a/drivers/md/dm-stripe.c > +++ b/drivers/md/dm-stripe.c > @@ -300,7 +300,7 @@ static int stripe_map(struct dm_target *ti, struct bio > *bio) > return DM_MAPIO_REMAPPED; > } > > -#if IS_ENABLED(CONFIG_DAX_DRIVER) > +#if IS_ENABLED(CONFIG_FS_DAX) > static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff, > long nr_pages, void **kaddr, pfn_t *pfn) > { > diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c > index 18320444fb0a9..4c3a6e33604d3 100644 > --- a/drivers/md/dm-writecache.c > +++ 
b/drivers/md/dm-writecache.c > @@ -38,7 +38,7 @@ > #define BITMAP_GRANULARITY PAGE_SIZE > #endif > > -#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER) > +#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_FS_DAX) > #define DM_WRITECACHE_HAS_PMEM > #endif > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > index 7870e6460633f..79737aee516b1 100644 > --- a/drivers/md/dm.c > +++ b/drivers/md/dm.c > @@ -1783,7 +1783,7 @@ static struct mapped_device *alloc_dev(int minor) > md->disk->private_data = md; > sprintf(md->disk->disk_name, "dm-%d", minor); > > - if (IS_ENABLED(CONFIG_DAX_DRIVER)) { > + if (IS_ENABLED(CONFIG_FS_DAX)) { > md->dax_dev = alloc_dax(md, md->disk->disk_name, > &dm_dax_ops, 0); > if (IS_ERR(md->dax_dev)) > -- > 2.30.2 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
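For readers less familiar with the idiom the dm.c hunk above relies on: IS_ENABLED() keeps the DAX branch visible to the compiler even when CONFIG_FS_DAX is off, which is why the config rename only needs a symbol swap and no new #ifdefs. A minimal sketch follows; the struct and helper names are made up for illustration, not taken from the dm code:

	#include <linux/kconfig.h>

	struct example_dev;
	int example_register_dax(struct example_dev *ed);	/* hypothetical helper */

	static int example_setup_dax(struct example_dev *ed)
	{
		/*
		 * The branch is still parsed and type-checked, but with
		 * CONFIG_FS_DAX=n IS_ENABLED() evaluates to the constant 0
		 * and the compiler drops the call entirely.
		 */
		if (!IS_ENABLED(CONFIG_FS_DAX))
			return 0;

		return example_register_dax(ed);
	}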
Re: [PATCH 02/11] dax: remove CONFIG_DAX_DRIVER
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig wrote: > > CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it. Looks good, I don't think an s390 ack is needed for this one. > Signed-off-by: Christoph Hellwig > --- > drivers/dax/Kconfig| 4 > drivers/nvdimm/Kconfig | 2 +- > drivers/s390/block/Kconfig | 2 +- > fs/fuse/Kconfig| 2 +- > 4 files changed, 3 insertions(+), 7 deletions(-) > > diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig > index d2834c2cfa10d..954ab14ba7778 100644 > --- a/drivers/dax/Kconfig > +++ b/drivers/dax/Kconfig > @@ -1,8 +1,4 @@ > # SPDX-License-Identifier: GPL-2.0-only > -config DAX_DRIVER > - select DAX > - bool > - > menuconfig DAX > tristate "DAX: direct access to differentiated memory" > select SRCU > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig > index b7d1eb38b27d4..347fe7afa5830 100644 > --- a/drivers/nvdimm/Kconfig > +++ b/drivers/nvdimm/Kconfig > @@ -22,7 +22,7 @@ if LIBNVDIMM > config BLK_DEV_PMEM > tristate "PMEM: Persistent memory block device support" > default LIBNVDIMM > - select DAX_DRIVER > + select DAX > select ND_BTT if BTT > select ND_PFN if NVDIMM_PFN > help > diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig > index d0416dbd0cd81..e3710a762abae 100644 > --- a/drivers/s390/block/Kconfig > +++ b/drivers/s390/block/Kconfig > @@ -5,7 +5,7 @@ comment "S/390 block device drivers" > config DCSSBLK > def_tristate m > select FS_DAX_LIMITED > - select DAX_DRIVER > + select DAX > prompt "DCSSBLK support" > depends on S390 && BLOCK > help > diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig > index 40ce9a1c12e5d..038ed0b9aaa5d 100644 > --- a/fs/fuse/Kconfig > +++ b/fs/fuse/Kconfig > @@ -45,7 +45,7 @@ config FUSE_DAX > select INTERVAL_TREE > depends on VIRTIO_FS > depends on FS_DAX > - depends on DAX_DRIVER > + depends on DAX > help > This allows bypassing guest page cache and allows mapping host page > cache directly in guest address space. > -- > 2.30.2 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 03/11] dax: simplify the dax_device <-> gendisk association
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig wrote: > > Replace the dax_host_hash with an xarray indexed by the pointer value > of the gendisk, and require explicitl calls from the block drivers that s/explicitl/explicit/ I've fixed that up locally. > want to associate their gendisk with a dax_device. This looks good. 0day-robot is now chewing on it via the test merge with linux-next (libnvdimm-pending). ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
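For context, the "xarray indexed by the pointer value of the gendisk" boils down to something like the sketch below. The dax_hosts xarray and the dax_add_host()/dax_remove_host() entry points are the ones this series introduces (the lookup side is visible later in fs_dax_get_by_bdev()); the bodies here are an illustrative reconstruction, not the exact patch text:

	#include <linux/dax.h>
	#include <linux/genhd.h>
	#include <linux/xarray.h>

	static DEFINE_XARRAY(dax_hosts);

	int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
	{
		/* key the association on the gendisk pointer itself */
		return xa_insert(&dax_hosts, (unsigned long)disk, dax_dev,
				 GFP_KERNEL);
	}

	void dax_remove_host(struct gendisk *disk)
	{
		xa_erase(&dax_hosts, (unsigned long)disk);
	}

A lookup is then a single xa_load(&dax_hosts, (unsigned long)bdev->bd_disk), which is exactly what fs_dax_get_by_bdev() does in the later patches of this series.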
Re: [PATCH 04/11] dax: remove the pgmap sanity checks in generic_fsdax_supported
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig wrote: > > Drivers that register a dax_dev should make sure it works, no need > to double check from the file system. > > Signed-off-by: Christoph Hellwig > --- > drivers/dax/super.c | 49 + > 1 file changed, 1 insertion(+), 48 deletions(-) > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c > index 9383c11b21853..04fc680542e8d 100644 > --- a/drivers/dax/super.c > +++ b/drivers/dax/super.c > @@ -107,13 +107,9 @@ bool generic_fsdax_supported(struct dax_device *dax_dev, > struct block_device *bdev, int blocksize, sector_t start, > sector_t sectors) > { > - bool dax_enabled = false; > pgoff_t pgoff, pgoff_end; > - void *kaddr, *end_kaddr; > - pfn_t pfn, end_pfn; > sector_t last_page; > - long len, len2; > - int err, id; > + int err; > > if (blocksize != PAGE_SIZE) { > pr_info("%pg: error: unsupported blocksize for dax\n", bdev); > @@ -138,49 +134,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev, > return false; > } > > - id = dax_read_lock(); > - len = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn); > - len2 = dax_direct_access(dax_dev, pgoff_end, 1, &end_kaddr, &end_pfn); > - > - if (len < 1 || len2 < 1) { > - pr_info("%pg: error: dax access failed (%ld)\n", > - bdev, len < 1 ? len : len2); > - dax_read_unlock(id); > - return false; > - } > - > - if (IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn)) { > - /* > -* An arch that has enabled the pmem api should also > -* have its drivers support pfn_t_devmap() > -* > -* This is a developer warning and should not trigger in > -* production. dax_flush() will crash since it depends > -* on being able to do (page_address(pfn_to_page())). > -*/ > - WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API)); > - dax_enabled = true; > - } else if (pfn_t_devmap(pfn) && pfn_t_devmap(end_pfn)) { > - struct dev_pagemap *pgmap, *end_pgmap; > - > - pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL); > - end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL); > - if (pgmap && pgmap == end_pgmap && pgmap->type == > MEMORY_DEVICE_FS_DAX > - && pfn_t_to_page(pfn)->pgmap == pgmap > - && pfn_t_to_page(end_pfn)->pgmap == pgmap > - && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr)) > - && pfn_t_to_pfn(end_pfn) == > PHYS_PFN(__pa(end_kaddr))) This is effectively a self-test for a regression that was found while manipulating the 'struct page' memmap metadata reservation for PMEM namespaces. I guess it's just serving as a security-blanket now and I should find a way to move it out to a self-test. I'll take a look. > - dax_enabled = true; > - put_dev_pagemap(pgmap); > - put_dev_pagemap(end_pgmap); > - > - } > - dax_read_unlock(id); > - > - if (!dax_enabled) { > - pr_info("%pg: error: dax support not enabled\n", bdev); > - return false; > - } > return true; > } > EXPORT_SYMBOL_GPL(generic_fsdax_supported); > -- > 2.30.2 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/3] virtio: cache indirect desc for split
Hi Xuan, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on horms-ipvs/master] [also build test WARNING on linus/master v5.15-rc7 next-20211027] [cannot apply to mst-vhost/linux-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025 base: https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master config: arm-defconfig (attached as .config) compiler: arm-linux-gnueabi-gcc (GCC) 11.2.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/8935e116c155fb7d484bad35a42b2ca98f75e384 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025 git checkout 8935e116c155fb7d484bad35a42b2ca98f75e384 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arm If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): drivers/virtio/virtio_ring.c: In function 'desc_cache_chain_free_split': >> drivers/virtio/virtio_ring.c:438:25: warning: cast to pointer from integer >> of different size [-Wint-to-pointer-cast] 438 | chain = (void *)desc->addr; | ^ drivers/virtio/virtio_ring.c: In function 'desc_cache_put_split': >> drivers/virtio/virtio_ring.c:447:30: warning: cast from pointer to integer >> of different size [-Wpointer-to-int-cast] 447 | desc->addr = (u64)vq->desc_cache_chain; | ^ drivers/virtio/virtio_ring.c: In function 'alloc_indirect_split': drivers/virtio/virtio_ring.c:464:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 464 | vq->desc_cache_chain = (void *)desc->addr; |^ vim +438 drivers/virtio/virtio_ring.c 431 432 static void desc_cache_chain_free_split(void *chain) 433 { 434 struct vring_desc *desc; 435 436 while (chain) { 437 desc = chain; > 438 chain = (void *)desc->addr; 439 kfree(desc); 440 } 441 } 442 443 static void desc_cache_put_split(struct vring_virtqueue *vq, 444 struct vring_desc *desc, int n) 445 { 446 if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) { > 447 desc->addr = (u64)vq->desc_cache_chain; 448 vq->desc_cache_chain = desc; 449 } else { 450 kfree(desc); 451 } 452 } 453 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
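Both robot reports against this hunk stem from the same root cause: the free-list link is smuggled through desc->addr, a 64-bit (and, per the later sparse report, __virtio64-annotated) field, so the cast truncates on 32-bit targets and also trips the endianness checker. One way to avoid the warnings altogether, sketched below and not necessarily what the author intends, is to overlay a plain pointer on the free descriptor instead of round-tripping through ->addr. This assumes sizeof(struct vring_desc) >= sizeof(void *), which holds since the descriptor is 16 bytes; vq->desc_cache_chain can stay void *, the conversions are implicit:

	/* link node overlaid on a cached, currently unused descriptor */
	struct desc_cache_node {
		struct desc_cache_node *next;
	};

	static void desc_cache_put_split(struct vring_virtqueue *vq,
					 struct vring_desc *desc, int n)
	{
		if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
			struct desc_cache_node *node = (struct desc_cache_node *)desc;

			node->next = vq->desc_cache_chain;
			vq->desc_cache_chain = node;
		} else {
			kfree(desc);
		}
	}

	static void desc_cache_chain_free_split(struct desc_cache_node *chain)
	{
		while (chain) {
			struct desc_cache_node *next = chain->next;

			kfree(chain);
			chain = next;
		}
	}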
Re: [PATCH 05/11] dax: move the partition alignment check into fs_dax_get_by_bdev
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig wrote: > > fs_dax_get_by_bdev is the primary interface to find a dax device for a > block device, so move the partition alignment check there instead of > wiring it up through ->dax_supported. Looks good. > > Signed-off-by: Christoph Hellwig > --- > drivers/dax/super.c | 23 ++- > 1 file changed, 6 insertions(+), 17 deletions(-) > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c > index 04fc680542e8d..482fe775324a4 100644 > --- a/drivers/dax/super.c > +++ b/drivers/dax/super.c > @@ -93,6 +93,12 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device > *bdev) > if (!blk_queue_dax(bdev->bd_disk->queue)) > return NULL; > > + if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE || > + (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) { > + pr_info("%pg: error: unaligned partition for dax\n", bdev); > + return NULL; > + } > + > id = dax_read_lock(); > dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk); > if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode)) > @@ -107,10 +113,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev, > struct block_device *bdev, int blocksize, sector_t start, > sector_t sectors) > { > - pgoff_t pgoff, pgoff_end; > - sector_t last_page; > - int err; > - > if (blocksize != PAGE_SIZE) { > pr_info("%pg: error: unsupported blocksize for dax\n", bdev); > return false; > @@ -121,19 +123,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev, > return false; > } > > - err = bdev_dax_pgoff(bdev, start, PAGE_SIZE, &pgoff); > - if (err) { > - pr_info("%pg: error: unaligned partition for dax\n", bdev); > - return false; > - } > - > - last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SIZE / 512; > - err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end); > - if (err) { > - pr_info("%pg: error: unaligned partition for dax\n", bdev); > - return false; > - } > - > return true; > } > EXPORT_SYMBOL_GPL(generic_fsdax_supported); > -- > 2.30.2 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
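A concrete example makes the relocated check easy to sanity-check (4 KiB pages assumed; the numbers are illustrative): a DOS-style partition starting at LBA 63 gives 63 * 512 = 32256 bytes, and 32256 % 4096 = 3584, so DAX is refused; a 1 MiB-aligned partition starting at LBA 2048 gives 2048 * 512 = 1048576, a multiple of 4096, and passes. The predicate itself is just:

	/* sketch of the predicate with the constants spelled out */
	#define EX_SECTOR_SIZE	512ULL
	#define EX_PAGE_SIZE	4096ULL

	static int example_partition_dax_ok(unsigned long long start_sect,
					    unsigned long long nr_sects)
	{
		return (start_sect * EX_SECTOR_SIZE) % EX_PAGE_SIZE == 0 &&
		       (nr_sects * EX_SECTOR_SIZE) % EX_PAGE_SIZE == 0;
	}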
Re: [PATCH 06/11] xfs: factor out a xfs_setup_dax helper
On Tue, Oct 19, 2021 at 12:24 AM Christoph Hellwig wrote: > > On Mon, Oct 18, 2021 at 09:43:51AM -0700, Darrick J. Wong wrote: > > > --- a/fs/xfs/xfs_super.c > > > +++ b/fs/xfs/xfs_super.c > > > @@ -339,6 +339,32 @@ xfs_buftarg_is_dax( > > > bdev_nr_sectors(bt->bt_bdev)); > > > } > > > > > > +static int > > > +xfs_setup_dax( > > > > /me wonders if this should be named xfs_setup_dax_always, since this > > doesn't handle the dax=inode mode? > > Sure, why not. I went ahead and made that change locally. > > > The only reason I bring that up is that Eric reminded me a while ago > > that we don't actually print any kind of EXPERIMENTAL warning for the > > auto-detection behavior. > > Yes, I actually noticed that as well when preparing this series. The rest looks good to me. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 07/11] dax: remove dax_capable
I am going to change the subject of this patch to: dax: remove ->dax_supported() On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig wrote: > I'll add a bit more background to help others review this. The ->dax_supported() operation arranges for a stack of devices to each answer the question "is dax operational". That request routes to generic_fsdax_supported() at last level device and that attempted an actual dax_direct_access() call and did some sanity checks. However, those sanity checks can be validated in other ways and with those removed the only question to answer is "has each block device driver in the stack performed dax_add_host()". That can be validated without a dax_operation. So, just open code the block size and dax_dev == NULL checks in the callers, and delete ->dax_supported(). Mike, let me know if you have any concerns. > Just open code the block size and dax_dev == NULL checks in the callers. > > Signed-off-by: Christoph Hellwig > --- > drivers/dax/super.c | 36 > drivers/md/dm-table.c| 22 +++--- > drivers/md/dm.c | 21 - > drivers/md/dm.h | 4 > drivers/nvdimm/pmem.c| 1 - > drivers/s390/block/dcssblk.c | 1 - > fs/erofs/super.c | 11 +++ > fs/ext2/super.c | 6 -- > fs/ext4/super.c | 9 ++--- > fs/xfs/xfs_super.c | 21 - > include/linux/dax.h | 14 -- > 11 files changed, 36 insertions(+), 110 deletions(-) > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c > index 482fe775324a4..803942586d1b6 100644 > --- a/drivers/dax/super.c > +++ b/drivers/dax/super.c > @@ -108,42 +108,6 @@ struct dax_device *fs_dax_get_by_bdev(struct > block_device *bdev) > return dax_dev; > } > EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev); > - > -bool generic_fsdax_supported(struct dax_device *dax_dev, > - struct block_device *bdev, int blocksize, sector_t start, > - sector_t sectors) > -{ > - if (blocksize != PAGE_SIZE) { > - pr_info("%pg: error: unsupported blocksize for dax\n", bdev); > - return false; > - } > - > - if (!dax_dev) { > - pr_debug("%pg: error: dax unsupported by block device\n", > bdev); > - return false; > - } > - > - return true; > -} > -EXPORT_SYMBOL_GPL(generic_fsdax_supported); > - > -bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev, > - int blocksize, sector_t start, sector_t len) > -{ > - bool ret = false; > - int id; > - > - if (!dax_dev) > - return false; > - > - id = dax_read_lock(); > - if (dax_alive(dax_dev) && dax_dev->ops->dax_supported) > - ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize, > - start, len); > - dax_read_unlock(id); > - return ret; > -} > -EXPORT_SYMBOL_GPL(dax_supported); > #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */ > > enum dax_device_flags { > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c > index 1fa4d5582dca5..4ae671c2168ea 100644 > --- a/drivers/md/dm-table.c > +++ b/drivers/md/dm-table.c > @@ -807,12 +807,14 @@ void dm_table_set_type(struct dm_table *t, enum > dm_queue_mode type) > EXPORT_SYMBOL_GPL(dm_table_set_type); > > /* validate the dax capability of the target device span */ > -int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev, > +static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev, > sector_t start, sector_t len, void *data) > { > - int blocksize = *(int *) data; > + if (dev->dax_dev) > + return false; > > - return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len); > + pr_debug("%pg: error: dax unsupported by block device\n", dev->bdev); > + return true; > } > > /* Check devices support synchronous DAX */ > @@ -822,8 +824,8 @@ static int 
device_not_dax_synchronous_capable(struct > dm_target *ti, struct dm_de > return !dev->dax_dev || !dax_synchronous(dev->dax_dev); > } > > -bool dm_table_supports_dax(struct dm_table *t, > - iterate_devices_callout_fn iterate_fn, int > *blocksize) > +static bool dm_table_supports_dax(struct dm_table *t, > + iterate_devices_callout_fn iterate_fn) > { > struct dm_target *ti; > unsigned i; > @@ -836,7 +838,7 @@ bool dm_table_supports_dax(struct dm_table *t, > return false; > > if (!ti->type->iterate_devices || > - ti->type->iterate_devices(ti, iterate_fn, blocksize)) > + ti->type->iterate_devices(ti, iterate_fn, NULL)) > return false; > } > > @@ -863,7 +865,6 @@ static int dm
Re: [dm-devel] [PATCH 07/11] dax: remove dax_capable
On Tue, Oct 19, 2021 at 8:45 AM Darrick J. Wong wrote: > > On Mon, Oct 18, 2021 at 06:40:50AM +0200, Christoph Hellwig wrote: > > Just open code the block size and dax_dev == NULL checks in the callers. > > > > Signed-off-by: Christoph Hellwig > > --- > > drivers/dax/super.c | 36 > > drivers/md/dm-table.c| 22 +++--- > > drivers/md/dm.c | 21 - > > drivers/md/dm.h | 4 > > drivers/nvdimm/pmem.c| 1 - > > drivers/s390/block/dcssblk.c | 1 - > > fs/erofs/super.c | 11 +++ > > fs/ext2/super.c | 6 -- > > fs/ext4/super.c | 9 ++--- > > fs/xfs/xfs_super.c | 21 - > > include/linux/dax.h | 14 -- > > 11 files changed, 36 insertions(+), 110 deletions(-) > > [..] if (ext4_has_feature_inline_data(sb)) { > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > index d07020a8eb9e3..163ceafbd8fd2 100644 > > --- a/fs/xfs/xfs_super.c > > +++ b/fs/xfs/xfs_super.c [..] > > + if (mp->m_super->s_blocksize != PAGE_SIZE) { > > + xfs_alert(mp, > > + "DAX not supported for blocksize. Turning off > > DAX.\n"); > > Newlines aren't needed at the end of extX_msg/xfs_alert format strings. Thanks Darrick, I fixed those up. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/3] virtio: cache indirect desc for packed
Hi Xuan, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on horms-ipvs/master] [also build test WARNING on linus/master v5.15-rc7] [cannot apply to mst-vhost/linux-next next-20211027] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025 base: https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master config: hexagon-randconfig-r045-20211027 (attached as .config) compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 5db7568a6a1fcb408eb8988abdaff2a225a8eb72) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025 git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=hexagon If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): >> drivers/virtio/virtio_ring.c:1467:7: warning: variable 'len' is >> uninitialized when used here [-Wuninitialized] n = len / sizeof(struct vring_packed_desc); ^~~ drivers/virtio/virtio_ring.c:1460:10: note: initialize the variable 'len' to silence this warning u32 len, n; ^ = 0 1 warning generated. vim +/len +1467 drivers/virtio/virtio_ring.c 1433 1434 static void detach_buf_packed(struct vring_virtqueue *vq, 1435unsigned int id, void **ctx) 1436 { 1437 struct vring_desc_state_packed *state = NULL; 1438 struct vring_packed_desc *desc; 1439 unsigned int i, curr; 1440 1441 state = &vq->packed.desc_state[id]; 1442 1443 /* Clear data ptr. */ 1444 state->data = NULL; 1445 1446 vq->packed.desc_extra[state->last].next = vq->free_head; 1447 vq->free_head = id; 1448 vq->vq.num_free += state->num; 1449 1450 if (unlikely(vq->use_dma_api)) { 1451 curr = id; 1452 for (i = 0; i < state->num; i++) { 1453 vring_unmap_state_packed(vq, 1454 &vq->packed.desc_extra[curr]); 1455 curr = vq->packed.desc_extra[curr].next; 1456 } 1457 } 1458 1459 if (vq->indirect) { 1460 u32 len, n; 1461 1462 /* Free the indirect table, if any, now that it's unmapped. */ 1463 desc = state->indir_desc; 1464 if (!desc) 1465 return; 1466 > 1467 n = len / sizeof(struct vring_packed_desc); 1468 1469 if (vq->use_dma_api) { 1470 len = vq->packed.desc_extra[id].len; 1471 for (i = 0; i < n; i++) 1472 vring_unmap_desc_packed(vq, &desc[i]); 1473 } 1474 1475 desc_cache_put_packed(vq, desc, n); 1476 state->indir_desc = NULL; 1477 } else if (ctx) { 1478 *ctx = state->indir_desc; 1479 } 1480 } 1481 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
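The robot's complaint looks valid: with the caching patch applied, n is derived from len one line before len is ever read from desc_extra. A minimal reordering along the lines below would silence it, with the caveat (an assumption to verify, not something the report establishes) that desc_extra[id].len is populated for indirect buffers even when use_dma_api is false; if it is only written on the DMA path, the element count would have to be carried in desc_state instead:

		if (vq->indirect) {
			u32 len, n;

			/* Free the indirect table, if any, now that it's unmapped. */
			desc = state->indir_desc;
			if (!desc)
				return;

			/* read the recorded table size before deriving n from it */
			len = vq->packed.desc_extra[id].len;
			n = len / sizeof(struct vring_packed_desc);

			if (vq->use_dma_api) {
				for (i = 0; i < n; i++)
					vring_unmap_desc_packed(vq, &desc[i]);
			}

			desc_cache_put_packed(vq, desc, n);
			state->indir_desc = NULL;
		} else if (ctx) {
			*ctx = state->indir_desc;
		}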
Re: [PATCH 1/3] virtio: cache indirect desc for split
Hi Xuan, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on horms-ipvs/master] [also build test WARNING on linus/master v5.15-rc7 next-20211027] [cannot apply to mst-vhost/linux-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025 base: https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master config: i386-randconfig-s002-20211027 (attached as .config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 reproduce: # apt-get install sparse # sparse version: v0.6.4-dirty # https://github.com/0day-ci/linux/commit/8935e116c155fb7d484bad35a42b2ca98f75e384 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025 git checkout 8935e116c155fb7d484bad35a42b2ca98f75e384 # save the attached .config to linux build tree make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash drivers/virtio/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot sparse warnings: (new ones prefixed by >>) >> drivers/virtio/virtio_ring.c:438:26: sparse: sparse: cast from restricted >> __virtio64 >> drivers/virtio/virtio_ring.c:447:28: sparse: sparse: incorrect type in >> assignment (different base types) @@ expected restricted __virtio64 >> [usertype] addr @@ got unsigned long long [usertype] @@ drivers/virtio/virtio_ring.c:447:28: sparse: expected restricted __virtio64 [usertype] addr drivers/virtio/virtio_ring.c:447:28: sparse: got unsigned long long [usertype] drivers/virtio/virtio_ring.c:464:49: sparse: sparse: cast from restricted __virtio64 vim +438 drivers/virtio/virtio_ring.c 431 432 static void desc_cache_chain_free_split(void *chain) 433 { 434 struct vring_desc *desc; 435 436 while (chain) { 437 desc = chain; > 438 chain = (void *)desc->addr; 439 kfree(desc); 440 } 441 } 442 443 static void desc_cache_put_split(struct vring_virtqueue *vq, 444 struct vring_desc *desc, int n) 445 { 446 if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) { > 447 desc->addr = (u64)vq->desc_cache_chain; 448 vq->desc_cache_chain = desc; 449 } else { 450 kfree(desc); 451 } 452 } 453 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 08/11] dm-linear: add a linear_dax_pgoff helper
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig wrote: > > Add a helper to perform the entire remapping for DAX accesses. This > helper open codes bdev_dax_pgoff given that the alignment checks have > already been done by the submitting file system and don't need to be > repeated. Looks good. Mike, ack? > > Signed-off-by: Christoph Hellwig > --- > drivers/md/dm-linear.c | 49 +- > 1 file changed, 15 insertions(+), 34 deletions(-) > > diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c > index 32fbab11bf90c..bf03f73fd0f36 100644 > --- a/drivers/md/dm-linear.c > +++ b/drivers/md/dm-linear.c > @@ -164,63 +164,44 @@ static int linear_iterate_devices(struct dm_target *ti, > } > > #if IS_ENABLED(CONFIG_FS_DAX) > +static struct dax_device *linear_dax_pgoff(struct dm_target *ti, pgoff_t > *pgoff) > +{ > + struct linear_c *lc = ti->private; > + sector_t sector = linear_map_sector(ti, *pgoff << PAGE_SECTORS_SHIFT); > + > + *pgoff = (get_start_sect(lc->dev->bdev) + sector) >> > PAGE_SECTORS_SHIFT; > + return lc->dev->dax_dev; > +} > + > static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff, > long nr_pages, void **kaddr, pfn_t *pfn) > { > - long ret; > - struct linear_c *lc = ti->private; > - struct block_device *bdev = lc->dev->bdev; > - struct dax_device *dax_dev = lc->dev->dax_dev; > - sector_t dev_sector, sector = pgoff * PAGE_SECTORS; > - > - dev_sector = linear_map_sector(ti, sector); > - ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff); > - if (ret) > - return ret; > + struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff); > + > return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn); > } > > static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff, > void *addr, size_t bytes, struct iov_iter *i) > { > - struct linear_c *lc = ti->private; > - struct block_device *bdev = lc->dev->bdev; > - struct dax_device *dax_dev = lc->dev->dax_dev; > - sector_t dev_sector, sector = pgoff * PAGE_SECTORS; > + struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff); > > - dev_sector = linear_map_sector(ti, sector); > - if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff)) > - return 0; > return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i); > } > > static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff, > void *addr, size_t bytes, struct iov_iter *i) > { > - struct linear_c *lc = ti->private; > - struct block_device *bdev = lc->dev->bdev; > - struct dax_device *dax_dev = lc->dev->dax_dev; > - sector_t dev_sector, sector = pgoff * PAGE_SECTORS; > + struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff); > > - dev_sector = linear_map_sector(ti, sector); > - if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff)) > - return 0; > return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i); > } > > static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff, > size_t nr_pages) > { > - int ret; > - struct linear_c *lc = ti->private; > - struct block_device *bdev = lc->dev->bdev; > - struct dax_device *dax_dev = lc->dev->dax_dev; > - sector_t dev_sector, sector = pgoff * PAGE_SECTORS; > - > - dev_sector = linear_map_sector(ti, sector); > - ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, > &pgoff); > - if (ret) > - return ret; > + struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff); > + > return dax_zero_page_range(dax_dev, pgoff, nr_pages); > } > > -- > 2.30.2 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org 
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
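The arithmetic in linear_dax_pgoff() folds two offsets into the page offset the filesystem handed in, and a worked example may help (hypothetical numbers, 4 KiB pages so PAGE_SECTORS_SHIFT == 3):

	/*
	 *   pgoff from the filesystem                  = 16   (64 KiB in)
	 *   sector = 16 << PAGE_SECTORS_SHIFT          = 128
	 *   linear_map_sector() adds the target start,
	 *   say 1024 sectors                           -> 1152
	 *   get_start_sect(bdev), the partition start  = 2048
	 *   *pgoff = (2048 + 1152) >> 3                = 400
	 */

So page 16 of the dm-linear device resolves to page 400 of the whole-disk dax_device that dax_direct_access() is then called on, which is the same work bdev_dax_pgoff() used to do minus the alignment re-check.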
Re: [PATCH 09/11] dm-log-writes: add a log_writes_dax_pgoff helper
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig wrote: > > Add a helper to perform the entire remapping for DAX accesses. This > helper open codes bdev_dax_pgoff given that the alignment checks have > already been done by the submitting file system and don't need to be > repeated. Looks good. Mike, ack? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 10/11] dm-stripe: add a stripe_dax_pgoff helper
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig wrote: > > Add a helper to perform the entire remapping for DAX accesses. This > helper open codes bdev_dax_pgoff given that the alignment checks have > already been done by the submitting file system and don't need to be > repeated. Again, looks good. Kind of embarrassing when the open-coded version is less LOC than using the helper. Mike, ack? > > Signed-off-by: Christoph Hellwig > --- > drivers/md/dm-stripe.c | 63 ++ > 1 file changed, 15 insertions(+), 48 deletions(-) > > diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c > index f084607220293..50dba3f39274c 100644 > --- a/drivers/md/dm-stripe.c > +++ b/drivers/md/dm-stripe.c > @@ -301,83 +301,50 @@ static int stripe_map(struct dm_target *ti, struct bio > *bio) > } > > #if IS_ENABLED(CONFIG_FS_DAX) > -static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff, > - long nr_pages, void **kaddr, pfn_t *pfn) > +static struct dax_device *stripe_dax_pgoff(struct dm_target *ti, pgoff_t > *pgoff) > { > - sector_t dev_sector, sector = pgoff * PAGE_SECTORS; > struct stripe_c *sc = ti->private; > - struct dax_device *dax_dev; > struct block_device *bdev; > + sector_t dev_sector; > uint32_t stripe; > - long ret; > > - stripe_map_sector(sc, sector, &stripe, &dev_sector); > + stripe_map_sector(sc, *pgoff * PAGE_SECTORS, &stripe, &dev_sector); > dev_sector += sc->stripe[stripe].physical_start; > - dax_dev = sc->stripe[stripe].dev->dax_dev; > bdev = sc->stripe[stripe].dev->bdev; > > - ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff); > - if (ret) > - return ret; > + *pgoff = (get_start_sect(bdev) + dev_sector) >> PAGE_SECTORS_SHIFT; > + return sc->stripe[stripe].dev->dax_dev; > +} > + > +static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff, > + long nr_pages, void **kaddr, pfn_t *pfn) > +{ > + struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff); > + > return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn); > } > > static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff, > void *addr, size_t bytes, struct iov_iter *i) > { > - sector_t dev_sector, sector = pgoff * PAGE_SECTORS; > - struct stripe_c *sc = ti->private; > - struct dax_device *dax_dev; > - struct block_device *bdev; > - uint32_t stripe; > - > - stripe_map_sector(sc, sector, &stripe, &dev_sector); > - dev_sector += sc->stripe[stripe].physical_start; > - dax_dev = sc->stripe[stripe].dev->dax_dev; > - bdev = sc->stripe[stripe].dev->bdev; > + struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff); > > - if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff)) > - return 0; > return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i); > } > > static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff, > void *addr, size_t bytes, struct iov_iter *i) > { > - sector_t dev_sector, sector = pgoff * PAGE_SECTORS; > - struct stripe_c *sc = ti->private; > - struct dax_device *dax_dev; > - struct block_device *bdev; > - uint32_t stripe; > - > - stripe_map_sector(sc, sector, &stripe, &dev_sector); > - dev_sector += sc->stripe[stripe].physical_start; > - dax_dev = sc->stripe[stripe].dev->dax_dev; > - bdev = sc->stripe[stripe].dev->bdev; > + struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff); > > - if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff)) > - return 0; > return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i); > } > > static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff, > 
size_t nr_pages) > { > - int ret; > - sector_t dev_sector, sector = pgoff * PAGE_SECTORS; > - struct stripe_c *sc = ti->private; > - struct dax_device *dax_dev; > - struct block_device *bdev; > - uint32_t stripe; > + struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff); > > - stripe_map_sector(sc, sector, &stripe, &dev_sector); > - dev_sector += sc->stripe[stripe].physical_start; > - dax_dev = sc->stripe[stripe].dev->dax_dev; > - bdev = sc->stripe[stripe].dev->bdev; > - > - ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, > &pgoff); > - if (ret) > - return ret; > return dax_zero_page_range(dax_dev, pgoff, nr_pages); > } > > -- > 2.30.2 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 11/11] dax: move bdev_dax_pgoff to fs/dax.c
On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig wrote: > > No functional changet, but this will allow for a tighter integration s/changet/changes/ > with the iomap code, including possible passing the partition offset s/possible/possibly/ > in the iomap in the future. For now it mostly avoids growing more s/now/now,/ ...all of the above fixed up locally. Other than that, it looks good to me. > callers outside of fs/dax.c. > > Signed-off-by: Christoph Hellwig > --- > drivers/dax/super.c | 14 -- > fs/dax.c| 13 + > include/linux/dax.h | 1 - > 3 files changed, 13 insertions(+), 15 deletions(-) > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c > index 803942586d1b6..c0910687fbcb2 100644 > --- a/drivers/dax/super.c > +++ b/drivers/dax/super.c > @@ -67,20 +67,6 @@ void dax_remove_host(struct gendisk *disk) > } > EXPORT_SYMBOL_GPL(dax_remove_host); > > -int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size, > - pgoff_t *pgoff) > -{ > - sector_t start_sect = bdev ? get_start_sect(bdev) : 0; > - phys_addr_t phys_off = (start_sect + sector) * 512; > - > - if (pgoff) > - *pgoff = PHYS_PFN(phys_off); > - if (phys_off % PAGE_SIZE || size % PAGE_SIZE) > - return -EINVAL; > - return 0; > -} > -EXPORT_SYMBOL(bdev_dax_pgoff); > - > /** > * dax_get_by_host() - temporary lookup mechanism for filesystem-dax > * @bdev: block device to find a dax_device for > diff --git a/fs/dax.c b/fs/dax.c > index 4e3e5a283a916..eb715363fd667 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -709,6 +709,19 @@ int dax_invalidate_mapping_entry_sync(struct > address_space *mapping, > return __dax_invalidate_entry(mapping, index, false); > } > > +static int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t > size, > + pgoff_t *pgoff) > +{ > + sector_t start_sect = bdev ? get_start_sect(bdev) : 0; > + phys_addr_t phys_off = (start_sect + sector) * 512; > + > + if (pgoff) > + *pgoff = PHYS_PFN(phys_off); > + if (phys_off % PAGE_SIZE || size % PAGE_SIZE) > + return -EINVAL; > + return 0; > +} > + > static int copy_cow_page_dax(struct block_device *bdev, struct dax_device > *dax_dev, > sector_t sector, struct page *to, unsigned long > vaddr) > { > diff --git a/include/linux/dax.h b/include/linux/dax.h > index 439c3c70e347b..324363b798ecd 100644 > --- a/include/linux/dax.h > +++ b/include/linux/dax.h > @@ -107,7 +107,6 @@ static inline bool daxdev_mapping_supported(struct > vm_area_struct *vma, > #endif > > struct writeback_control; > -int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff); > #if IS_ENABLED(CONFIG_FS_DAX) > int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk); > void dax_remove_host(struct gendisk *disk); > -- > 2.30.2 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 3/3] virtio-net: enable virtio indirect cache
On Wed, 27 Oct 2021 08:55:28 -0700, Jakub Kicinski wrote: > On Wed, 27 Oct 2021 14:19:13 +0800 Xuan Zhuo wrote: > > +static bool virtio_desc_cache = true; > > module_param(csum, bool, 0444); > > module_param(gso, bool, 0444); > > module_param(napi_tx, bool, 0644); > > +module_param(virtio_desc_cache, bool, 0644); > > Can this be an ethtool priv flag? module params are discouraged because > they can't be controlled per-netdev. The current design can only be set when the device is initialized. So using ethtool to modify it will not work. Thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
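For concreteness, Jakub's suggestion would look roughly like the sketch below: an ethtool private flag is per-netdev and can be flipped at runtime (doing so safely would still mean quiescing the virtqueues first). The flag name, the vi->desc_cache field and the wiring are illustrative assumptions, not an actual virtio-net patch:

	static const char virtnet_priv_flags_strings[][ETH_GSTRING_LEN] = {
		"desc-cache",
	};

	#define VIRTNET_PRIV_FLAG_DESC_CACHE	BIT(0)

	static u32 virtnet_get_priv_flags(struct net_device *dev)
	{
		struct virtnet_info *vi = netdev_priv(dev);

		return vi->desc_cache ? VIRTNET_PRIV_FLAG_DESC_CACHE : 0;
	}

	static int virtnet_set_priv_flags(struct net_device *dev, u32 flags)
	{
		struct virtnet_info *vi = netdev_priv(dev);

		vi->desc_cache = !!(flags & VIRTNET_PRIV_FLAG_DESC_CACHE);
		/* ...propagate to the vqs once they are quiesced... */
		return 0;
	}

	/* hooked up via struct ethtool_ops:
	 *	.get_priv_flags = virtnet_get_priv_flags,
	 *	.set_priv_flags = virtnet_set_priv_flags,
	 * plus .get_strings/.get_sset_count handling ETH_SS_PRIV_FLAGS.
	 */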
Re: [PATCH 1/3] virtio: cache indirect desc for split
On Thu, Oct 28, 2021 at 1:07 AM Michael S. Tsirkin wrote: > > On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote: > > In the case of using indirect, indirect desc must be allocated and > > released each time, which increases a lot of cpu overhead. > > > > Here, a cache is added for indirect. If the number of indirect desc to be > > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with > > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse. > > > > Signed-off-by: Xuan Zhuo > > --- > > drivers/virtio/virtio.c | 6 > > drivers/virtio/virtio_ring.c | 63 ++-- > > include/linux/virtio.h | 10 ++ > > 3 files changed, 70 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > > index 0a5b54034d4b..04bcb74e5b9a 100644 > > --- a/drivers/virtio/virtio.c > > +++ b/drivers/virtio/virtio.c > > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev) > > } > > EXPORT_SYMBOL_GPL(is_virtio_device); > > > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val) > > +{ > > + dev->desc_cache = val; > > +} > > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache); > > + > > void unregister_virtio_device(struct virtio_device *dev) > > { > > int index = dev->index; /* save for after device release */ > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > index dd95dfd85e98..0b9a8544b0e8 100644 > > --- a/drivers/virtio/virtio_ring.c > > +++ b/drivers/virtio/virtio_ring.c > > @@ -117,6 +117,10 @@ struct vring_virtqueue { > > /* Hint for event idx: already triggered no need to disable. */ > > bool event_triggered; > > > > + /* Is indirect cache used? */ > > + bool use_desc_cache; > > + void *desc_cache_chain; > > + > > union { > > /* Available for split ring */ > > struct { > > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const > > struct vring_virtqueue *vq, > > return extra[i].next; > > } > > > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq, > > +#define VIRT_QUEUE_CACHE_DESC_NUM 4 > > + > > +static void desc_cache_chain_free_split(void *chain) > > +{ > > + struct vring_desc *desc; > > + > > + while (chain) { > > + desc = chain; > > + chain = (void *)desc->addr; > > + kfree(desc); > > + } > > +} > > + > > +static void desc_cache_put_split(struct vring_virtqueue *vq, > > + struct vring_desc *desc, int n) > > +{ > > + if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) { > > + desc->addr = (u64)vq->desc_cache_chain; > > + vq->desc_cache_chain = desc; > > + } else { > > + kfree(desc); > > + } > > +} > > + > > > So I have a question here. What happens if we just do: > > if (n <= VIRT_QUEUE_CACHE_DESC_NUM) { > return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp) > } else { > return kmalloc_arrat(n, sizeof desc, gfp) > } > > A small change and won't we reap most performance benefits? Yes, I think we need a benchmark to use private cache to see how much it can help. Thanks > > -- > MST > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
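Spelled out a bit more, Michael's alternative could look like the sketch below (not benchmarked, function boundaries simplified): one slab cache for indirect tables of up to VIRT_QUEUE_CACHE_DESC_NUM descriptors, kmalloc_array() for anything larger. Note kmem_cache_alloc() takes the cache pointer rather than a size, since the object size is fixed at kmem_cache_create() time, and the slab allocator already keeps per-CPU freelists, so this may capture much of the benefit of a hand-rolled chain:

	static struct kmem_cache *indirect_cache;	/* created once at module init */

	static int __init indirect_cache_init(void)
	{
		indirect_cache = kmem_cache_create("virtio_indirect",
				VIRT_QUEUE_CACHE_DESC_NUM * sizeof(struct vring_desc),
				0, 0, NULL);
		return indirect_cache ? 0 : -ENOMEM;
	}

	static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
						       unsigned int total_sg,
						       gfp_t gfp)
	{
		if (total_sg <= VIRT_QUEUE_CACHE_DESC_NUM)
			return kmem_cache_alloc(indirect_cache, gfp);
		return kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
	}

	static void free_indirect_split(struct vring_desc *desc, unsigned int n)
	{
		if (n <= VIRT_QUEUE_CACHE_DESC_NUM)
			kmem_cache_free(indirect_cache, desc);
		else
			kfree(desc);
	}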
Re: vDPA bus driver selection
On Thu, Oct 28, 2021 at 4:16 AM Michael S. Tsirkin wrote: > > On Wed, Oct 27, 2021 at 03:21:15PM +, Parav Pandit wrote: > > Hi Stefano, > > > > > From: Stefano Garzarella > > > Sent: Wednesday, October 27, 2021 8:04 PM > > > > > > Hi folks, > > > I was trying to understand if we have a way to specify which vDPA bus > > > driver > > > (e.g. vhost-vdpa, virtio-vdpa) a device should use. > > > IIUC we don't have it, and the first registered driver is used when a new > > > device > > > is registered. > > > > > > I was thinking if it makes sense to extend the management API to specify > > > which > > > bus driver to use for a device. Actually, we want to support this in the first version of vDPA bus. But for some reason it was dropped. The idea is to specify the device type 'virtio' or 'vhost'. But a concern is that, it may encourage vendor to implement e.g virtio specific device (without DMA isolation). >A use case could be for example a single host > > > handling VMs and bare-metal containers, so we would have both virtio-vdpa > > > and vhost-vdpa loaded and we want to attach some devices to VMs through > > > vhost-vdpa and others to containers through virtio-vdpa. > > > > > > What do you think? > > > > > One option is, user keeps the drivers_autoprobe disabled for the vdpa bus > > using, > > > > $ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac 00:11:22:33:44:55 > > $ echo 0 > /sys/bus/vdpa/drivers_autoprobe > > > > And after vdpa device creation, it manually binds to the desired driver > > such as, > > > > $ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind > > Or > > $ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind > > > > In an case of VDUSE, it makes more sense to bind to the one of the above > > driver after user space has connected the use space backend to the kernel > > device. > > The only annoying thing is that manual bind is not validated. > E.g. if one makes a mistake and binds an incorrect device, > it just tends to crash IIRC. > Another is that it all needs to be root. I'm not sure it's worth bothering with. As discussed, switching between vendor drivers and vfio requires manually bind/unbind as well. Thanks > > -- > MST > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 3/3] virtio-net: enable virtio indirect cache
On Thu, Oct 28, 2021 at 9:59 AM Xuan Zhuo wrote: > > On Wed, 27 Oct 2021 08:55:28 -0700, Jakub Kicinski wrote: > > On Wed, 27 Oct 2021 14:19:13 +0800 Xuan Zhuo wrote: > > > +static bool virtio_desc_cache = true; > > > module_param(csum, bool, 0444); > > > module_param(gso, bool, 0444); > > > module_param(napi_tx, bool, 0644); > > > +module_param(virtio_desc_cache, bool, 0644); > > > > Can this be an ethtool priv flag? module params are discouraged because > > they can't be controlled per-netdev. > > > The current design can only be set when the device is initialized. So using > ethtool to modify it will not work. Anyhow you can add things like synchronization to make it work. But I think what we want is to make it work unconditionally, so having a module parameter seems useless. If you want to use it for benchmarking? Thanks > > Thanks. > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v2] net: virtio: use eth_hw_addr_set()
On Wed, Oct 27, 2021 at 11:31 PM Michael S. Tsirkin wrote: > > On Wed, Oct 27, 2021 at 08:20:12AM -0700, Jakub Kicinski wrote: > > Commit 406f42fa0d3c ("net-next: When a bond have a massive amount > > of VLANs...") introduced a rbtree for faster Ethernet address look > > up. To maintain netdev->dev_addr in this tree we need to make all > > the writes to it go through appropriate helpers. > > > > Even though the current code uses dev->addr_len the we can switch > > to eth_hw_addr_set() instead of dev_addr_set(). The netdev is > > always allocated by alloc_etherdev_mq() and there are at least two > > places which assume Ethernet address: > > - the line below calling eth_hw_addr_random() > > - virtnet_set_mac_address() -> eth_commit_mac_addr_change() > > > > Signed-off-by: Jakub Kicinski > > Acked-by: Michael S. Tsirkin Acked-by: Jason Wang > > > --- > > v2: - actually switch to eth_hw_addr_set() not dev_addr_set() > > - resize the buffer to ETH_ALEN > > - pass ETH_ALEN instead of dev->dev_addr to virtio_cread_bytes() > > > > CC: m...@redhat.com > > CC: jasow...@redhat.com > > CC: virtualization@lists.linux-foundation.org > > --- > > drivers/net/virtio_net.c | 10 +++--- > > 1 file changed, 7 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index c501b5974aee..cc79343cd220 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -3177,12 +3177,16 @@ static int virtnet_probe(struct virtio_device *vdev) > > dev->max_mtu = MAX_MTU; > > > > /* Configuration may specify what MAC to use. Otherwise random. */ > > - if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) > > + if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC)) { > > + u8 addr[ETH_ALEN]; > > + > > virtio_cread_bytes(vdev, > > offsetof(struct virtio_net_config, mac), > > -dev->dev_addr, dev->addr_len); > > - else > > +addr, ETH_ALEN); > > + eth_hw_addr_set(dev, addr); > > + } else { > > eth_hw_addr_random(dev); > > + } > > > > /* Set up our device-specific information */ > > vi = netdev_priv(dev); > > -- > > 2.31.1 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: vDPA bus driver selection
> From: Michael S. Tsirkin > Sent: Thursday, October 28, 2021 1:46 AM > > On Wed, Oct 27, 2021 at 03:21:15PM +, Parav Pandit wrote: > > Hi Stefano, > > > > > From: Stefano Garzarella > > > Sent: Wednesday, October 27, 2021 8:04 PM > > > > > > Hi folks, > > > I was trying to understand if we have a way to specify which vDPA > > > bus driver (e.g. vhost-vdpa, virtio-vdpa) a device should use. > > > IIUC we don't have it, and the first registered driver is used when > > > a new device is registered. > > > > > > I was thinking if it makes sense to extend the management API to > > > specify which bus driver to use for a device. A use case could be > > > for example a single host handling VMs and bare-metal containers, so > > > we would have both virtio-vdpa and vhost-vdpa loaded and we want to > > > attach some devices to VMs through vhost-vdpa and others to containers > through virtio-vdpa. > > > > > > What do you think? > > > > > One option is, user keeps the drivers_autoprobe disabled for the vdpa > > bus using, > > > > $ vdpa/vdpa dev add mgmtdev vdpasim_net name vdpa0 mac > > 00:11:22:33:44:55 $ echo 0 > /sys/bus/vdpa/drivers_autoprobe > > > > And after vdpa device creation, it manually binds to the desired > > driver such as, > > > > $ echo vdpa0 > /sys/bus/vdpa/drivers/virtio_vdpa/bind > > Or > > $ echo vdpa0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind > > > > In an case of VDUSE, it makes more sense to bind to the one of the above > driver after user space has connected the use space backend to the kernel > device. > > The only annoying thing is that manual bind is not validated. > E.g. if one makes a mistake and binds an incorrect device, it just tends to > crash > IIRC. Only a vdpa device can be bind/unbind to a vdpa bus driver. Such checks are done by the kernel core. I didn't follow when can it crash. Can you please share an example when can it crash? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/3] virtio: cache indirect desc for packed
Hi Xuan, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on horms-ipvs/master] [also build test WARNING on linus/master v5.15-rc7] [cannot apply to mst-vhost/linux-next next-20211027] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025 base: https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master config: i386-randconfig-s002-20211027 (attached as .config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 reproduce: # apt-get install sparse # sparse version: v0.6.4-dirty # https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025 git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92 # save the attached .config to linux build tree make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash drivers/virtio/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot sparse warnings: (new ones prefixed by >>) drivers/virtio/virtio_ring.c:438:26: sparse: sparse: cast from restricted __virtio64 drivers/virtio/virtio_ring.c:447:28: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __virtio64 [usertype] addr @@ got unsigned long long [usertype] @@ drivers/virtio/virtio_ring.c:447:28: sparse: expected restricted __virtio64 [usertype] addr drivers/virtio/virtio_ring.c:447:28: sparse: got unsigned long long [usertype] drivers/virtio/virtio_ring.c:464:49: sparse: sparse: cast from restricted __virtio64 >> drivers/virtio/virtio_ring.c:1083:26: sparse: sparse: cast from restricted >> __le64 >> drivers/virtio/virtio_ring.c:1092:28: sparse: sparse: incorrect type in >> assignment (different base types) @@ expected restricted __le64 >> [usertype] addr @@ got unsigned long long [usertype] @@ drivers/virtio/virtio_ring.c:1092:28: sparse: expected restricted __le64 [usertype] addr drivers/virtio/virtio_ring.c:1092:28: sparse: got unsigned long long [usertype] drivers/virtio/virtio_ring.c:1109:49: sparse: sparse: cast from restricted __le64 vim +1083 drivers/virtio/virtio_ring.c 1076 1077 static void desc_cache_chain_free_packed(void *chain) 1078 { 1079 struct vring_packed_desc *desc; 1080 1081 while (chain) { 1082 desc = chain; > 1083 chain = (void *)desc->addr; 1084 kfree(desc); 1085 } 1086 } 1087 1088 static void desc_cache_put_packed(struct vring_virtqueue *vq, 1089struct vring_packed_desc *desc, int n) 1090 { 1091 if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) { > 1092 desc->addr = (u64)vq->desc_cache_chain; 1093 vq->desc_cache_chain = desc; 1094 } else { 1095 kfree(desc); 1096 } 1097 } 1098 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC] Add DMA_API support for Virtio devices earlier than VirtIO 1.0
On Wed, Oct 27, 2021 at 04:28:28PM -0700, Erdem Aktas wrote:
> Enable DMA_API for any VirtIO device earlier than Virtio 1.0, which
> is the only way for those devices to be configured correctly when
> memory access is restricted.
>
> Virtio devices can use the DMA_API to translate guest physical addresses to
> device physical addresses if the VIRTIO_F_ACCESS_PLATFORM feature is set
> while the device is being initialized. The VIRTIO_F_ACCESS_PLATFORM
> feature is only supported in VirtIO 1.0 and later devices. This prevents
> any device using a VirtIO version earlier than 1.0 from being
> attached when memory access is restricted, i.e. when memory encryption
> features (AMD SEV [ES/SNP], Intel TDX, etc.) are enabled.
>
> Signed-off-by: Erdem Aktas

Sorry .. NACK. Virtio before 1.0 is on life support. No new features.
Just use 1.0 please.

> ---
> I have tested this patch using linux-stable.git head, the 5.15.0-rc6
> kernel, and a scsi disk with virtio 0.95 in a legacy VM and a
> Confidential VM (AMD SEV). I want to get feedback on whether
> there is any risk or downside to enabling the DMA_API on older virtio
> drivers when memory encryption is enabled.
>
>  drivers/virtio/virtio.c       |  7 ++-
>  include/linux/virtio_config.h | 22 ++
>  2 files changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 236081afe9a2..71115ba85d07 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -179,11 +179,8 @@ int virtio_finalize_features(struct virtio_device *dev)
>  	if (ret) {
>  		if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
>  			dev_warn(&dev->dev,
> -				 "device must provide VIRTIO_F_VERSION_1\n");
> -			return -ENODEV;
> -		}
> -
> -		if (!virtio_has_feature(dev, VIRTIO_F_ACCESS_PLATFORM)) {
> +				 "device does not provide VIRTIO_F_VERSION_1 while restricted memory access is enabled!.\n");
> +		} else if (!virtio_has_feature(dev, VIRTIO_F_ACCESS_PLATFORM)) {
>  			dev_warn(&dev->dev,
>  				 "device must provide VIRTIO_F_ACCESS_PLATFORM\n");
>  			return -ENODEV;
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 8519b3ae5d52..6eacb4d43318 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -170,6 +170,15 @@ static inline bool virtio_has_feature(const struct virtio_device *vdev,
>  	return __virtio_test_bit(vdev, fbit);
>  }
>
> +#ifdef CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
> +int arch_has_restricted_virtio_memory_access(void);
> +#else
> +static inline int arch_has_restricted_virtio_memory_access(void)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS */
> +
>  /**
>   * virtio_has_dma_quirk - determine whether this device has the DMA quirk
>   * @vdev: the device
> @@ -180,6 +189,11 @@ static inline bool virtio_has_dma_quirk(const struct virtio_device *vdev)
>  	 * Note the reverse polarity of the quirk feature (compared to most
>  	 * other features), this is for compatibility with legacy systems.
>  	 */
> +	if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) &&
> +	    arch_has_restricted_virtio_memory_access())
> +		return false;
> +
> +
>  	return !virtio_has_feature(vdev, VIRTIO_F_ACCESS_PLATFORM);
>  }
>
> @@ -558,13 +572,5 @@ static inline void virtio_cwrite64(struct virtio_device *vdev,
>  	_r;						\
>  })
>
> -#ifdef CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
> -int arch_has_restricted_virtio_memory_access(void);
> -#else
> -static inline int arch_has_restricted_virtio_memory_access(void)
> -{
> -	return 0;
> -}
> -#endif /* CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS */
>
>  #endif /* _LINUX_VIRTIO_CONFIG_H */
> --
> 2.30.2
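For context, the disagreement is about when the virtio core should route buffer addresses through the DMA API rather than hand the device raw guest physical addresses. The sketch below is a rough userspace model of that decision as the rejected RFC would have changed it; it is not the kernel's actual vring_use_dma_api()/virtio_has_dma_quirk() code, and fake_vdev, restricted_memory and use_dma_api are stand-in names.

#include <stdbool.h>
#include <stdio.h>

struct fake_vdev {
	bool has_version_1;		/* VIRTIO_F_VERSION_1 negotiated */
	bool has_access_platform;	/* VIRTIO_F_ACCESS_PLATFORM negotiated */
};

/* Stand-in for arch_has_restricted_virtio_memory_access(). */
static bool restricted_memory = true;

static bool use_dma_api(const struct fake_vdev *vdev)
{
	/* Spec-compliant path: the device asked for translated addresses. */
	if (vdev->has_access_platform)
		return true;

	/*
	 * What the RFC proposed: force the DMA API for legacy (pre-1.0)
	 * devices on encrypted-memory guests, because such a device cannot
	 * negotiate VIRTIO_F_ACCESS_PLATFORM at all.
	 */
	if (!vdev->has_version_1 && restricted_memory)
		return true;

	return false;
}

int main(void)
{
	struct fake_vdev legacy = { .has_version_1 = false };

	printf("legacy device, SEV guest -> DMA API: %d\n", use_dma_api(&legacy));
	return 0;
}

The NACK amounts to saying the second branch should never be needed: an encrypted guest should simply be given a 1.0 device that offers VIRTIO_F_ACCESS_PLATFORM.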
Re: [PATCH 1/3] virtio: cache indirect desc for split
On Thu, 28 Oct 2021 10:16:10 +0800, Jason Wang wrote:
> On Thu, Oct 28, 2021 at 1:07 AM Michael S. Tsirkin wrote:
> >
> > On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> > > In the case of using indirect, indirect desc must be allocated and
> > > released each time, which increases a lot of cpu overhead.
> > >
> > > Here, a cache is added for indirect. If the number of indirect desc to be
> > > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> > >
> > > Signed-off-by: Xuan Zhuo
> > > ---
> > >  drivers/virtio/virtio.c      |  6
> > >  drivers/virtio/virtio_ring.c | 63 ++--
> > >  include/linux/virtio.h       | 10 ++
> > >  3 files changed, 70 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > index 0a5b54034d4b..04bcb74e5b9a 100644
> > > --- a/drivers/virtio/virtio.c
> > > +++ b/drivers/virtio/virtio.c
> > > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> > >  }
> > >  EXPORT_SYMBOL_GPL(is_virtio_device);
> > >
> > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > > +{
> > > +	dev->desc_cache = val;
> > > +}
> > > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > > +
> > >  void unregister_virtio_device(struct virtio_device *dev)
> > >  {
> > >  	int index = dev->index; /* save for after device release */
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index dd95dfd85e98..0b9a8544b0e8 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> > >  	/* Hint for event idx: already triggered no need to disable. */
> > >  	bool event_triggered;
> > >
> > > +	/* Is indirect cache used? */
> > > +	bool use_desc_cache;
> > > +	void *desc_cache_chain;
> > > +
> > >  	union {
> > >  		/* Available for split ring */
> > >  		struct {
> > > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > >  	return extra[i].next;
> > >  }
> > >
> > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > > +
> > > +static void desc_cache_chain_free_split(void *chain)
> > > +{
> > > +	struct vring_desc *desc;
> > > +
> > > +	while (chain) {
> > > +		desc = chain;
> > > +		chain = (void *)desc->addr;
> > > +		kfree(desc);
> > > +	}
> > > +}
> > > +
> > > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > > +				 struct vring_desc *desc, int n)
> > > +{
> > > +	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > > +		desc->addr = (u64)vq->desc_cache_chain;
> > > +		vq->desc_cache_chain = desc;
> > > +	} else {
> > > +		kfree(desc);
> > > +	}
> > > +}
> > > +
> >
> >
> > So I have a question here. What happens if we just do:
> >
> > if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > 	return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp)
> > } else {
> > 	return kmalloc_array(n, sizeof desc, gfp)
> > }
> >
> > A small change and won't we reap most performance benefits?
>
> Yes, I think we need a benchmark to use private cache to see how much
> it can help.
I did a test, the code is as follows:

+static void desc_cache_put_packed(struct vring_virtqueue *vq,
+				  struct vring_packed_desc *desc, int n)
+{
+	if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
+		kmem_cache_free(vq->desc_cache, desc);
+	} else {
+		kfree(desc);
+	}

@@ -476,11 +452,14 @@ static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
 	 */
 	gfp &= ~__GFP_HIGHMEM;

-	desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
+	if (total_sg <= VIRT_QUEUE_CACHE_DESC_NUM)
+		desc = kmem_cache_alloc(vq->desc_cache, gfp);
+	else
+		desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
+

...

+	vq->desc_cache = kmem_cache_create("virio_desc",
+					   4 * sizeof(struct vring_desc),
+					   0, 0, NULL);

The effect is not good; there is basically no improvement. With perf top you
can see that the overhead of kmem_cache_alloc/kmem_cache_free is also
relatively large:

  26.91%  [kernel]  [k] virtqueue_add
  15.35%  [kernel]  [k] detach_buf_split
  14.15%  [kernel]  [k] virtnet_xsk_xmit
  13.24%  [kernel]  [k] virtqueue_add_outbuf
   9.30%  [kernel]  [k] __slab_free
   3.91%  [kernel]  [k] kmem_cache_alloc
   2.85%  [kernel]  [k] virtqueue_get_buf_ctx
   2.82%  [kernel]  [k] kmem_cache_
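The numbers above come from the in-kernel packed-ring benchmark. As a loose illustration of what is being compared, here is a self-contained userspace microbenchmark that pits a per-call malloc/free of a small descriptor array against recycling it through a private free list. It says nothing about slab behaviour or the virtio fast path; node, cache_get, cache_put and the iteration count are invented for the sketch.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define DESC_NUM   4		/* mirrors VIRT_QUEUE_CACHE_DESC_NUM */
#define ITERATIONS 10000000UL

struct fake_desc { unsigned long long addr; unsigned int len, flags; };

struct node {
	struct fake_desc desc[DESC_NUM];	/* must stay the first member */
	struct node *next;
};

static struct node *free_list;

static void *xmalloc(size_t sz)
{
	void *p = malloc(sz);

	if (!p) {
		perror("malloc");
		exit(1);
	}
	return p;
}

static struct fake_desc *cache_get(void)
{
	struct node *n = free_list;

	if (n)
		free_list = n->next;
	else
		n = xmalloc(sizeof(*n));
	return n->desc;
}

static void cache_put(struct fake_desc *d)
{
	struct node *n = (struct node *)d;	/* desc[] is the first member */

	n->next = free_list;
	free_list = n;
}

static double run(int cached)
{
	struct timespec a, b;
	volatile unsigned long sink = 0;

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (unsigned long i = 0; i < ITERATIONS; i++) {
		struct fake_desc *d = cached ? cache_get()
					     : xmalloc(DESC_NUM * sizeof(*d));
		d[0].len = (unsigned int)i;
		sink += d[0].len;
		if (cached)
			cache_put(d);
		else
			free(d);
	}
	clock_gettime(CLOCK_MONOTONIC, &b);
	return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
	printf("malloc/free : %.3fs\n", run(0));
	printf("free list   : %.3fs\n", run(1));
	return 0;
}

A private free list only has to push and pop a pointer, while a general-purpose allocator (or kmem_cache) still runs its full alloc/free paths on every call, which is consistent with the profile above.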