Re: [PATCH 09/16] vdpa/mlx5: Allow creation/deletion of any given mr struct
On Tue, 2023-09-26 at 12:44 +0800, Jason Wang wrote:
> On Tue, Sep 12, 2023 at 9:02 PM Dragos Tatulea wrote:
> >
> > This patch adapts the mr creation/deletion code to be able to work with
> > any given mr struct pointer. All the APIs are adapted to take an extra
> > parameter for the mr.
> >
> > mlx5_vdpa_create/delete_mr doesn't need an ASID parameter anymore. The
> > check is done in the caller instead (mlx5_set_map).
> >
> > This change is needed for a followup patch which will introduce an
> > additional mr for the vq descriptor data.
> >
> > Signed-off-by: Dragos Tatulea
> > ---
>
> Thinking of this decoupling I think I have a question.
>
> We advertise 2 address spaces and 2 groups. So we actually don't know
> for example which address spaces will be used by dvq.
>
> And actually we allow the user space to do something like
>
> set_group_asid(dvq_group, 0)
> set_map(0)
> set_group_asid(dvq_group, 1)
> set_map(1)
>
> I wonder if the decoupling like this patch can work and why.
>
This scenario could indeed work, especially if you look at the 13th patch [0] where hw support is added. Are you wondering if this should work at all or if it should be blocked?

> It looks to me the most easy way is to let each AS be backed by an MR.
> Then we don't even need to care about the dvq, cvq.
That's what this patch series does.

Thanks,
Dragos

[0] https://lore.kernel.org/virtualization/20230912130132.561193-14-dtatu...@nvidia.com/T/#u
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
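The "each AS backed by an MR" layout Jason suggests (and which the series implements) can be illustrated with a minimal standalone sketch. Everything below is hypothetical naming, not the mlx5 code: each address space owns its own MR, so `set_map(asid)` only rebuilds that MR, and `set_group_asid()` merely repoints a group.

```c
#include <stdbool.h>

#define NUM_AS     2	/* two address spaces advertised */
#define NUM_GROUPS 2	/* dvq group and cvq group */

/* Hypothetical stand-in for the driver's memory-region state. */
struct sketch_mr {
	bool initialized;
	unsigned long map_token;	/* placeholder for real iotlb/mkey state */
};

struct sketch_vdev {
	struct sketch_mr mr[NUM_AS];	/* one MR per address space */
	unsigned int group_asid[NUM_GROUPS];
};

/* set_map(asid): rebuild only the MR backing that address space. */
static int sketch_set_map(struct sketch_vdev *v, unsigned int asid,
			  unsigned long token)
{
	if (asid >= NUM_AS)
		return -1;
	v->mr[asid].initialized = true;
	v->mr[asid].map_token = token;
	return 0;
}

/* set_group_asid(): just repoint the group; MRs stay tied to their ASID. */
static int sketch_set_group_asid(struct sketch_vdev *v, unsigned int group,
				 unsigned int asid)
{
	if (group >= NUM_GROUPS || asid >= NUM_AS)
		return -1;
	v->group_asid[group] = asid;
	return 0;
}
```

With this shape, the sequence set_group_asid(dvq_group, 0); set_map(0); set_group_asid(dvq_group, 1); set_map(1) is unproblematic: each set_map touches only its own address space's MR, and the group binding is an independent pointer.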
Re: [PATCH vfio 08/11] vfio/pci: Expose vfio_pci_core_setup_barmap()
On 21/09/2023 19:35, Alex Williamson wrote: On Thu, 21 Sep 2023 15:40:37 +0300 Yishai Hadas wrote: Expose vfio_pci_core_setup_barmap() to be used by drivers. This will let drivers to mmap a BAR and re-use it from both vfio and the driver when it's applicable. This API will be used in the next patches by the vfio/virtio coming driver. Signed-off-by: Yishai Hadas --- drivers/vfio/pci/vfio_pci_core.c | 25 + drivers/vfio/pci/vfio_pci_rdwr.c | 28 ++-- include/linux/vfio_pci_core.h| 1 + 3 files changed, 28 insertions(+), 26 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 1929103ee59a..b56111ed8a8c 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c @@ -684,6 +684,31 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev) } EXPORT_SYMBOL_GPL(vfio_pci_core_disable); +int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar) +{ + struct pci_dev *pdev = vdev->pdev; + void __iomem *io; + int ret; + + if (vdev->barmap[bar]) + return 0; + + ret = pci_request_selected_regions(pdev, 1 << bar, "vfio"); + if (ret) + return ret; + + io = pci_iomap(pdev, bar, 0); + if (!io) { + pci_release_selected_regions(pdev, 1 << bar); + return -ENOMEM; + } + + vdev->barmap[bar] = io; + + return 0; +} +EXPORT_SYMBOL(vfio_pci_core_setup_barmap); Not to endorse the rest of this yet, but minimally _GPL, same for the following patch. Thanks, Alex Sure, will change to EXPORT_SYMBOL_GPL as part of V1. 
Yishai + void vfio_pci_core_close_device(struct vfio_device *core_vdev) { struct vfio_pci_core_device *vdev = diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index e27de61ac9fe..6f08b3ecbb89 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -200,30 +200,6 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem, return done; } -static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar) -{ - struct pci_dev *pdev = vdev->pdev; - int ret; - void __iomem *io; - - if (vdev->barmap[bar]) - return 0; - - ret = pci_request_selected_regions(pdev, 1 << bar, "vfio"); - if (ret) - return ret; - - io = pci_iomap(pdev, bar, 0); - if (!io) { - pci_release_selected_regions(pdev, 1 << bar); - return -ENOMEM; - } - - vdev->barmap[bar] = io; - - return 0; -} - ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite) { @@ -262,7 +238,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, } x_end = end; } else { - int ret = vfio_pci_setup_barmap(vdev, bar); + int ret = vfio_pci_core_setup_barmap(vdev, bar); if (ret) { done = ret; goto out; @@ -438,7 +414,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, return -EINVAL; #endif - ret = vfio_pci_setup_barmap(vdev, bar); + ret = vfio_pci_core_setup_barmap(vdev, bar); if (ret) return ret; diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 562e8754869d..67ac58e20e1d 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -127,6 +127,7 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf); int vfio_pci_core_enable(struct vfio_pci_core_device *vdev); void vfio_pci_core_disable(struct vfio_pci_core_device *vdev); void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev); +int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int 
bar); pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev, pci_channel_state_t state);
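The exported helper follows an idempotent request-then-map pattern: a cached mapping returns early, and on map failure the requested region is released again. The pattern can be mimicked outside the kernel; everything below is a hypothetical stand-in (fake_* functions replace pci_request_selected_regions()/pci_iomap()), not the vfio API itself:

```c
#include <stdlib.h>

#define NBARS 6

struct bar_state {
	void *barmap[NBARS];	/* cached mappings, like vdev->barmap[] */
};

/* Stand-ins for pci_request_selected_regions() / pci_iomap(). */
static int fake_request_region(int bar) { (void)bar; return 0; }
static void fake_release_region(int bar) { (void)bar; }
static void *fake_iomap(int bar) { (void)bar; return malloc(16); }

static int setup_barmap_sketch(struct bar_state *s, int bar)
{
	void *io;
	int ret;

	if (s->barmap[bar])	/* already set up: idempotent fast path */
		return 0;

	ret = fake_request_region(bar);
	if (ret)
		return ret;

	io = fake_iomap(bar);
	if (!io) {
		fake_release_region(bar);	/* undo on failure */
		return -1;	/* mirrors -ENOMEM in the kernel helper */
	}

	s->barmap[bar] = io;
	return 0;
}
```

Calling it twice for the same BAR is safe, which is what lets both vfio and a variant driver share the mapping once the symbol is exported.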
Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
On 21/09/2023 23:34, Michael S. Tsirkin wrote: On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote: Expose admin commands over the virtio device, to be used by the vfio-virtio driver in the next patches. It includes: list query/use, legacy write/read, read notify_info. Signed-off-by: Yishai Hadas --- drivers/vfio/pci/virtio/cmd.c | 146 ++ drivers/vfio/pci/virtio/cmd.h | 27 +++ 2 files changed, 173 insertions(+) create mode 100644 drivers/vfio/pci/virtio/cmd.c create mode 100644 drivers/vfio/pci/virtio/cmd.h diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c new file mode 100644 index ..f068239cdbb0 --- /dev/null +++ b/drivers/vfio/pci/virtio/cmd.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +/* + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved + */ + +#include "cmd.h" + +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size) +{ + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); + struct scatterlist out_sg; + struct virtio_admin_cmd cmd = {}; + + if (!virtio_dev) + return -ENOTCONN; + + sg_init_one(&out_sg, buf, buf_size); + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY; + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; + cmd.result_sg = &out_sg; + + return virtio_admin_cmd_exec(virtio_dev, &cmd); +} + in/out seem all wrong here. In virtio terminology, in means from device to driver, out means from driver to device. I referred here to in/out from vfio POV who prepares the command. However, I can replace it to follow the virtio terminology as you suggested if this more makes sense. Please see also my coming answer on your suggestion to put all of this in the virtio layer. 
+int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size) +{ + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); + struct scatterlist in_sg; + struct virtio_admin_cmd cmd = {}; + + if (!virtio_dev) + return -ENOTCONN; + + sg_init_one(&in_sg, buf, buf_size); + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE; + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; + cmd.data_sg = &in_sg; + + return virtio_admin_cmd_exec(virtio_dev, &cmd); +} + +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode, what is _lr short for? This was an acronym to legacy_read. The actual command is according to the given opcode which can be one among LEGACY_COMMON_CFG_READ, LEGACY_DEV_CFG_READ. I can rename it to '_legacy_read' (i.e. virtiovf_issue_legacy_read_cmd) to be clearer. + u8 offset, u8 size, u8 *buf) +{ + struct virtio_device *virtio_dev = + virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev); + struct virtio_admin_cmd_data_lr_write *in; + struct scatterlist in_sg; + struct virtio_admin_cmd cmd = {}; + int ret; + + if (!virtio_dev) + return -ENOTCONN; + + in = kzalloc(sizeof(*in) + size, GFP_KERNEL); + if (!in) + return -ENOMEM; + + in->offset = offset; + memcpy(in->registers, buf, size); + sg_init_one(&in_sg, in, sizeof(*in) + size); + cmd.opcode = opcode; + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; + cmd.group_member_id = virtvdev->vf_id + 1; weird. why + 1? This follows the virtio spec in that area. "When sending commands with the SR-IOV group type, the driver specify a value for group_member_id between 1 and NumVFs inclusive." The 'virtvdev->vf_id' was set upon vfio/virtio driver initialization by pci_iov_vf_id() which its first index is 0. + cmd.data_sg = &in_sg; + ret = virtio_admin_cmd_exec(virtio_dev, &cmd); + + kfree(in); + return ret; +} How do you know it's safe to send this command, in particular at this time? This seems to be doing zero checks, and zero synchronization with the PF driver. 
The virtiovf_cmd_lr_read()/other gets a virtio VF and it gets its PF by calling virtio_pci_vf_get_pf_dev().

The VF can't go away via 'disable sriov' as it's owned/used by vfio.

The PF can't go away via rmmod/modprobe -r of virtio, because of the 'module in use' dependency between VFIO and VIRTIO.

The below check [1] was done only from a clean-code perspective; it might theoretically fail in case the given VF doesn't use a virtio driver.

[1] if (!virtio_dev) return -ENOTCONN;

So, it looks safe as is.

+ +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode, +u8 offset, u8 size, u8 *buf) +{ + struct virtio_device *virtio_dev = + virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev); + struct virtio_admin_cmd_data_lr_read *in; + struct scatterlist in_sg, out_sg; + struct virtio_admin_cmd cmd =
Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
On 22/09/2023 12:54, Michael S. Tsirkin wrote: On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote: Expose admin commands over the virtio device, to be used by the vfio-virtio driver in the next patches. It includes: list query/use, legacy write/read, read notify_info. Signed-off-by: Yishai Hadas This stuff is pure virtio spec. I think it should live under drivers/virtio, too. The motivation to put it in the vfio layer was from the below main reasons: 1) Having it inside virtio may require to export a symbol/function per command. This will end up today by 5 and in the future (e.g. live migration) with much more exported symbols. With current code we export only 2 generic symbols virtio_pci_vf_get_pf_dev(), virtio_admin_cmd_exec() which may fit for any further extension. 2) For now there is no logic in this vfio layer, however, in the future we may have some DMA/other logic that should better fit to the caller/client layer (i.e. vfio). By the way, this follows what was done already between vfio/mlx5 to mlx5_core modules where mlx5_core exposes generic APIs to execute a command and to get the a PF from a given mlx5 VF. This way, we can enable further commands to be added/extended easily/cleanly. See for example here [1, 2]. [1] https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L210 [2] https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L683 Yishai --- drivers/vfio/pci/virtio/cmd.c | 146 ++ drivers/vfio/pci/virtio/cmd.h | 27 +++ 2 files changed, 173 insertions(+) create mode 100644 drivers/vfio/pci/virtio/cmd.c create mode 100644 drivers/vfio/pci/virtio/cmd.h diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c new file mode 100644 index ..f068239cdbb0 --- /dev/null +++ b/drivers/vfio/pci/virtio/cmd.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +/* + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. 
All rights reserved + */ + +#include "cmd.h" + +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size) +{ + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); + struct scatterlist out_sg; + struct virtio_admin_cmd cmd = {}; + + if (!virtio_dev) + return -ENOTCONN; + + sg_init_one(&out_sg, buf, buf_size); + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY; + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; + cmd.result_sg = &out_sg; + + return virtio_admin_cmd_exec(virtio_dev, &cmd); +} + +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size) +{ + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); + struct scatterlist in_sg; + struct virtio_admin_cmd cmd = {}; + + if (!virtio_dev) + return -ENOTCONN; + + sg_init_one(&in_sg, buf, buf_size); + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE; + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; + cmd.data_sg = &in_sg; + + return virtio_admin_cmd_exec(virtio_dev, &cmd); +} + +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode, + u8 offset, u8 size, u8 *buf) +{ + struct virtio_device *virtio_dev = + virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev); + struct virtio_admin_cmd_data_lr_write *in; + struct scatterlist in_sg; + struct virtio_admin_cmd cmd = {}; + int ret; + + if (!virtio_dev) + return -ENOTCONN; + + in = kzalloc(sizeof(*in) + size, GFP_KERNEL); + if (!in) + return -ENOMEM; + + in->offset = offset; + memcpy(in->registers, buf, size); + sg_init_one(&in_sg, in, sizeof(*in) + size); + cmd.opcode = opcode; + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; + cmd.group_member_id = virtvdev->vf_id + 1; + cmd.data_sg = &in_sg; + ret = virtio_admin_cmd_exec(virtio_dev, &cmd); + + kfree(in); + return ret; +} + +int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode, +u8 offset, u8 size, u8 *buf) +{ + struct virtio_device *virtio_dev = + virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev); + struct 
virtio_admin_cmd_data_lr_read *in; + struct scatterlist in_sg, out_sg; + struct virtio_admin_cmd cmd = {}; + int ret; + + if (!virtio_dev) + return -ENOTCONN; + + in = kzalloc(sizeof(*in), GFP_KERNEL); + if (!in) + return -ENOMEM; + + in->offset = offset; + sg_init_one(&in_sg, in, sizeof(*in)); + sg_init_one(&out_sg, buf, size); + cmd.opcode = opcode; + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; + cmd.data_sg = &in_sg; + cmd.result_sg = &out_sg; + cmd.group_member_id = virtvdev->vf_id
Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
On Tue, Sep 26, 2023 at 01:51:13PM +0300, Yishai Hadas wrote: > On 21/09/2023 23:34, Michael S. Tsirkin wrote: > > On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote: > > > Expose admin commands over the virtio device, to be used by the > > > vfio-virtio driver in the next patches. > > > > > > It includes: list query/use, legacy write/read, read notify_info. > > > > > > Signed-off-by: Yishai Hadas > > > --- > > > drivers/vfio/pci/virtio/cmd.c | 146 ++ > > > drivers/vfio/pci/virtio/cmd.h | 27 +++ > > > 2 files changed, 173 insertions(+) > > > create mode 100644 drivers/vfio/pci/virtio/cmd.c > > > create mode 100644 drivers/vfio/pci/virtio/cmd.h > > > > > > diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c > > > new file mode 100644 > > > index ..f068239cdbb0 > > > --- /dev/null > > > +++ b/drivers/vfio/pci/virtio/cmd.c > > > @@ -0,0 +1,146 @@ > > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB > > > +/* > > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights > > > reserved > > > + */ > > > + > > > +#include "cmd.h" > > > + > > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size) > > > +{ > > > + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); > > > + struct scatterlist out_sg; > > > + struct virtio_admin_cmd cmd = {}; > > > + > > > + if (!virtio_dev) > > > + return -ENOTCONN; > > > + > > > + sg_init_one(&out_sg, buf, buf_size); > > > + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY; > > > + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; > > > + cmd.result_sg = &out_sg; > > > + > > > + return virtio_admin_cmd_exec(virtio_dev, &cmd); > > > +} > > > + > > in/out seem all wrong here. In virtio terminology, in means from > > device to driver, out means from driver to device. > I referred here to in/out from vfio POV who prepares the command. > > However, I can replace it to follow the virtio terminology as you suggested > if this more makes sense. 
> > Please see also my coming answer on your suggestion to put all of this in > the virtio layer. > > > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size) > > > +{ > > > + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); > > > + struct scatterlist in_sg; > > > + struct virtio_admin_cmd cmd = {}; > > > + > > > + if (!virtio_dev) > > > + return -ENOTCONN; > > > + > > > + sg_init_one(&in_sg, buf, buf_size); > > > + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE; > > > + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; > > > + cmd.data_sg = &in_sg; > > > + > > > + return virtio_admin_cmd_exec(virtio_dev, &cmd); > > > +} > > > + > > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 > > > opcode, > > > > what is _lr short for? > > This was an acronym to legacy_read. > > The actual command is according to the given opcode which can be one among > LEGACY_COMMON_CFG_READ, LEGACY_DEV_CFG_READ. > > I can rename it to '_legacy_read' (i.e. virtiovf_issue_legacy_read_cmd) to > be clearer. > > > > > > + u8 offset, u8 size, u8 *buf) > > > +{ > > > + struct virtio_device *virtio_dev = > > > + virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev); > > > + struct virtio_admin_cmd_data_lr_write *in; > > > + struct scatterlist in_sg; > > > + struct virtio_admin_cmd cmd = {}; > > > + int ret; > > > + > > > + if (!virtio_dev) > > > + return -ENOTCONN; > > > + > > > + in = kzalloc(sizeof(*in) + size, GFP_KERNEL); > > > + if (!in) > > > + return -ENOMEM; > > > + > > > + in->offset = offset; > > > + memcpy(in->registers, buf, size); > > > + sg_init_one(&in_sg, in, sizeof(*in) + size); > > > + cmd.opcode = opcode; > > > + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; > > > + cmd.group_member_id = virtvdev->vf_id + 1; > > weird. why + 1? > > This follows the virtio spec in that area. > > "When sending commands with the SR-IOV group type, the driver specify a > value for group_member_id > between 1 and NumVFs inclusive." Ah, I get it. 
Pls add a comment. > The 'virtvdev->vf_id' was set upon vfio/virtio driver initialization by > pci_iov_vf_id() which its first index is 0. > > > > + cmd.data_sg = &in_sg; > > > + ret = virtio_admin_cmd_exec(virtio_dev, &cmd); > > > + > > > + kfree(in); > > > + return ret; > > > +} > > How do you know it's safe to send this command, in particular at > > this time? This seems to be doing zero checks, and zero synchronization > > with the PF driver. > > > The virtiovf_cmd_lr_read()/other gets a virtio VF and it gets its PF by > calling virtio_pci_vf_get_pf_dev(). > > The VF can't gone by 'disable sriov' as it's owned/used by vfio. > > The PF can't gone by rmmod/modprobe -r of virtio, as of the 'module in > use'/dependencies between VFIO to VIRTIO. > > The below check [1] was done only from a clean code perspective, which might > theoretically fail in case the given VF doesn't us
Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device
On Tue, Sep 26, 2023 at 02:14:01PM +0300, Yishai Hadas wrote: > On 22/09/2023 12:54, Michael S. Tsirkin wrote: > > On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote: > > > Expose admin commands over the virtio device, to be used by the > > > vfio-virtio driver in the next patches. > > > > > > It includes: list query/use, legacy write/read, read notify_info. > > > > > > Signed-off-by: Yishai Hadas > > > > This stuff is pure virtio spec. I think it should live under > > drivers/virtio, too. > > The motivation to put it in the vfio layer was from the below main reasons: > > 1) Having it inside virtio may require to export a symbol/function per > command. > > This will end up today by 5 and in the future (e.g. live migration) with > much more exported symbols. > > With current code we export only 2 generic symbols > virtio_pci_vf_get_pf_dev(), virtio_admin_cmd_exec() which may fit for any > further extension. Except, there's no reasonable way for virtio to know what is done with the device then. You are not using just 2 symbols at all, instead you are using the rich vq API which was explicitly designed for the driver running the device being responsible for serializing accesses. Which is actually loaded and running. And I *think* your use won't conflict ATM mostly by luck. Witness the hack in patch 01 as exhibit 1 - nothing at all even hints at the fact that the reason for the complicated dance is because another driver pokes at some of the vqs. > 2) For now there is no logic in this vfio layer, however, in the future we > may have some DMA/other logic that should better fit to the caller/client > layer (i.e. vfio). You are poking at the device without any locks etc. Maybe it looks like no logic to you but it does not look like that from where I stand. > By the way, this follows what was done already between vfio/mlx5 to > mlx5_core modules where mlx5_core exposes generic APIs to execute a command > and to get the a PF from a given mlx5 VF. 
This is up to mlx5 maintainers. In particular they only need to worry that their patches work with specific hardware which they likely have. virtio has to work with multiple vendors - hardware and software - and exposing a low level API that I can't test on my laptop is not at all my ideal. > This way, we can enable further commands to be added/extended > easily/cleanly. Something for vfio maintainer to consider in case it was assumed that it's just this one weird thing but otherwise it's all generic vfio. It's not going to stop there, will it? The duplication of functionality with vdpa will continue :( I am much more interested in adding reusable functionality that everyone benefits from than in vfio poking at the device in its own weird ways that only benefit specific hardware. > See for example here [1, 2]. > > [1] > https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L210 > > [2] > https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L683 > > Yishai > > > > > --- > > > drivers/vfio/pci/virtio/cmd.c | 146 ++ > > > drivers/vfio/pci/virtio/cmd.h | 27 +++ > > > 2 files changed, 173 insertions(+) > > > create mode 100644 drivers/vfio/pci/virtio/cmd.c > > > create mode 100644 drivers/vfio/pci/virtio/cmd.h > > > > > > diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c > > > new file mode 100644 > > > index ..f068239cdbb0 > > > --- /dev/null > > > +++ b/drivers/vfio/pci/virtio/cmd.c > > > @@ -0,0 +1,146 @@ > > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB > > > +/* > > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. 
All rights > > > reserved > > > + */ > > > + > > > +#include "cmd.h" > > > + > > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size) > > > +{ > > > + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); > > > + struct scatterlist out_sg; > > > + struct virtio_admin_cmd cmd = {}; > > > + > > > + if (!virtio_dev) > > > + return -ENOTCONN; > > > + > > > + sg_init_one(&out_sg, buf, buf_size); > > > + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY; > > > + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; > > > + cmd.result_sg = &out_sg; > > > + > > > + return virtio_admin_cmd_exec(virtio_dev, &cmd); > > > +} > > > + > > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size) > > > +{ > > > + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); > > > + struct scatterlist in_sg; > > > + struct virtio_admin_cmd cmd = {}; > > > + > > > + if (!virtio_dev) > > > + return -ENOTCONN; > > > + > > > + sg_init_one(&in_sg, buf, buf_size); > > > + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE; > > > + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV; > > > + cmd.data_sg = &in_sg; > > > + > > > + return virtio_admin_cmd_exec(virtio_dev, &cmd); > > > +} > > > + > > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_dev
Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
On Tue, Sep 26, 2023 at 10:32:39AM +0800, Jason Wang wrote:
> It's the implementation details in legacy. The device needs to make
> sure (reset) the driver can work (is done before get_status return).

I think that there's no way to make it reliably work for all legacy drivers. They just assumed a software backend and did not bother with DMA ordering.

You can try to avoid resets; they are not that common, so things will tend to mostly work if you don't stress them too much with things like hot plug/unplug in a loop.

Or you can try to use a driver after 2011 which is more aware of hardware ordering and flushes the reset write with a read.

One of these two tricks, I think, is the magic behind the device exposing memory bar 0 that you mention.

--
MST
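The "flush the reset write with a read" trick MST refers to can be shown with a minimal sketch. This simulates a device status register with a plain variable, so it is only an illustration of the ordering idea, not real virtio or MMIO code:

```c
#include <stdint.h>

/* A device status register. On real hardware this would be MMIO, and the
 * write below could be posted, i.e. still in flight when the CPU moves on. */
static volatile uint8_t status_reg = 0x0f;

static void reset_device_sketch(void)
{
	status_reg = 0;			/* write 0 = reset request */

	/* Reading the register back forces the (potentially posted) write
	 * to reach the device before the driver proceeds. Pre-2011 legacy
	 * drivers skipped this read-back, assuming a software backend with
	 * no write buffering. */
	while (status_reg != 0)
		;
}
```

The read-back is what makes the reset reliable on hardware backends; without it, the driver may start reinitializing the device while the reset write is still buffered.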
Re: [PATCH 00/16] vdpa: Add support for vq descriptor mappings
On 9/25/2023 12:59 AM, Dragos Tatulea wrote:
> On Tue, 2023-09-12 at 16:01 +0300, Dragos Tatulea wrote:
> > This patch series adds support for vq descriptor table mappings which
> > are used to improve vdpa live migration downtime. The improvement comes
> > from using smaller mappings which take less time to create and destroy
> > in hw.
>
> Gentle ping.
>
> Note that I will have to send a v2. The changes in mlx5_ifc.h will need to be
> merged first separately into the mlx5-next branch [0] and then pulled from
> there when the series is applied.

This separation is unnecessary, as historically the virtio emulation portion of the update to mlx5_ifc.h often had to go through the vhost tree. See commits 1892a3d425bf and e13cd45d352d. Especially the additions from this series (mainly desc group mkey) have nothing to do with any networking or NIC driver feature.

-Siwei

> [0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-next
>
> Thanks,
> Dragos
>
> > The first part adds the vdpa core changes from Si-Wei [0].
> >
> > The second part adds support in mlx5_vdpa:
> > - Refactor the mr code to be able to cleanly add descriptor mappings.
> > - Add hardware descriptor mr support.
> > - Properly update iotlb for cvq during ASID switch.
> > [0] https://lore.kernel.org/virtualization/1694248959-13369-1-git-send-email-si-wei@oracle.com
> >
> > Dragos Tatulea (13):
> >   vdpa/mlx5: Create helper function for dma mappings
> >   vdpa/mlx5: Decouple cvq iotlb handling from hw mapping code
> >   vdpa/mlx5: Take cvq iotlb lock during refresh
> >   vdpa/mlx5: Collapse "dvq" mr add/delete functions
> >   vdpa/mlx5: Rename mr destroy functions
> >   vdpa/mlx5: Allow creation/deletion of any given mr struct
> >   vdpa/mlx5: Move mr mutex out of mr struct
> >   vdpa/mlx5: Improve mr update flow
> >   vdpa/mlx5: Introduce mr for vq descriptor
> >   vdpa/mlx5: Enable hw support for vq descriptor mapping
> >   vdpa/mlx5: Make iotlb helper functions more generic
> >   vdpa/mlx5: Update cvq iotlb mapping on ASID change
> >   Cover letter: vdpa/mlx5: Add support for vq descriptor mappings
> >
> > Si-Wei Liu (3):
> >   vdpa: introduce dedicated descriptor group for virtqueue
> >   vhost-vdpa: introduce descriptor group backend feature
> >   vhost-vdpa: uAPI to get dedicated descriptor group id
> >
> >   drivers/vdpa/mlx5/core/mlx5_vdpa.h | 31 +++--
> >   drivers/vdpa/mlx5/core/mr.c | 191 -
> >   drivers/vdpa/mlx5/core/resources.c | 6 +-
> >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 100 ++-
> >   drivers/vhost/vdpa.c | 27
> >   include/linux/mlx5/mlx5_ifc.h | 8 +-
> >   include/linux/mlx5/mlx5_ifc_vdpa.h | 7 +-
> >   include/linux/vdpa.h | 11 ++
> >   include/uapi/linux/vhost.h | 8 ++
> >   include/uapi/linux/vhost_types.h | 5 +
> >   10 files changed, 264 insertions(+), 130 deletions(-)
Re: [PATCH 00/16] vdpa: Add support for vq descriptor mappings
On Tue, 2023-09-26 at 05:22 -0700, Si-Wei Liu wrote: > > > On 9/25/2023 12:59 AM, Dragos Tatulea wrote: > > On Tue, 2023-09-12 at 16:01 +0300, Dragos Tatulea wrote: > > > This patch series adds support for vq descriptor table mappings which > > > are used to improve vdpa live migration downtime. The improvement comes > > > from using smaller mappings which take less time to create and destroy > > > in hw. > > > > > Gentle ping. > > > > Note that I will have to send a v2. The changes in mlx5_ifc.h will need to > > be > > merged first separately into the mlx5-next branch [0] and then pulled from > > there > > when the series is applied. > This separation is unnecessary, as historically the virtio emulation > portion of the update to mlx5_ifc.h often had to go through the vhost > tree. See commits 1892a3d425bf and e13cd45d352d. Especially the > additions from this series (mainly desc group mkey) have nothing to do > with any networking or NIC driver feature. > The reason for doing that is to avoid conflicts in Linus's tree on the mlx5_ifc.h file. Thanks, Dragos > -Siwei > > > > > [0] > > https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-next > > > > Thanks, > > Dragos > > > > > The first part adds the vdpa core changes from Si-Wei [0]. > > > > > > The second part adds support in mlx5_vdpa: > > > - Refactor the mr code to be able to cleanly add descriptor mappings. > > > - Add hardware descriptor mr support. > > > - Properly update iotlb for cvq during ASID switch. 
> > >
> > > [0]
> > > https://lore.kernel.org/virtualization/1694248959-13369-1-git-send-email-si-wei@oracle.com
> > >
> > > Dragos Tatulea (13):
> > >   vdpa/mlx5: Create helper function for dma mappings
> > >   vdpa/mlx5: Decouple cvq iotlb handling from hw mapping code
> > >   vdpa/mlx5: Take cvq iotlb lock during refresh
> > >   vdpa/mlx5: Collapse "dvq" mr add/delete functions
> > >   vdpa/mlx5: Rename mr destroy functions
> > >   vdpa/mlx5: Allow creation/deletion of any given mr struct
> > >   vdpa/mlx5: Move mr mutex out of mr struct
> > >   vdpa/mlx5: Improve mr update flow
> > >   vdpa/mlx5: Introduce mr for vq descriptor
> > >   vdpa/mlx5: Enable hw support for vq descriptor mapping
> > >   vdpa/mlx5: Make iotlb helper functions more generic
> > >   vdpa/mlx5: Update cvq iotlb mapping on ASID change
> > >   Cover letter: vdpa/mlx5: Add support for vq descriptor mappings
> > >
> > > Si-Wei Liu (3):
> > >   vdpa: introduce dedicated descriptor group for virtqueue
> > >   vhost-vdpa: introduce descriptor group backend feature
> > >   vhost-vdpa: uAPI to get dedicated descriptor group id
> > >
> > >   drivers/vdpa/mlx5/core/mlx5_vdpa.h | 31 +++--
> > >   drivers/vdpa/mlx5/core/mr.c | 191 -
> > >   drivers/vdpa/mlx5/core/resources.c | 6 +-
> > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 100 ++-
> > >   drivers/vhost/vdpa.c | 27
> > >   include/linux/mlx5/mlx5_ifc.h | 8 +-
> > >   include/linux/mlx5/mlx5_ifc_vdpa.h | 7 +-
> > >   include/linux/vdpa.h | 11 ++
> > >   include/uapi/linux/vhost.h | 8 ++
> > >   include/uapi/linux/vhost_types.h | 5 +
> > >   10 files changed, 264 insertions(+), 130 deletions(-)
> >
Re: [PATCH net-next v1 01/12] vsock: fix EPOLLERR set on non-empty error queue
On Fri, Sep 22, 2023 at 08:24:17AM +0300, Arseniy Krasnov wrote:
> If socket's error queue is not empty, EPOLLERR must be set. Otherwise,
> reader of error queue won't detect data in it using EPOLLERR bit.
> Currently for AF_VSOCK this is reproducible only with MSG_ZEROCOPY, as
> this feature is the only user of an error queue of the socket.

So this is not really a fix. I'd use a different title to avoid confusion on backporting this on stable branches or not. Maybe just "vsock: set EPOLLERR on non-empty error queue"

The change LGTM.

Stefano

> Signed-off-by: Arseniy Krasnov
> ---
>  net/vmw_vsock/af_vsock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 013b65241b65..d841f4de33b0 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -1030,7 +1030,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
>  	poll_wait(file, sk_sleep(sk), wait);
>  	mask = 0;
>
> -	if (sk->sk_err)
> +	if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
>  		/* Signify that there has been an error on this socket. */
>  		mask |= EPOLLERR;
>
> --
> 2.25.1
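The one-line condition being changed can be restated as a tiny standalone sketch. Names and the EPOLLERR value here are illustrative stand-ins, not kernel code: the point is that the mask must flag an error when either a socket error is pending or the error queue holds queued completions.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_EPOLLERR 0x008	/* matches the EPOLLERR bit value */

/* Mirror of the fixed vsock_poll() condition: report EPOLLERR when either
 * a socket error is pending *or* the error queue is non-empty (e.g. holds
 * MSG_ZEROCOPY completion notifications). */
static uint32_t poll_mask_sketch(int sk_err, bool errqueue_empty)
{
	uint32_t mask = 0;

	if (sk_err || !errqueue_empty)
		mask |= SK_EPOLLERR;
	return mask;
}
```

Before the patch, only the first operand existed, so a reader waiting on EPOLLERR would never wake up for queued completions when sk_err was zero.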
Re: [PATCH net-next v1 02/12] vsock: read from socket's error queue
On Fri, Sep 22, 2023 at 08:24:18AM +0300, Arseniy Krasnov wrote: This adds handling of MSG_ERRQUEUE input flag in receive call. This flag is used to read socket's error queue instead of data queue. Possible scenario of error queue usage is receiving completions for transmission with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK' and 'VSOCK_RECVERR'. Signed-off-by: Arseniy Krasnov --- Changelog: v5(big patchset) -> v1: * R-b tag removed, due to added defines to 'include/uapi/linux/vsock.h'. Both 'SOL_VSOCK' and 'VSOCK_RECVERR' are needed by userspace, so they were placed to 'include/uapi/linux/vsock.h'. At the same time, the same define for 'SOL_VSOCK' was placed to 'include/linux/socket.h'. This is needed because this file contains SOL_XXX defines for different types of socket, so it prevents situation when another new SOL_XXX will use constant 287. include/linux/socket.h | 1 + include/uapi/linux/vsock.h | 9 + net/vmw_vsock/af_vsock.c | 6 ++ 3 files changed, 16 insertions(+) create mode 100644 include/uapi/linux/vsock.h diff --git a/include/linux/socket.h b/include/linux/socket.h index 39b74d83c7c4..cfcb7e2c3813 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -383,6 +383,7 @@ struct ucred { #define SOL_MPTCP 284 #define SOL_MCTP285 #define SOL_SMC 286 +#define SOL_VSOCK 287 /* IPX options */ #define IPX_TYPE1 diff --git a/include/uapi/linux/vsock.h b/include/uapi/linux/vsock.h new file mode 100644 index ..b25c1347a3b8 --- /dev/null +++ b/include/uapi/linux/vsock.h We already have include/uapi/linux/vm_sockets.h Should we include these changes there instead of creating a new header? @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _UAPI_LINUX_VSOCK_H +#define _UAPI_LINUX_VSOCK_H + +#define SOL_VSOCK 287 Why we need to re-define this also here? In that case, should we protect with some guards to avoid double defines? 
+ +#define VSOCK_RECVERR 1 + +#endif /* _UAPI_LINUX_VSOCK_H */ diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index d841f4de33b0..4fd11bf34bc7 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -110,6 +110,8 @@ #include #include #include +#include +#include static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr); static void vsock_sk_destruct(struct sock *sk); @@ -2137,6 +2139,10 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int err; sk = sock->sk; + + if (unlikely(flags & MSG_ERRQUEUE)) + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, VSOCK_RECVERR); + vsk = vsock_sk(sk); err = 0; -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v1 08/12] vsock: enable setting SO_ZEROCOPY
On Fri, Sep 22, 2023 at 08:24:24AM +0300, Arseniy Krasnov wrote: For AF_VSOCK, zerocopy tx mode depends on transport, so this option must be set in AF_VSOCK implementation where transport is accessible (if transport is not set during setting SO_ZEROCOPY: for example socket is not connected, then SO_ZEROCOPY will be enabled, but once transport will be assigned, support of this type of transmission will be checked). To handle SO_ZEROCOPY, AF_VSOCK implementation uses SOCK_CUSTOM_SOCKOPT bit, thus handling SOL_SOCKET option operations, but all of them except SO_ZEROCOPY will be forwarded to the generic handler by calling 'sock_setsockopt()'. Signed-off-by: Arseniy Krasnov --- Changelog: v5(big patchset) -> v1: * Compact 'if' conditions. * Rename 'zc_val' to 'zerocopy'. * Use 'zerocopy' value directly in 'sock_valbool_flag()', without ?: operator. * Set 'SOCK_CUSTOM_SOCKOPT' bit for connectible sockets only, as suggested by Bobby Eshleman . net/vmw_vsock/af_vsock.c | 46 ++-- 1 file changed, 44 insertions(+), 2 deletions(-) diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 482300eb88e0..c05a42e02a17 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1406,8 +1406,16 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr, goto out; } - if (vsock_msgzerocopy_allow(transport)) + if (vsock_msgzerocopy_allow(transport)) { set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags); + } else if (sock_flag(sk, SOCK_ZEROCOPY)) { + /* If this option was set before 'connect()', +* when transport was unknown, check that this +* feature is supported here. 
+*/ + err = -EOPNOTSUPP; + goto out; + } err = vsock_auto_bind(vsk); if (err) @@ -1643,7 +1651,7 @@ static int vsock_connectible_setsockopt(struct socket *sock, const struct vsock_transport *transport; u64 val; - if (level != AF_VSOCK) + if (level != AF_VSOCK && level != SOL_SOCKET) return -ENOPROTOOPT; #define COPY_IN(_v) \ @@ -1666,6 +1674,34 @@ static int vsock_connectible_setsockopt(struct socket *sock, transport = vsk->transport; + if (level == SOL_SOCKET) { + int zerocopy; + + if (optname != SO_ZEROCOPY) { + release_sock(sk); + return sock_setsockopt(sock, level, optname, optval, optlen); + } + + /* Use 'int' type here, because variable to +* set this option usually has this type. +*/ + COPY_IN(zerocopy); + + if (zerocopy < 0 || zerocopy > 1) { + err = -EINVAL; + goto exit; + } + + if (transport && !vsock_msgzerocopy_allow(transport)) { + err = -EOPNOTSUPP; + goto exit; + } + + sock_valbool_flag(sk, SOCK_ZEROCOPY, + zerocopy); it's not necessary to wrap this call. + goto exit; + } + switch (optname) { case SO_VM_SOCKETS_BUFFER_SIZE: COPY_IN(val); @@ -2322,6 +2358,12 @@ static int vsock_create(struct net *net, struct socket *sock, } } + /* SOCK_DGRAM doesn't have 'setsockopt' callback set in its +* proto_ops, so there is no handler for custom logic. +*/ + if (sock_type_connectible(sock->type)) + set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags); + vsock_insert_unbound(vsk); return 0; -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v1 10/12] test/vsock: MSG_ZEROCOPY flag tests
On Fri, Sep 22, 2023 at 08:24:26AM +0300, Arseniy Krasnov wrote: This adds three tests for MSG_ZEROCOPY feature: 1) SOCK_STREAM tx with different buffers. 2) SOCK_SEQPACKET tx with different buffers. 3) SOCK_STREAM test to read empty error queue of the socket. Signed-off-by: Arseniy Krasnov --- tools/testing/vsock/Makefile | 2 +- tools/testing/vsock/util.c| 222 +++ tools/testing/vsock/util.h| 19 ++ tools/testing/vsock/vsock_test.c | 16 ++ tools/testing/vsock/vsock_test_zerocopy.c | 314 ++ tools/testing/vsock/vsock_test_zerocopy.h | 15 ++ 6 files changed, 587 insertions(+), 1 deletion(-) create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile index 21a98ba565ab..1a26f60a596c 100644 --- a/tools/testing/vsock/Makefile +++ b/tools/testing/vsock/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only all: test vsock_perf test: vsock_test vsock_diag_test -vsock_test: vsock_test.o timeout.o control.o util.o +vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o vsock_perf: vsock_perf.o diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c index 6779d5008b27..d531dbbfa8ff 100644 --- a/tools/testing/vsock/util.c +++ b/tools/testing/vsock/util.c @@ -11,15 +11,27 @@ #include #include #include +#include #include #include #include #include +#include +#include +#include #include "timeout.h" #include "control.h" #include "util.h" +#ifndef SOL_VSOCK +#define SOL_VSOCK 287 +#endif + +#ifndef VSOCK_RECVERR +#define VSOCK_RECVERR 1 +#endif Maybe better to re-define them in util.h where we include vm_socktes.h + /* Install signal handlers */ void init_signals(void) { @@ -444,3 +456,213 @@ unsigned long hash_djb2(const void *data, size_t len) return hash; } + +void enable_so_zerocopy(int fd) +{ + int val = 1; + + if (setsockopt(fd, SOL_SOCKET, 
SO_ZEROCOPY, &val, sizeof(val))) { + perror("setsockopt"); + exit(EXIT_FAILURE); + } +} + +static void *mmap_no_fail(size_t bytes) +{ + void *res; + + res = mmap(NULL, bytes, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0); + if (res == MAP_FAILED) { + perror("mmap"); + exit(EXIT_FAILURE); + } + + return res; +} + +size_t iovec_bytes(const struct iovec *iov, size_t iovnum) +{ + size_t bytes; + int i; + + for (bytes = 0, i = 0; i < iovnum; i++) + bytes += iov[i].iov_len; + + return bytes; +} + +static void iovec_random_init(struct iovec *iov, + const struct vsock_test_data *test_data) +{ + int i; + + for (i = 0; i < test_data->vecs_cnt; i++) { + int j; + + if (test_data->vecs[i].iov_base == MAP_FAILED) + continue; + + for (j = 0; j < iov[i].iov_len; j++) + ((uint8_t *)iov[i].iov_base)[j] = rand() & 0xff; + } +} + +unsigned long iovec_hash_djb2(struct iovec *iov, size_t iovnum) +{ + unsigned long hash; + size_t iov_bytes; + size_t offs; + void *tmp; + int i; + + iov_bytes = iovec_bytes(iov, iovnum); + + tmp = malloc(iov_bytes); + if (!tmp) { + perror("malloc"); + exit(EXIT_FAILURE); + } + + for (offs = 0, i = 0; i < iovnum; i++) { + memcpy(tmp + offs, iov[i].iov_base, iov[i].iov_len); + offs += iov[i].iov_len; + } + + hash = hash_djb2(tmp, iov_bytes); + free(tmp); + + return hash; +} + +struct iovec *iovec_from_test_data(const struct vsock_test_data *test_data) +{ + const struct iovec *test_iovec; + struct iovec *iovec; + int i; + + iovec = malloc(sizeof(*iovec) * test_data->vecs_cnt); + if (!iovec) { + perror("malloc"); + exit(EXIT_FAILURE); + } + + test_iovec = test_data->vecs; + + for (i = 0; i < test_data->vecs_cnt; i++) { + iovec[i].iov_len = test_iovec[i].iov_len; + iovec[i].iov_base = mmap_no_fail(test_iovec[i].iov_len); + + if (test_iovec[i].iov_base != MAP_FAILED && + test_iovec[i].iov_base) + iovec[i].iov_base += (uintptr_t)test_iovec[i].iov_base; + } + + /* Unmap "invalid" elements. 
*/ + for (i = 0; i < test_data->vecs_cnt; i++) { + if (test_iovec[i].iov_base == MAP_FAILED) { + if (munmap(iovec[i].iov_base, iovec[i].iov_len)) { + perror("munmap"); + exit(EXIT_FAILURE); + }
Re: [PATCH net-next v1 12/12] test/vsock: io_uring rx/tx tests
On Fri, Sep 22, 2023 at 08:24:28AM +0300, Arseniy Krasnov wrote: This adds set of tests which use io_uring for rx/tx. This test suite is implemented as separated util like 'vsock_test' and has the same set of input arguments as 'vsock_test'. These tests only cover cases of data transmission (no connect/bind/accept etc). Signed-off-by: Arseniy Krasnov --- Changelog: v5(big patchset) -> v1: * Use LDLIBS instead of LDFLAGS. tools/testing/vsock/Makefile | 7 +- tools/testing/vsock/vsock_uring_test.c | 321 + 2 files changed, 327 insertions(+), 1 deletion(-) create mode 100644 tools/testing/vsock/vsock_uring_test.c diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile index 1a26f60a596c..c84380bfc18d 100644 --- a/tools/testing/vsock/Makefile +++ b/tools/testing/vsock/Makefile @@ -1,12 +1,17 @@ # SPDX-License-Identifier: GPL-2.0-only +ifeq ($(MAKECMDGOALS),vsock_uring_test) +LDLIBS = -luring +endif + This will fails if for example we call make with more targets, e.g. `make vsock_test vsock_uring_test`. I'd suggest to use something like this: --- a/tools/testing/vsock/Makefile +++ b/tools/testing/vsock/Makefile @@ -1,13 +1,11 @@ # SPDX-License-Identifier: GPL-2.0-only -ifeq ($(MAKECMDGOALS),vsock_uring_test) -LDLIBS = -luring -endif - all: test vsock_perf test: vsock_test vsock_diag_test vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o vsock_perf: vsock_perf.o + +vsock_uring_test: LDLIBS = -luring vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o CFLAGS += -g -O2 -Werror -Wall -I. 
-I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE all: test vsock_perf test: vsock_test vsock_diag_test vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o vsock_perf: vsock_perf.o +vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o Shoud we add this new test to the "test" target as well? Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v1 00/12] vsock/virtio: continue MSG_ZEROCOPY support
Hi Arseniy, On Fri, Sep 22, 2023 at 08:24:16AM +0300, Arseniy Krasnov wrote: Hello, this patchset contains second and third parts of another big patchset for MSG_ZEROCOPY flag support: https://lore.kernel.org/netdev/20230701063947.3422088-1-avkras...@sberdevices.ru/ During review of this series, Stefano Garzarella suggested to split it for three parts to simplify review and merging: 1) virtio and vhost updates (for fragged skbs) (merged to net-next, see link below) 2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read tx completions) and update for Documentation/. <-- this patchset 3) Updates for tests and utils. <-- this patchset Part 1) was merged: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7 Head for this patchset is: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7 Thanks for the series. I did a quick review highlighting some things that need to be changed. Overall, the series seems to be in good shape. The tests went well. In the next few days I'll see if I can get a better look at the larger patches like the tests, or I'll check in the next version. Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
On 21/09/2023 22:58, Alex Williamson wrote: On Thu, 21 Sep 2023 15:40:40 +0300 Yishai Hadas wrote: Introduce a vfio driver over virtio devices to support the legacy interface functionality for VFs. Background, from the virtio spec [1]. In some systems, there is a need to support a virtio legacy driver with a device that does not directly support the legacy interface. In such scenarios, a group owner device can provide the legacy interface functionality for the group member devices. The driver of the owner device can then access the legacy interface of a member device on behalf of the legacy member device driver. For example, with the SR-IOV group type, group members (VFs) can not present the legacy interface in an I/O BAR in BAR0 as expected by the legacy pci driver. If the legacy driver is running inside a virtual machine, the hypervisor executing the virtual machine can present a virtual device with an I/O BAR in BAR0. The hypervisor intercepts the legacy driver accesses to this I/O BAR and forwards them to the group owner device (PF) using group administration commands. Specifically, this driver adds support for a virtio-net VF to be exposed as a transitional device to a guest driver and allows the legacy IO BAR functionality on top. This allows a VM which uses a legacy virtio-net driver in the guest to work transparently over a VF which its driver in the host is that new driver. The driver can be extended easily to support some other types of virtio devices (e.g virtio-blk), by adding in a few places the specific type properties as was done for virtio-net. For now, only the virtio-net use case was tested and as such we introduce the support only for such a device. Practically, Upon probing a VF for a virtio-net device, in case its PF supports legacy access over the virtio admin commands and the VF doesn't have BAR 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a transitional device with I/O BAR in BAR 0. 
The existence of the simulated I/O bar is reported later on by overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device exposes itself as a transitional device by overwriting some properties upon reading its config space. Once we report the existence of I/O BAR as BAR 0 a legacy driver in the guest may use it via read/write calls according to the virtio specification. Any read/write towards the control parts of the BAR will be captured by the new driver and will be translated into admin commands towards the device. Any data path read/write access (i.e. virtio driver notifications) will be forwarded to the physical BAR which its properties were supplied by the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow. With that code in place a legacy driver in the guest has the look and feel as if having a transitional device with legacy support for both its control and data path flows. [1] https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c Signed-off-by: Yishai Hadas --- MAINTAINERS | 6 + drivers/vfio/pci/Kconfig | 2 + drivers/vfio/pci/Makefile| 2 + drivers/vfio/pci/virtio/Kconfig | 15 + drivers/vfio/pci/virtio/Makefile | 4 + drivers/vfio/pci/virtio/cmd.c| 4 +- drivers/vfio/pci/virtio/cmd.h| 8 + drivers/vfio/pci/virtio/main.c | 546 +++ 8 files changed, 585 insertions(+), 2 deletions(-) create mode 100644 drivers/vfio/pci/virtio/Kconfig create mode 100644 drivers/vfio/pci/virtio/Makefile create mode 100644 drivers/vfio/pci/virtio/main.c diff --git a/MAINTAINERS b/MAINTAINERS index bf0f54c24f81..5098418c8389 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -22624,6 +22624,12 @@ L: k...@vger.kernel.org S:Maintained F:drivers/vfio/pci/mlx5/ +VFIO VIRTIO PCI DRIVER +M: Yishai Hadas +L: k...@vger.kernel.org +S: Maintained +F: drivers/vfio/pci/virtio + VFIO PCI DEVICE SPECIFIC DRIVERS R:Jason Gunthorpe R:Yishai Hadas diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 8125e5f37832..18c397df566d 100644 --- 
a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig" source "drivers/vfio/pci/pds/Kconfig" +source "drivers/vfio/pci/virtio/Kconfig" + endmenu diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index 45167be462d8..046139a4eca5 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/ obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/ obj-$(CONFIG_PDS_VFIO_PCI) += pds/ + +obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/ diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig new file mode 100644 index ..8
Re: [PATCH vhost v13 12/12] virtio_net: merge dma operations when filling mergeable buffers
On Thu, Aug 10, 2023 at 08:30:57PM +0800, Xuan Zhuo wrote: > Currently, the virtio core will perform a dma operation for each > buffer. Although, the same page may be operated multiple times. > > This patch, the driver does the dma operation and manages the dma > address based the feature premapped of virtio core. > > This way, we can perform only one dma operation for the pages of the > alloc frag. This is beneficial for the iommu device. > > kernel command line: intel_iommu=on iommu.passthrough=0 > >| strict=0 | strict=1 > Before | 775496pps | 428614pps > After | 1109316pps | 742853pps > > Signed-off-by: Xuan Zhuo Hi Xuan Zhuo, looks like this patch is causing regressions. Do you have time to debug or should I revert? > --- > drivers/net/virtio_net.c | 228 ++- > 1 file changed, 202 insertions(+), 26 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 486b5849033d..16adb5ef18f8 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -126,6 +126,14 @@ static const struct virtnet_stat_desc > virtnet_rq_stats_desc[] = { > #define VIRTNET_SQ_STATS_LEN ARRAY_SIZE(virtnet_sq_stats_desc) > #define VIRTNET_RQ_STATS_LEN ARRAY_SIZE(virtnet_rq_stats_desc) > > +/* The dma information of pages allocated at a time. */ > +struct virtnet_rq_dma { > + dma_addr_t addr; > + u32 ref; > + u16 len; > + u16 need_sync; > +}; > + > /* Internal representation of a send virtqueue */ > struct send_queue { > /* Virtqueue associated with this send _queue */ > @@ -175,6 +183,12 @@ struct receive_queue { > char name[16]; > > struct xdp_rxq_info xdp_rxq; > + > + /* Record the last dma info to free after new pages is allocated. 
*/ > + struct virtnet_rq_dma *last_dma; > + > + /* Do dma by self */ > + bool do_dma; > }; > > /* This structure can contain rss message with maximum settings for > indirection table and keysize > @@ -549,6 +563,156 @@ static struct sk_buff *page_to_skb(struct virtnet_info > *vi, > return skb; > } > > +static void virtnet_rq_unmap(struct receive_queue *rq, void *buf, u32 len) > +{ > + struct page *page = virt_to_head_page(buf); > + struct virtnet_rq_dma *dma; > + void *head; > + int offset; > + > + head = page_address(page); > + > + dma = head; > + > + --dma->ref; > + > + if (dma->ref) { > + if (dma->need_sync && len) { > + offset = buf - (head + sizeof(*dma)); > + > + virtqueue_dma_sync_single_range_for_cpu(rq->vq, > dma->addr, offset, > + len, > DMA_FROM_DEVICE); > + } > + > + return; > + } > + > + virtqueue_dma_unmap_single_attrs(rq->vq, dma->addr, dma->len, > + DMA_FROM_DEVICE, > DMA_ATTR_SKIP_CPU_SYNC); > + put_page(page); > +} > + > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void > **ctx) > +{ > + void *buf; > + > + buf = virtqueue_get_buf_ctx(rq->vq, len, ctx); > + if (buf && rq->do_dma) > + virtnet_rq_unmap(rq, buf, *len); > + > + return buf; > +} > + > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq) > +{ > + void *buf; > + > + buf = virtqueue_detach_unused_buf(rq->vq); > + if (buf && rq->do_dma) > + virtnet_rq_unmap(rq, buf, 0); > + > + return buf; > +} > + > +static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 > len) > +{ > + struct virtnet_rq_dma *dma; > + dma_addr_t addr; > + u32 offset; > + void *head; > + > + if (!rq->do_dma) { > + sg_init_one(rq->sg, buf, len); > + return; > + } > + > + head = page_address(rq->alloc_frag.page); > + > + offset = buf - head; > + > + dma = head; > + > + addr = dma->addr - sizeof(*dma) + offset; > + > + sg_init_table(rq->sg, 1); > + rq->sg[0].dma_address = addr; > + rq->sg[0].length = len; > +} > + > +static void *virtnet_rq_alloc(struct receive_queue 
*rq, u32 size, gfp_t gfp) > +{ > + struct page_frag *alloc_frag = &rq->alloc_frag; > + struct virtnet_rq_dma *dma; > + void *buf, *head; > + dma_addr_t addr; > + > + if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > + return NULL; > + > + head = page_address(alloc_frag->page); > + > + if (rq->do_dma) { > + dma = head; > + > + /* new pages */ > + if (!alloc_frag->offset) { > + if (rq->last_dma) { > + /* Now, the new page is allocated, the last dma > + * will not be used. So the dma can be unmapped > + * if the ref is 0. > + */ > + virtnet_rq_unmap(rq, rq->last_dma, 0); > +
Re: [PATCH V3] io_uring: fix IO hang in io_wq_put_and_exit from do_exit()
On Tue, Sep 26, 2023 at 09:28:15AM +0800, Ming Lei wrote: > On Mon, Sep 25, 2023 at 05:17:10PM -0400, Stefan Hajnoczi wrote: > > On Fri, Sep 15, 2023 at 03:04:05PM +0800, Jason Wang wrote: > > > On Fri, Sep 8, 2023 at 11:25 PM Ming Lei wrote: > > > > > > > > On Fri, Sep 08, 2023 at 08:44:45AM -0600, Jens Axboe wrote: > > > > > On 9/8/23 8:34 AM, Ming Lei wrote: > > > > > > On Fri, Sep 08, 2023 at 07:49:53AM -0600, Jens Axboe wrote: > > > > > >> On 9/8/23 3:30 AM, Ming Lei wrote: > > > > > >>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > > > > > >>> index ad636954abae..95a3d31a1ef1 100644 > > > > > >>> --- a/io_uring/io_uring.c > > > > > >>> +++ b/io_uring/io_uring.c > > > > > >>> @@ -1930,6 +1930,10 @@ void io_wq_submit_work(struct io_wq_work > > > > > >>> *work) > > > > > >>> } > > > > > >>> } > > > > > >>> > > > > > >>> + /* It is fragile to block POLLED IO, so switch to NON_BLOCK */ > > > > > >>> + if ((req->ctx->flags & IORING_SETUP_IOPOLL) && > > > > > >>> def->iopoll_queue) > > > > > >>> + issue_flags |= IO_URING_F_NONBLOCK; > > > > > >>> + > > > > > >> > > > > > >> I think this comment deserves to be more descriptive. Normally we > > > > > >> absolutely cannot block for polled IO, it's only OK here because > > > > > >> io-wq > > > > > > > > > > > > Yeah, we don't do that until commit 2bc057692599 ("block: don't > > > > > > make REQ_POLLED > > > > > > imply REQ_NOWAIT") which actually push the responsibility/risk up to > > > > > > io_uring. > > > > > > > > > > > >> is the issuer and not necessarily the poller of it. That generally > > > > > >> falls > > > > > >> upon the original issuer to poll these requests. > > > > > >> > > > > > >> I think this should be a separate commit, coming before the main > > > > > >> fix > > > > > >> which is below. > > > > > > > > > > > > Looks fine, actually IO_URING_F_NONBLOCK change isn't a must, and > > > > > > the > > > > > > approach in V2 doesn't need this change. 
> > > > > > > > > > > >> > > > > > >>> @@ -3363,6 +3367,12 @@ __cold void io_uring_cancel_generic(bool > > > > > >>> cancel_all, struct io_sq_data *sqd) > > > > > >>> finish_wait(&tctx->wait, &wait); > > > > > >>> } while (1); > > > > > >>> > > > > > >>> + /* > > > > > >>> + * Reap events from each ctx, otherwise these requests may take > > > > > >>> + * resources and prevent other contexts from being moved on. > > > > > >>> + */ > > > > > >>> + xa_for_each(&tctx->xa, index, node) > > > > > >>> + io_iopoll_try_reap_events(node->ctx); > > > > > >> > > > > > >> The main issue here is that if someone isn't polling for them, > > > > > >> then we > > > > > > > > > > > > That is actually what this patch is addressing, :-) > > > > > > > > > > Right, that part is obvious :) > > > > > > > > > > >> get to wait for a timeout before they complete. This can delay > > > > > >> exit, for > > > > > >> example, as we're now just waiting 30 seconds (or whatever the > > > > > >> timeout > > > > > >> is on the underlying device) for them to get timed out before exit > > > > > >> can > > > > > >> finish. > > > > > > > > > > > > For the issue on null_blk, device timeout handler provides > > > > > > forward-progress, such as requests are released, so new IO can be > > > > > > handled. > > > > > > > > > > > > However, not all devices support timeout, such as virtio device. > > > > > > > > > > That's a bug in the driver, you cannot sanely support polled IO and > > > > > not > > > > > be able to deal with timeouts. Someone HAS to reap the requests and > > > > > there are only two things that can do that - the application doing the > > > > > polled IO, or if that doesn't happen, a timeout. 
> > > > > > > > OK, then device driver timeout handler has new responsibility of > > > > covering > > > > userspace accident, :-) > > > > Sorry, I don't have enough context so this is probably a silly question: > > > > When an application doesn't reap a polled request, why doesn't the block > > layer take care of this in a generic way? I don't see anything > > driver-specific about this. > > block layer doesn't have knowledge to handle that, io_uring knows the > application is exiting, and can help to reap the events. I thought the discussion was about I/O timeouts in general but here you're only mentioning application exit. Are we talking about I/O timeouts or purely about cleaning up I/O requests when an application exits? > > But the big question is that if there is really IO timeout for virtio-blk. > If there is, the reap done in io_uring may never return and cause other > issue, so if it is done in io_uring, that can be just thought as sort of > improvement. virtio-blk drivers have no way of specifying timeouts on the device or aborting/canceling requests. virtio-blk devices may fail requests if they implement an internal timeout mechanism (e.g. the host kernel fails requests after a host timeout), but this is not controlled by the driver and there is no guarantee
Re: [PATCH] crypto: virtio-crypto: call finalize with bh disabled
[..] > --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > @@ -61,8 +61,9 @@ static void virtio_crypto_akcipher_finalize_req( > vc_akcipher_req->src_buf = NULL; > vc_akcipher_req->dst_buf = NULL; > virtcrypto_clear_request(&vc_akcipher_req->base); > - > + local_bh_disable(); > crypto_finalize_akcipher_request(vc_akcipher_req->base.dataq->engine, > req, err); > + local_bh_enable(); Thanks Gonglei! I did this a quick spin, and it does not seem to be sufficient on s390x. Which does not come as a surprise to me, because #define lockdep_assert_in_softirq() \ do {\ WARN_ON_ONCE(__lockdep_enabled && \ (!in_softirq() || in_irq() || in_nmi())); \ } while (0) will still warn because in_irq() still evaluates to true (your patch addresses the !in_softirq() part). I don't have any results on x86 yet. My current understanding is that the virtio-pci transport code disables interrupts locally somewhere in the call chain (actually in vp_vring_interrupt() via spin_lock_irqsave()) and then x86 would be fine. But I will get that verified. On the other hand virtio_airq_handler() calls vring_interrupt() with interrupts enabled. (While vring_interrupt() is called in a (read) critical section in virtio_airq_handler() we use read_lock() and not read_lock_irqsave() to grab the lock. Whether that is correct in it self (i.e. disregarding the crypto problem) or not I'm not sure right now. Will think some more about it tomorrow.) If the way to go forward is disabling interrupts in virtio-ccw before vring_interrupt() is called, I would be glad to spin a patch for that. Copying Conny, as she may have an opinion on this (if I'm not wrong she authored that code). Regards, Halil ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices
On Tue, Sep 26, 2023 at 06:20:45PM +0300, Yishai Hadas wrote: > On 21/09/2023 22:58, Alex Williamson wrote: > > On Thu, 21 Sep 2023 15:40:40 +0300 > > Yishai Hadas wrote: > > > > > Introduce a vfio driver over virtio devices to support the legacy > > > interface functionality for VFs. > > > > > > Background, from the virtio spec [1]. > > > > > > In some systems, there is a need to support a virtio legacy driver with > > > a device that does not directly support the legacy interface. In such > > > scenarios, a group owner device can provide the legacy interface > > > functionality for the group member devices. The driver of the owner > > > device can then access the legacy interface of a member device on behalf > > > of the legacy member device driver. > > > > > > For example, with the SR-IOV group type, group members (VFs) can not > > > present the legacy interface in an I/O BAR in BAR0 as expected by the > > > legacy pci driver. If the legacy driver is running inside a virtual > > > machine, the hypervisor executing the virtual machine can present a > > > virtual device with an I/O BAR in BAR0. The hypervisor intercepts the > > > legacy driver accesses to this I/O BAR and forwards them to the group > > > owner device (PF) using group administration commands. > > > > > > > > > Specifically, this driver adds support for a virtio-net VF to be exposed > > > as a transitional device to a guest driver and allows the legacy IO BAR > > > functionality on top. > > > > > > This allows a VM which uses a legacy virtio-net driver in the guest to > > > work transparently over a VF which its driver in the host is that new > > > driver. > > > > > > The driver can be extended easily to support some other types of virtio > > > devices (e.g virtio-blk), by adding in a few places the specific type > > > properties as was done for virtio-net. > > > > > > For now, only the virtio-net use case was tested and as such we introduce > > > the support only for such a device. 
> > > > > > Practically, > > > Upon probing a VF for a virtio-net device, in case its PF supports > > > legacy access over the virtio admin commands and the VF doesn't have BAR > > > 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a > > > transitional device with I/O BAR in BAR 0. > > > > > > The existence of the simulated I/O BAR is reported later on by > > > overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device > > > exposes itself as a transitional device by overwriting some properties > > > upon reading its config space. > > > > > > Once we report the existence of I/O BAR as BAR 0 a legacy driver in the > > > guest may use it via read/write calls according to the virtio > > > specification. > > > > > > Any read/write towards the control parts of the BAR will be captured by > > > the new driver and will be translated into admin commands towards the > > > device. > > > > > > Any data path read/write access (i.e. virtio driver notifications) will > > > be forwarded to the physical BAR whose properties were supplied by > > > the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow. > > > > > > With that code in place a legacy driver in the guest has the look and > > > feel as if having a transitional device with legacy support for both its > > > control and data path flows. 
> > > > > > [1] > > > https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c > > > > > > Signed-off-by: Yishai Hadas > > > --- > > > MAINTAINERS | 6 + > > > drivers/vfio/pci/Kconfig | 2 + > > > drivers/vfio/pci/Makefile| 2 + > > > drivers/vfio/pci/virtio/Kconfig | 15 + > > > drivers/vfio/pci/virtio/Makefile | 4 + > > > drivers/vfio/pci/virtio/cmd.c| 4 +- > > > drivers/vfio/pci/virtio/cmd.h| 8 + > > > drivers/vfio/pci/virtio/main.c | 546 +++ > > > 8 files changed, 585 insertions(+), 2 deletions(-) > > > create mode 100644 drivers/vfio/pci/virtio/Kconfig > > > create mode 100644 drivers/vfio/pci/virtio/Makefile > > > create mode 100644 drivers/vfio/pci/virtio/main.c > > > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > > index bf0f54c24f81..5098418c8389 100644 > > > --- a/MAINTAINERS > > > +++ b/MAINTAINERS > > > @@ -22624,6 +22624,12 @@ L: k...@vger.kernel.org > > > S: Maintained > > > F: drivers/vfio/pci/mlx5/ > > > +VFIO VIRTIO PCI DRIVER > > > +M: Yishai Hadas > > > +L: k...@vger.kernel.org > > > +S: Maintained > > > +F: drivers/vfio/pci/virtio > > > + > > > VFIO PCI DEVICE SPECIFIC DRIVERS > > > R: Jason Gunthorpe > > > R: Yishai Hadas > > > diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig > > > index 8125e5f37832..18c397df566d 100644 > > > --- a/drivers/vfio/pci/Kconf
Re: [PATCH] crypto: virtio-crypto: call finalize with bh disabled
On Tue, Sep 26, 2023 at 06:41:58PM +0200, Halil Pasic wrote: > [..] > > --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > > +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c > > @@ -61,8 +61,9 @@ static void virtio_crypto_akcipher_finalize_req( > > vc_akcipher_req->src_buf = NULL; > > vc_akcipher_req->dst_buf = NULL; > > virtcrypto_clear_request(&vc_akcipher_req->base); > > - > > + local_bh_disable(); > > crypto_finalize_akcipher_request(vc_akcipher_req->base.dataq->engine, > > req, err); > > + local_bh_enable(); > > Thanks Gonglei! > > I gave this a quick spin, and it does not seem to be sufficient on s390x. > Which does not come as a surprise to me, because > > #define lockdep_assert_in_softirq() \ > do {\ > WARN_ON_ONCE(__lockdep_enabled && \ > (!in_softirq() || in_irq() || in_nmi())); \ > } while (0) > > will still warn because in_irq() still evaluates to true (your patch > addresses the !in_softirq() part). > > I don't have any results on x86 yet. My current understanding is that the > virtio-pci transport code disables interrupts locally somewhere in the > call chain (actually in vp_vring_interrupt() via spin_lock_irqsave()) > and then x86 would be fine. But I will get that verified. > > On the other hand virtio_airq_handler() calls vring_interrupt() with > interrupts enabled. (While vring_interrupt() is called in a (read) > critical section in virtio_airq_handler() we use read_lock() and > not read_lock_irqsave() to grab the lock. Whether that is correct in > itself (i.e. disregarding the crypto problem) or not I'm not sure right > now. Will think some more about it tomorrow.) If the way to go forward > is disabling interrupts in virtio-ccw before vring_interrupt() is > called, I would be glad to spin a patch for that. > > Copying Conny, as she may have an opinion on this (if I'm not wrong she > authored that code). 
> > Regards, > Halil On a related note, the config change callback is also handled incorrectly in this driver: it takes a mutex from interrupt context.
Re: [PATCH vfio 01/11] virtio-pci: Use virtio pci device layer vq info instead of generic one
On 2023-09-21 a.m.9:46, Michael S. Tsirkin wrote: On Thu, Sep 21, 2023 at 03:40:30PM +0300, Yishai Hadas wrote: From: Feng Liu Currently VQ deletion callback vp_del_vqs() processes generic virtio_device level VQ list instead of VQ information available at PCI layer. To adhere to the layering, use the pci device level VQ information stored in the virtqueues or vqs. This also prepares the code to handle PCI layer admin vq life cycle to be managed within the pci layer and thereby avoid undesired deletion of admin vq by upper layer drivers (net, console, vfio), in the del_vqs() callback. Signed-off-by: Feng Liu Reviewed-by: Parav Pandit Reviewed-by: Jiri Pirko Signed-off-by: Yishai Hadas --- drivers/virtio/virtio_pci_common.c | 12 +--- drivers/virtio/virtio_pci_common.h | 1 + 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index c2524a7207cf..7a3e6edc4dd6 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -232,12 +232,16 @@ static void vp_del_vq(struct virtqueue *vq) void vp_del_vqs(struct virtio_device *vdev) { struct virtio_pci_device *vp_dev = to_vp_device(vdev); - struct virtqueue *vq, *n; + struct virtqueue *vq; int i; - list_for_each_entry_safe(vq, n, &vdev->vqs, list) { + for (i = 0; i < vp_dev->nvqs; i++) { + if (!vp_dev->vqs[i]) + continue; + + vq = vp_dev->vqs[i]->vq; if (vp_dev->per_vq_vectors) { - int v = vp_dev->vqs[vq->index]->msix_vector; + int v = vp_dev->vqs[i]->msix_vector; if (v != VIRTIO_MSI_NO_VECTOR) { int irq = pci_irq_vector(vp_dev->pci_dev, v); @@ -294,6 +298,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs, vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL); if (!vp_dev->vqs) return -ENOMEM; + vp_dev->nvqs = nvqs; if (per_vq_vectors) { /* Best option: one for change interrupt, one per vq. 
*/ @@ -365,6 +370,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs, vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL); if (!vp_dev->vqs) return -ENOMEM; + vp_dev->nvqs = nvqs; err = request_irq(vp_dev->pci_dev->irq, vp_interrupt, IRQF_SHARED, dev_name(&vdev->dev), vp_dev); diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index 4b773bd7c58c..602021967aaa 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -60,6 +60,7 @@ struct virtio_pci_device { /* array of all queues for house-keeping */ struct virtio_pci_vq_info **vqs; + u32 nvqs; I don't much like it that we are adding more duplicated info here. In fact, we tried removing the vqs array in 5c34d002dcc7a6dd665a19d098b4f4cd5501ba1a - there was some bug in that patch and the author didn't have the time to debug so I reverted but I don't really think we need to add to that. Hi Michael As explained in the commit message, this patch is mainly to prepare for the subsequent admin vq patches. The admin vq is also established using the common vring mechanism, and is added to vdev->vqs in __vring_new_virtqueue(). So vdev->vqs contains all virtqueues, including rxq, txq, ctrlq and the admin vq. The admin vq should be managed by the virtio_pci layer and should not be created or deleted by an upper driver (net, blk). When the upper driver is unloaded, it will call the del_vqs() interface, which will call vp_del_vqs(), and vp_del_vqs() should not delete the admin vq, only the virtqueues created by the upper driver such as rxq, txq, and ctrlq. The vp_dev->vqs[] array only contains virtqueues created by the upper driver (rxq, txq, ctrlq), so traversing it deletes only the upper driver's virtqueues, leaving the admin vq alone. Using the vdev->vqs linked list cannot meet this need. Does this explanation make things clear? Or do you have an alternative suggestion? 
/* MSI-X support */ int msix_enabled; -- 2.27.0
Re: [PATCH vfio 03/11] virtio-pci: Introduce admin virtqueue
On 2023-09-21 a.m.9:57, Michael S. Tsirkin wrote: On Thu, Sep 21, 2023 at 03:40:32PM +0300, Yishai Hadas wrote: From: Feng Liu Introduce support for the admin virtqueue. By negotiating VIRTIO_F_ADMIN_VQ feature, driver detects capability and creates one administration virtqueue. Administration virtqueue implementation in virtio pci generic layer, enables multiple types of upper layer drivers such as vfio, net, blk to utilize it. Signed-off-by: Feng Liu Reviewed-by: Parav Pandit Reviewed-by: Jiri Pirko Signed-off-by: Yishai Hadas --- drivers/virtio/Makefile| 2 +- drivers/virtio/virtio.c| 37 +-- drivers/virtio/virtio_pci_common.h | 15 +- drivers/virtio/virtio_pci_modern.c | 10 +++- drivers/virtio/virtio_pci_modern_avq.c | 65 ++ if you have a .c file without a .h file you know there's something fishy. Just add this inside drivers/virtio/virtio_pci_modern.c ? Will do. include/linux/virtio_config.h | 4 ++ include/linux/virtio_pci_modern.h | 3 ++ 7 files changed, 129 insertions(+), 7 deletions(-) create mode 100644 drivers/virtio/virtio_pci_modern_avq.c diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index 8e98d24917cc..dcc535b5b4d9 100644 --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -5,7 +5,7 @@ obj-$(CONFIG_VIRTIO_PCI_LIB) += virtio_pci_modern_dev.o obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o -virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o +virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o virtio_pci_modern_avq.o virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 3893dc29eb26..f4080692b351 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -302,9 +302,15 @@ static int 
virtio_dev_probe(struct device *_d) if (err) goto err; + if (dev->config->create_avq) { + err = dev->config->create_avq(dev); + if (err) + goto err; + } + err = drv->probe(dev); if (err) - goto err; + goto err_probe; /* If probe didn't do it, mark device DRIVER_OK ourselves. */ if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK)) @@ -316,6 +322,10 @@ static int virtio_dev_probe(struct device *_d) virtio_config_enable(dev); return 0; + +err_probe: + if (dev->config->destroy_avq) + dev->config->destroy_avq(dev); err: virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED); return err; @@ -331,6 +341,9 @@ static void virtio_dev_remove(struct device *_d) drv->remove(dev); + if (dev->config->destroy_avq) + dev->config->destroy_avq(dev); + /* Driver should have reset device. */ WARN_ON_ONCE(dev->config->get_status(dev)); @@ -489,13 +502,20 @@ EXPORT_SYMBOL_GPL(unregister_virtio_device); int virtio_device_freeze(struct virtio_device *dev) { struct virtio_driver *drv = drv_to_virtio(dev->dev.driver); + int ret; virtio_config_disable(dev); dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED; - if (drv && drv->freeze) - return drv->freeze(dev); + if (drv && drv->freeze) { + ret = drv->freeze(dev); + if (ret) + return ret; + } + + if (dev->config->destroy_avq) + dev->config->destroy_avq(dev); return 0; } @@ -532,10 +552,16 @@ int virtio_device_restore(struct virtio_device *dev) if (ret) goto err; + if (dev->config->create_avq) { + ret = dev->config->create_avq(dev); + if (ret) + goto err; + } + if (drv->restore) { ret = drv->restore(dev); if (ret) - goto err; + goto err_restore; } /* If restore didn't do it, mark device DRIVER_OK ourselves. 
*/ @@ -546,6 +572,9 @@ int virtio_device_restore(struct virtio_device *dev) return 0; +err_restore: + if (dev->config->destroy_avq) + dev->config->destroy_avq(dev); err: virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED); return ret; diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index 602021967aaa..9bffa95274b6 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -41,6 +41,14 @@ struct virtio_pci_vq_info { unsigned int msix_vector; }; +struct virtio_avq { admin_vq would be better. and this is pci specific yes? so virtio_pci_ Will do. +
Re: [GIT PULL] virtio: features
On Tue, 26 Sep 2023 08:04:51 -0500, Michael Roth wrote: > On Sun, Sep 03, 2023 at 06:13:38PM -0400, Michael S. Tsirkin wrote: > > The following changes since commit 2dde18cd1d8fac735875f2e4987f11817cc0bc2c: > > > > Linux 6.5 (2023-08-27 14:49:51 -0700) > > > > are available in the Git repository at: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git > > tags/for_linus > > > > for you to fetch changes up to 1acfe2c1225899eab5ab724c91b7e1eb2881b9ab: > > > > virtio_ring: fix avail_wrap_counter in virtqueue_add_packed (2023-09-03 > > 18:10:24 -0400) > > > > > > virtio: features > > > > a small pull request this time around, mostly because the > > vduse network got postponed to next release so we can be sure > > we got the security store right. > > > > Signed-off-by: Michael S. Tsirkin > > > > > > Eugenio Pérez (4): > > vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag > > vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature > > vdpa: add get_backend_features vdpa operation > > vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK > > > > Jason Wang (1): > > virtio_vdpa: build affinity masks conditionally > > > > Xuan Zhuo (12): > > virtio_ring: check use_dma_api before unmap desc for indirect > > virtio_ring: put mapping error check in vring_map_one_sg > > virtio_ring: introduce virtqueue_set_dma_premapped() > > virtio_ring: support add premapped buf > > virtio_ring: introduce virtqueue_dma_dev() > > virtio_ring: skip unmap for premapped > > virtio_ring: correct the expression of the description of > > virtqueue_resize() > > virtio_ring: separate the logic of reset/enable from virtqueue_resize > > virtio_ring: introduce virtqueue_reset() > > virtio_ring: introduce dma map api for virtqueue > > virtio_ring: introduce dma sync api for virtqueue > > virtio_net: merge dma operations when filling mergeable buffers > > This ^ patch (upstream commit 295525e29a) seems to cause a > network-related regression when using SWIOTLB in the guest. 
I noticed > this initially testing SEV guests, which use SWIOTLB by default, but > it can also be seen with normal guests when forcing SWIOTLB via > swiotlb=force kernel cmdline option. I see it with both 6.6-rc1 and > 6.6-rc2 (haven't tried rc3 yet, but don't see any related changes > there), and reverting 714073495f seems to avoid the issue. > > Steps to reproduce: > > 1) Boot QEMU/KVM guest with 6.6-rc2 with swiotlb=force via something like the > following cmdline: > >qemu-system-x86_64 \ >-machine q35 -smp 4,maxcpus=255 -cpu EPYC-Milan-v2 \ >-enable-kvm -m 16G,slots=5,maxmem=256G -vga none \ >-device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=true \ >-drive > file=/home/mroth/storage/ubuntu-18.04-seves2.qcow2,if=none,id=drive0,snapshot=off > \ >-device scsi-hd,id=hd0,drive=drive0,bus=scsi0.0 \ >-device > virtio-net-pci,netdev=netdev0,id=net0,disable-legacy=on,iommu_platform=true,romfile= > \ >-netdev tap,script=/home/mroth/qemu-ifup,id=netdev0 \ >-L /home/mroth/storage/AMDSEV2/snp-release-2023-09-23/usr/local/share/qemu > \ >-drive > if=pflash,format=raw,unit=0,file=/home/mroth/storage/AMDSEV2/snp-release-2023-09-23/usr/local/share/qemu/OVMF_CODE.fd,readonly > \ >-drive > if=pflash,format=raw,unit=1,file=/home/mroth/storage/AMDSEV2/snp-release-2023-09-23/usr/local/share/qemu/OVMF_VARS.fd > \ >-debugcon file:debug.log -global isa-debugcon.iobase=0x402 -msg > timestamp=on \ >-kernel /boot/vmlinuz-6.6.0-rc2-vanilla0+ \ >-initrd /boot/initrd.img-6.6.0-rc2-vanilla0+ \ >-append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro > console=ttyS0,115200n8 earlyprintk=serial,ttyS0,115200 debug=1 sev=debug > page_poison=0 spec_rstack_overflow=off swiotlb=force" > > 2) scp a small file from the host to the guest IP via its virtio-net device. >Smaller file sizes succeed, but the larger the file the more likely >it will fail. 
e.g.: > >mroth@host:~$ dd if=/dev/zero of=test bs=1K count=19 >19+0 records in >19+0 records out >19456 bytes (19 kB, 19 KiB) copied, 0.000940134 s, 20.7 MB/s >mroth@host:~$ scp test vm0: >test > 100% 19KB 10.1MB/s 00:00 >mroth@host:~$ dd if=/dev/zero of=test bs=1K count=20 >20+0 records in >20+0 records out >20480 bytes (20 kB, 20 KiB) copied, 0.00093774 s, 21.8 MB/s >mroth@host:~$ scp test vm0: >test > 0%0 0.0KB/s --:-- ETA >client_loop: send disconnect: Broken pipe >lost connection >mroth@host:~$ Hi Michael, Thanks for the report. Could you try this fix? I reproduced this issue, and tha
Re: [PATCH vhost v13 12/12] virtio_net: merge dma operations when filling mergeable buffers
On Tue, 26 Sep 2023 12:01:34 -0400, "Michael S. Tsirkin" wrote: > On Thu, Aug 10, 2023 at 08:30:57PM +0800, Xuan Zhuo wrote: > > Currently, the virtio core will perform a dma operation for each > > buffer. Although, the same page may be operated multiple times. > > > > With this patch, the driver does the dma operation and manages the dma > > address based on the premapped feature of the virtio core. > > > > This way, we can perform only one dma operation for the pages of the > > alloc frag. This is beneficial for the iommu device. > > > > kernel command line: intel_iommu=on iommu.passthrough=0 > > > >| strict=0 | strict=1 > > Before | 775496pps | 428614pps > > After | 1109316pps | 742853pps > > > > Signed-off-by: Xuan Zhuo > > Hi Xuan Zhuo, > looks like this patch is causing regressions. Do you have time to debug > or should I revert? I have sent a fix to Michael Roth. I will wait for his response. If that is ok, I will post that fix commit. Thanks. > > > --- > > drivers/net/virtio_net.c | 228 ++- > > 1 file changed, 202 insertions(+), 26 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 486b5849033d..16adb5ef18f8 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -126,6 +126,14 @@ static const struct virtnet_stat_desc > > virtnet_rq_stats_desc[] = { > > #define VIRTNET_SQ_STATS_LEN ARRAY_SIZE(virtnet_sq_stats_desc) > > #define VIRTNET_RQ_STATS_LEN ARRAY_SIZE(virtnet_rq_stats_desc) > > > > +/* The dma information of pages allocated at a time. */ > > +struct virtnet_rq_dma { > > + dma_addr_t addr; > > + u32 ref; > > + u16 len; > > + u16 need_sync; > > +}; > > + > > /* Internal representation of a send virtqueue */ > > struct send_queue { > > /* Virtqueue associated with this send _queue */ > > @@ -175,6 +183,12 @@ struct receive_queue { > > char name[16]; > > > > struct xdp_rxq_info xdp_rxq; > > + > > + /* Record the last dma info to free after new pages is allocated. 
*/ > > + struct virtnet_rq_dma *last_dma; > > + > > + /* Do dma by self */ > > + bool do_dma; > > }; > > > > /* This structure can contain rss message with maximum settings for > > indirection table and keysize > > @@ -549,6 +563,156 @@ static struct sk_buff *page_to_skb(struct > > virtnet_info *vi, > > return skb; > > } > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, void *buf, u32 len) > > +{ > > + struct page *page = virt_to_head_page(buf); > > + struct virtnet_rq_dma *dma; > > + void *head; > > + int offset; > > + > > + head = page_address(page); > > + > > + dma = head; > > + > > + --dma->ref; > > + > > + if (dma->ref) { > > + if (dma->need_sync && len) { > > + offset = buf - (head + sizeof(*dma)); > > + > > + virtqueue_dma_sync_single_range_for_cpu(rq->vq, > > dma->addr, offset, > > + len, > > DMA_FROM_DEVICE); > > + } > > + > > + return; > > + } > > + > > + virtqueue_dma_unmap_single_attrs(rq->vq, dma->addr, dma->len, > > +DMA_FROM_DEVICE, > > DMA_ATTR_SKIP_CPU_SYNC); > > + put_page(page); > > +} > > + > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void > > **ctx) > > +{ > > + void *buf; > > + > > + buf = virtqueue_get_buf_ctx(rq->vq, len, ctx); > > + if (buf && rq->do_dma) > > + virtnet_rq_unmap(rq, buf, *len); > > + > > + return buf; > > +} > > + > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq) > > +{ > > + void *buf; > > + > > + buf = virtqueue_detach_unused_buf(rq->vq); > > + if (buf && rq->do_dma) > > + virtnet_rq_unmap(rq, buf, 0); > > + > > + return buf; > > +} > > + > > +static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, > > u32 len) > > +{ > > + struct virtnet_rq_dma *dma; > > + dma_addr_t addr; > > + u32 offset; > > + void *head; > > + > > + if (!rq->do_dma) { > > + sg_init_one(rq->sg, buf, len); > > + return; > > + } > > + > > + head = page_address(rq->alloc_frag.page); > > + > > + offset = buf - head; > > + > > + dma = head; > > + > > + addr = dma->addr - 
sizeof(*dma) + offset; > > + > > + sg_init_table(rq->sg, 1); > > + rq->sg[0].dma_address = addr; > > + rq->sg[0].length = len; > > +} > > + > > +static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t > > gfp) > > +{ > > + struct page_frag *alloc_frag = &rq->alloc_frag; > > + struct virtnet_rq_dma *dma; > > + void *buf, *head; > > + dma_addr_t addr; > > + > > + if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp))) > > + return NULL; > > + > > + head = page_address(alloc_frag->page); > > + > > + if (rq->do_dma) { > > + dma = head; > > + > > + /* new pages */ > > + if (!alloc_
[PATCH vhost] virtio_net: fix the missing of the dma cpu sync
Commit 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers") unmaps the buffer with DMA_ATTR_SKIP_CPU_SYNC when the dma->ref is zero. We do that with DMA_ATTR_SKIP_CPU_SYNC, because we do not want to do the sync for the entire page_frag. But that misses the sync for the current area. This patch does cpu sync regardless of whether the ref is zero or not. Fixes: 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers") Reported-by: Michael Roth Closes: http://lore.kernel.org/all/20230926130451.axgodaa6tvwqs...@amd.com Signed-off-by: Xuan Zhuo --- drivers/net/virtio_net.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 98dc9b49d56b..9ece27dc5144 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -589,16 +589,16 @@ static void virtnet_rq_unmap(struct receive_queue *rq, void *buf, u32 len) --dma->ref; - if (dma->ref) { - if (dma->need_sync && len) { - offset = buf - (head + sizeof(*dma)); + if (dma->need_sync && len) { + offset = buf - (head + sizeof(*dma)); - virtqueue_dma_sync_single_range_for_cpu(rq->vq, dma->addr, offset, - len, DMA_FROM_DEVICE); - } + virtqueue_dma_sync_single_range_for_cpu(rq->vq, dma->addr, + offset, len, + DMA_FROM_DEVICE); + } + if (dma->ref) return; - } virtqueue_dma_unmap_single_attrs(rq->vq, dma->addr, dma->len, DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC); -- 2.32.0.3.g01195cf9f