[PATCH] vduse: fix NULL pointer dereference
vduse_vdpa_set_vq_affinity callback can be called with NULL value as cpu_mask when deleting the vduse device. This patch clears virtqueue's IRQ affinity mask value instead of dereferencing NULL cpu_mask. [ 4760.952149] BUG: kernel NULL pointer dereference, address: [ 4760.959110] #PF: supervisor read access in kernel mode [ 4760.964247] #PF: error_code(0x) - not-present page [ 4760.969385] PGD 0 P4D 0 [ 4760.971927] Oops: [#1] PREEMPT SMP PTI [ 4760.976112] CPU: 13 PID: 2346 Comm: vdpa Not tainted 6.4.0-rc6+ #4 [ 4760.982291] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.8.1 06/26/2020 [ 4760.989769] RIP: 0010:memcpy_orig+0xc5/0x130 [ 4760.994049] Code: 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 fa 08 72 1b <4c> 8b 06 4c 8b 4c 16 f8 4c 89 07 4c 89 4c 17 f8 c3 cc cc cc cc 66 [ 4761.012793] RSP: 0018:b1d565abb830 EFLAGS: 00010246 [ 4761.018020] RAX: 9f4bf6b27898 RBX: 9f4be23969c0 RCX: 9f4bcadf6400 [ 4761.025152] RDX: 0008 RSI: RDI: 9f4bf6b27898 [ 4761.032286] RBP: R08: 0008 R09: [ 4761.039416] R10: R11: 0600 R12: [ 4761.046549] R13: R14: 0080 R15: b1d565abbb10 [ 4761.053680] FS: 7f64c2ec2740() GS:9f635f98() knlGS: [ 4761.061765] CS: 0010 DS: ES: CR0: 80050033 [ 4761.067513] CR2: CR3: 001875270006 CR4: 007706e0 [ 4761.074645] DR0: DR1: DR2: [ 4761.081775] DR3: DR6: fffe0ff0 DR7: 0400 [ 4761.088909] PKRU: 5554 [ 4761.091620] Call Trace: [ 4761.094074] [ 4761.096180] ? __die+0x1f/0x70 [ 4761.099238] ? page_fault_oops+0x171/0x4f0 [ 4761.103340] ? exc_page_fault+0x7b/0x180 [ 4761.107265] ? asm_exc_page_fault+0x22/0x30 [ 4761.111460] ? memcpy_orig+0xc5/0x130 [ 4761.115126] vduse_vdpa_set_vq_affinity+0x3e/0x50 [vduse] [ 4761.120533] virtnet_clean_affinity.part.0+0x3d/0x90 [virtio_net] [ 4761.126635] remove_vq_common+0x1a4/0x250 [virtio_net] [ 4761.131781] virtnet_remove+0x5d/0x70 [virtio_net] [ 4761.136580] virtio_dev_remove+0x3a/0x90 [ 4761.140509] device_release_driver_internal+0x19b/0x200 [ 4761.145742] bus_remove_device+0xc2/0x130 [ 4761.149755] device_del+0x158/0x3e0 [ 4761.153245] ? kernfs_find_ns+0x35/0xc0 [ 4761.157086] device_unregister+0x13/0x60 [ 4761.161010] unregister_virtio_device+0x11/0x20 [ 4761.165543] device_release_driver_internal+0x19b/0x200 [ 4761.170770] bus_remove_device+0xc2/0x130 [ 4761.174782] device_del+0x158/0x3e0 [ 4761.178276] ? __pfx_vdpa_name_match+0x10/0x10 [vdpa] [ 4761.183336] device_unregister+0x13/0x60 [ 4761.187260] vdpa_nl_cmd_dev_del_set_doit+0x63/0xe0 [vdpa] Fixes: 28f6288eb63d ("vduse: Support set_vq_affinity callback") Cc: xieyon...@bytedance.com Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 5f5c21674fdc..cdca94e85762 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -726,7 +726,11 @@ static int vduse_vdpa_set_vq_affinity(struct vdpa_device *vdpa, u16 idx, { struct vduse_dev *dev = vdpa_to_vduse(vdpa); - cpumask_copy(&dev->vqs[idx]->irq_affinity, cpu_mask); + if (cpu_mask) + cpumask_copy(&dev->vqs[idx]->irq_affinity, cpu_mask); + else + cpumask_clear(&dev->vqs[idx]->irq_affinity); + return 0; } -- 2.40.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vduse: fix NULL pointer dereference
On 6/15/23 09:25, Yongji Xie wrote: On Wed, Jun 14, 2023 at 7:52 PM Maxime Coquelin wrote: vduse_vdpa_set_vq_affinity callback can be called with NULL value as cpu_mask when deleting the vduse device. This patch clears virtqueue's IRQ affinity mask value instead of dereferencing NULL cpu_mask. [ 4760.952149] BUG: kernel NULL pointer dereference, address: [ 4760.959110] #PF: supervisor read access in kernel mode [ 4760.964247] #PF: error_code(0x) - not-present page [ 4760.969385] PGD 0 P4D 0 [ 4760.971927] Oops: [#1] PREEMPT SMP PTI [ 4760.976112] CPU: 13 PID: 2346 Comm: vdpa Not tainted 6.4.0-rc6+ #4 [ 4760.982291] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.8.1 06/26/2020 [ 4760.989769] RIP: 0010:memcpy_orig+0xc5/0x130 [ 4760.994049] Code: 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 fa 08 72 1b <4c> 8b 06 4c 8b 4c 16 f8 4c 89 07 4c 89 4c 17 f8 c3 cc cc cc cc 66 [ 4761.012793] RSP: 0018:b1d565abb830 EFLAGS: 00010246 [ 4761.018020] RAX: 9f4bf6b27898 RBX: 9f4be23969c0 RCX: 9f4bcadf6400 [ 4761.025152] RDX: 0008 RSI: RDI: 9f4bf6b27898 [ 4761.032286] RBP: R08: 0008 R09: [ 4761.039416] R10: R11: 0600 R12: [ 4761.046549] R13: R14: 0080 R15: b1d565abbb10 [ 4761.053680] FS: 7f64c2ec2740() GS:9f635f98() knlGS: [ 4761.061765] CS: 0010 DS: ES: CR0: 80050033 [ 4761.067513] CR2: CR3: 001875270006 CR4: 007706e0 [ 4761.074645] DR0: DR1: DR2: [ 4761.081775] DR3: DR6: fffe0ff0 DR7: 0400 [ 4761.088909] PKRU: 5554 [ 4761.091620] Call Trace: [ 4761.094074] [ 4761.096180] ? __die+0x1f/0x70 [ 4761.099238] ? page_fault_oops+0x171/0x4f0 [ 4761.103340] ? exc_page_fault+0x7b/0x180 [ 4761.107265] ? asm_exc_page_fault+0x22/0x30 [ 4761.111460] ? memcpy_orig+0xc5/0x130 [ 4761.115126] vduse_vdpa_set_vq_affinity+0x3e/0x50 [vduse] [ 4761.120533] virtnet_clean_affinity.part.0+0x3d/0x90 [virtio_net] [ 4761.126635] remove_vq_common+0x1a4/0x250 [virtio_net] [ 4761.131781] virtnet_remove+0x5d/0x70 [virtio_net] [ 4761.136580] virtio_dev_remove+0x3a/0x90 [ 4761.140509] device_release_driver_internal+0x19b/0x200 [ 4761.145742] bus_remove_device+0xc2/0x130 [ 4761.149755] device_del+0x158/0x3e0 [ 4761.153245] ? kernfs_find_ns+0x35/0xc0 [ 4761.157086] device_unregister+0x13/0x60 [ 4761.161010] unregister_virtio_device+0x11/0x20 [ 4761.165543] device_release_driver_internal+0x19b/0x200 [ 4761.170770] bus_remove_device+0xc2/0x130 [ 4761.174782] device_del+0x158/0x3e0 [ 4761.178276] ? __pfx_vdpa_name_match+0x10/0x10 [vdpa] [ 4761.183336] device_unregister+0x13/0x60 [ 4761.187260] vdpa_nl_cmd_dev_del_set_doit+0x63/0xe0 [vdpa] Fixes: 28f6288eb63d ("vduse: Support set_vq_affinity callback") Cc: xieyon...@bytedance.com Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 5f5c21674fdc..cdca94e85762 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -726,7 +726,11 @@ static int vduse_vdpa_set_vq_affinity(struct vdpa_device *vdpa, u16 idx, { struct vduse_dev *dev = vdpa_to_vduse(vdpa); - cpumask_copy(&dev->vqs[idx]->irq_affinity, cpu_mask); + if (cpu_mask) + cpumask_copy(&dev->vqs[idx]->irq_affinity, cpu_mask); + else + cpumask_clear(&dev->vqs[idx]->irq_affinity); I think we should set all the bits of irq affinity instead: cpumask_setall(&dev->vqs[idx]->irq_affinity); I hesitated between both. My understanding is it only happens on removal, so either would work. But in case it can happen on other cases, it is indeed better to use cpumask_setall(). I will post a v2 today. Thanks, Maxime Thanks, Yongji ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2] vduse: fix NULL pointer dereference
vduse_vdpa_set_vq_affinity callback can be called with NULL value as cpu_mask when deleting the vduse device. This patch resets virtqueue's IRQ affinity mask value to set all CPUs instead of dereferencing NULL cpu_mask. [ 4760.952149] BUG: kernel NULL pointer dereference, address: [ 4760.959110] #PF: supervisor read access in kernel mode [ 4760.964247] #PF: error_code(0x) - not-present page [ 4760.969385] PGD 0 P4D 0 [ 4760.971927] Oops: [#1] PREEMPT SMP PTI [ 4760.976112] CPU: 13 PID: 2346 Comm: vdpa Not tainted 6.4.0-rc6+ #4 [ 4760.982291] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.8.1 06/26/2020 [ 4760.989769] RIP: 0010:memcpy_orig+0xc5/0x130 [ 4760.994049] Code: 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 fa 08 72 1b <4c> 8b 06 4c 8b 4c 16 f8 4c 89 07 4c 89 4c 17 f8 c3 cc cc cc cc 66 [ 4761.012793] RSP: 0018:b1d565abb830 EFLAGS: 00010246 [ 4761.018020] RAX: 9f4bf6b27898 RBX: 9f4be23969c0 RCX: 9f4bcadf6400 [ 4761.025152] RDX: 0008 RSI: RDI: 9f4bf6b27898 [ 4761.032286] RBP: R08: 0008 R09: [ 4761.039416] R10: R11: 0600 R12: [ 4761.046549] R13: R14: 0080 R15: b1d565abbb10 [ 4761.053680] FS: 7f64c2ec2740() GS:9f635f98() knlGS: [ 4761.061765] CS: 0010 DS: ES: CR0: 80050033 [ 4761.067513] CR2: CR3: 001875270006 CR4: 007706e0 [ 4761.074645] DR0: DR1: DR2: [ 4761.081775] DR3: DR6: fffe0ff0 DR7: 0400 [ 4761.088909] PKRU: 5554 [ 4761.091620] Call Trace: [ 4761.094074] [ 4761.096180] ? __die+0x1f/0x70 [ 4761.099238] ? page_fault_oops+0x171/0x4f0 [ 4761.103340] ? exc_page_fault+0x7b/0x180 [ 4761.107265] ? asm_exc_page_fault+0x22/0x30 [ 4761.111460] ? memcpy_orig+0xc5/0x130 [ 4761.115126] vduse_vdpa_set_vq_affinity+0x3e/0x50 [vduse] [ 4761.120533] virtnet_clean_affinity.part.0+0x3d/0x90 [virtio_net] [ 4761.126635] remove_vq_common+0x1a4/0x250 [virtio_net] [ 4761.131781] virtnet_remove+0x5d/0x70 [virtio_net] [ 4761.136580] virtio_dev_remove+0x3a/0x90 [ 4761.140509] device_release_driver_internal+0x19b/0x200 [ 4761.145742] bus_remove_device+0xc2/0x130 [ 4761.149755] device_del+0x158/0x3e0 [ 4761.153245] ? kernfs_find_ns+0x35/0xc0 [ 4761.157086] device_unregister+0x13/0x60 [ 4761.161010] unregister_virtio_device+0x11/0x20 [ 4761.165543] device_release_driver_internal+0x19b/0x200 [ 4761.170770] bus_remove_device+0xc2/0x130 [ 4761.174782] device_del+0x158/0x3e0 [ 4761.178276] ? __pfx_vdpa_name_match+0x10/0x10 [vdpa] [ 4761.183336] device_unregister+0x13/0x60 [ 4761.187260] vdpa_nl_cmd_dev_del_set_doit+0x63/0xe0 [vdpa] Fixes: 28f6288eb63d ("vduse: Support set_vq_affinity callback") Cc: xieyon...@bytedance.com Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 5f5c21674fdc..0d84e6a9c3cc 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -726,7 +726,11 @@ static int vduse_vdpa_set_vq_affinity(struct vdpa_device *vdpa, u16 idx, { struct vduse_dev *dev = vdpa_to_vduse(vdpa); - cpumask_copy(&dev->vqs[idx]->irq_affinity, cpu_mask); + if (cpu_mask) + cpumask_copy(&dev->vqs[idx]->irq_affinity, cpu_mask); + else + cpumask_setall(&dev->vqs[idx]->irq_affinity); + return 0; } -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v1 0/2] vduse: add support for networking devices
This small series enables virtio-net device type in VDUSE. With it, basic operation have been tested, both with virtio-vdpa and vhost-vdpa using DPDK Vhost library series adding VDUSE support using split rings layout (merged in DPDK v23.07-rc1). Control queue support (and so multiqueue) has also been tested, but requires a Kernel series from Jason Wang relaxing control queue polling [1] to function reliably. [1]: https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/ RFC -> v1 changes: == - Fail device init if it does not support VERSION_1 (Jason) Maxime Coquelin (2): vduse: validate block features only with block devices vduse: enable Virtio-net device type drivers/vdpa/vdpa_user/vduse_dev.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v1 1/2] vduse: validate block features only with block devices
This patch is preliminary work to enable network device type support to VDUSE. As VIRTIO_BLK_F_CONFIG_WCE shares the same value as VIRTIO_NET_F_HOST_TSO4, we need to restrict its check to Virtio-blk device type. Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 5f5c21674fdc..c1c2f4c711ae 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1658,13 +1658,14 @@ static bool device_is_allowed(u32 device_id) return false; } -static bool features_is_valid(u64 features) +static bool features_is_valid(struct vduse_dev_config *config) { - if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) + if (!(config->features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) return false; /* Now we only support read-only configuration space */ - if (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)) + if ((config->device_id == VIRTIO_ID_BLOCK) && + (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))) return false; return true; @@ -1691,7 +1692,7 @@ static bool vduse_validate_config(struct vduse_dev_config *config) if (!device_is_allowed(config->device_id)) return false; - if (!features_is_valid(config->features)) + if (!features_is_valid(config)) return false; return true; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v1 2/2] vduse: enable Virtio-net device type
This patch adds Virtio-net device type to the supported devices types. Initialization fails if the device does not support VIRTIO_F_VERSION_1 feature, in order to guarantee the configuration space is read-only. Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index c1c2f4c711ae..89088fa27026 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -142,6 +142,7 @@ static struct workqueue_struct *vduse_irq_bound_wq; static u32 allowed_device_id[] = { VIRTIO_ID_BLOCK, + VIRTIO_ID_NET, }; static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa) @@ -1668,6 +1669,10 @@ static bool features_is_valid(struct vduse_dev_config *config) (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))) return false; + if ((config->device_id == VIRTIO_ID_NET) && + !(config->features & (1ULL << VIRTIO_F_VERSION_1))) + return false; + return true; } @@ -2023,6 +2028,7 @@ static const struct vdpa_mgmtdev_ops vdpa_dev_mgmtdev_ops = { static struct virtio_device_id id_table[] = { { VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID }, + { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID }, { 0 }, }; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 0/2] vduse: add support for networking devices
On 7/3/23 08:44, Jason Wang wrote: On Sun, Jul 2, 2023 at 9:37 PM Michael S. Tsirkin wrote: On Tue, Jun 27, 2023 at 01:36:50PM +0200, Maxime Coquelin wrote: This small series enables virtio-net device type in VDUSE. With it, basic operation have been tested, both with virtio-vdpa and vhost-vdpa using DPDK Vhost library series adding VDUSE support using split rings layout (merged in DPDK v23.07-rc1). Control queue support (and so multiqueue) has also been tested, but requires a Kernel series from Jason Wang relaxing control queue polling [1] to function reliably. [1]: https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/ Jason promised to post a new version of that patch. Right Jason? Yes. For now let's make sure CVQ feature flag is off? We can do that and relax on top of my patch. I agree? Do you prefer a features negotiation, or failing init (like done for VERSION_1) if the VDUSE application advertises CVQ? Thanks, Maxime Thanks RFC -> v1 changes: == - Fail device init if it does not support VERSION_1 (Jason) Maxime Coquelin (2): vduse: validate block features only with block devices vduse: enable Virtio-net device type drivers/vdpa/vdpa_user/vduse_dev.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 0/2] vduse: add support for networking devices
On 7/3/23 23:45, Michael S. Tsirkin wrote: On Mon, Jul 03, 2023 at 09:43:49AM +0200, Maxime Coquelin wrote: On 7/3/23 08:44, Jason Wang wrote: On Sun, Jul 2, 2023 at 9:37 PM Michael S. Tsirkin wrote: On Tue, Jun 27, 2023 at 01:36:50PM +0200, Maxime Coquelin wrote: This small series enables virtio-net device type in VDUSE. With it, basic operation have been tested, both with virtio-vdpa and vhost-vdpa using DPDK Vhost library series adding VDUSE support using split rings layout (merged in DPDK v23.07-rc1). Control queue support (and so multiqueue) has also been tested, but requires a Kernel series from Jason Wang relaxing control queue polling [1] to function reliably. [1]: https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/ Jason promised to post a new version of that patch. Right Jason? Yes. For now let's make sure CVQ feature flag is off? We can do that and relax on top of my patch. I agree? Do you prefer a features negotiation, or failing init (like done for VERSION_1) if the VDUSE application advertises CVQ? Thanks, Maxime Unfortunately guests fail probe if feature set is inconsistent. So I don't think passing through features is a good idea, you need a list of legal bits. And when doing this, clear CVQ and everything that depends on it. Since this is temporary, while cvq is made more robust, I think it is better to fail VDUSE device creation if CVQ feature is advertised by the VDUSE application, instead of ensuring features depending on CVQ are also cleared. Jason seems to think likewise, would that work for you? Thanks, Maxime Thanks RFC -> v1 changes: == - Fail device init if it does not support VERSION_1 (Jason) Maxime Coquelin (2): vduse: validate block features only with block devices vduse: enable Virtio-net device type drivers/vdpa/vdpa_user/vduse_dev.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 0/2] vduse: add support for networking devices
On 7/4/23 11:59, Michael S. Tsirkin wrote: On Tue, Jul 04, 2023 at 10:43:07AM +0200, Maxime Coquelin wrote: On 7/3/23 23:45, Michael S. Tsirkin wrote: On Mon, Jul 03, 2023 at 09:43:49AM +0200, Maxime Coquelin wrote: On 7/3/23 08:44, Jason Wang wrote: On Sun, Jul 2, 2023 at 9:37 PM Michael S. Tsirkin wrote: On Tue, Jun 27, 2023 at 01:36:50PM +0200, Maxime Coquelin wrote: This small series enables virtio-net device type in VDUSE. With it, basic operation have been tested, both with virtio-vdpa and vhost-vdpa using DPDK Vhost library series adding VDUSE support using split rings layout (merged in DPDK v23.07-rc1). Control queue support (and so multiqueue) has also been tested, but requires a Kernel series from Jason Wang relaxing control queue polling [1] to function reliably. [1]: https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/ Jason promised to post a new version of that patch. Right Jason? Yes. For now let's make sure CVQ feature flag is off? We can do that and relax on top of my patch. I agree? Do you prefer a features negotiation, or failing init (like done for VERSION_1) if the VDUSE application advertises CVQ? Thanks, Maxime Unfortunately guests fail probe if feature set is inconsistent. So I don't think passing through features is a good idea, you need a list of legal bits. And when doing this, clear CVQ and everything that depends on it. Since this is temporary, while cvq is made more robust, I think it is better to fail VDUSE device creation if CVQ feature is advertised by the VDUSE application, instead of ensuring features depending on CVQ are also cleared. Jason seems to think likewise, would that work for you? Thanks, Maxime Nothing is more permanent than temporary solutions. My concern would be that hardware devices then start masking CVQ intentionally just to avoid the pain of broken software. Got it, I'll add a patch on top that filters out CVQ feature and all the features that depend on it. Thanks, Maxime Thanks RFC -> v1 changes: == - Fail device init if it does not support VERSION_1 (Jason) Maxime Coquelin (2): vduse: validate block features only with block devices vduse: enable Virtio-net device type drivers/vdpa/vdpa_user/vduse_dev.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 0/3] vduse: add support for networking devices
This small series enables virtio-net device type in VDUSE. With it, basic operation have been tested, both with virtio-vdpa and vhost-vdpa using DPDK Vhost library series adding VDUSE support using split rings layout (merged in DPDK v23.07-rc1). Control queue support (and so multiqueue) has also been tested, but requires a Kernel series from Jason Wang relaxing control queue polling [1] to function reliably, so while Jason rework is done, a patch is added to disable CVQ and features that depend on it (tested also with DPDK v23.07-rc1). [1]: https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/ v1 -> v2 changes: = - Add a patch to disable CVQ (Michael) RFC -> v1 changes: == - Fail device init if it does not support VERSION_1 (Jason) Maxime Coquelin (3): vduse: validate block features only with block devices vduse: enable Virtio-net device type vduse: Temporarily disable control queue features drivers/vdpa/vdpa_user/vduse_dev.c | 36 ++ 1 file changed, 32 insertions(+), 4 deletions(-) -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 2/3] vduse: enable Virtio-net device type
This patch adds Virtio-net device type to the supported devices types. Initialization fails if the device does not support VIRTIO_F_VERSION_1 feature, in order to guarantee the configuration space is read-only. Acked-by: Jason Wang Reviewed-by: Xie Yongji Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index ff9fdd6783fe..1271c9796517 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -142,6 +142,7 @@ static struct workqueue_struct *vduse_irq_bound_wq; static u32 allowed_device_id[] = { VIRTIO_ID_BLOCK, + VIRTIO_ID_NET, }; static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa) @@ -1672,6 +1673,10 @@ static bool features_is_valid(struct vduse_dev_config *config) (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))) return false; + if ((config->device_id == VIRTIO_ID_NET) && + !(config->features & (1ULL << VIRTIO_F_VERSION_1))) + return false; + return true; } @@ -2027,6 +2032,7 @@ static const struct vdpa_mgmtdev_ops vdpa_dev_mgmtdev_ops = { static struct virtio_device_id id_table[] = { { VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID }, + { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID }, { 0 }, }; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 1/3] vduse: validate block features only with block devices
This patch is preliminary work to enable network device type support to VDUSE. As VIRTIO_BLK_F_CONFIG_WCE shares the same value as VIRTIO_NET_F_HOST_TSO4, we need to restrict its check to Virtio-blk device type. Acked-by: Jason Wang Reviewed-by: Xie Yongji Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index dc38ed21319d..ff9fdd6783fe 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1662,13 +1662,14 @@ static bool device_is_allowed(u32 device_id) return false; } -static bool features_is_valid(u64 features) +static bool features_is_valid(struct vduse_dev_config *config) { - if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) + if (!(config->features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) return false; /* Now we only support read-only configuration space */ - if (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)) + if ((config->device_id == VIRTIO_ID_BLOCK) && + (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))) return false; return true; @@ -1695,7 +1696,7 @@ static bool vduse_validate_config(struct vduse_dev_config *config) if (!device_is_allowed(config->device_id)) return false; - if (!features_is_valid(config->features)) + if (!features_is_valid(config)) return false; return true; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2 3/3] vduse: Temporarily disable control queue features
Virtio-net driver control queue implementation is not safe when used with VDUSE. If the VDUSE application does not reply to control queue messages, it currently ends up hanging the kernel thread sending this command. Some work is on-going to make the control queue implementation robust with VDUSE. Until it is completed, let's disable control virtqueue and features that depend on it. Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 1271c9796517..04367a53802b 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1778,6 +1778,25 @@ static struct attribute *vduse_dev_attrs[] = { ATTRIBUTE_GROUPS(vduse_dev); +static void vduse_dev_features_fixup(struct vduse_dev_config *config) +{ + if (config->device_id == VIRTIO_ID_NET) { + /* +* Temporarily disable control virtqueue and features that +* depend on it while CVQ is being made more robust for VDUSE. +*/ + config->features &= ~((1ULL << VIRTIO_NET_F_CTRL_VQ) | + (1ULL << VIRTIO_NET_F_CTRL_RX) | + (1ULL << VIRTIO_NET_F_CTRL_VLAN) | + (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | + (1ULL << VIRTIO_NET_F_MQ) | + (1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR) | + (1ULL << VIRTIO_NET_F_RSS) | + (1ULL << VIRTIO_NET_F_HASH_REPORT) | + (1ULL << VIRTIO_NET_F_NOTF_COAL)); + } +} + static int vduse_create_dev(struct vduse_dev_config *config, void *config_buf, u64 api_version) { @@ -1793,6 +1812,8 @@ static int vduse_create_dev(struct vduse_dev_config *config, if (!dev) goto err; + vduse_dev_features_fixup(config); + dev->api_version = api_version; dev->device_features = config->features; dev->device_id = config->device_id; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 3/3] vduse: Temporarily disable control queue features
On 7/4/23 18:43, Michael S. Tsirkin wrote: On Tue, Jul 04, 2023 at 06:40:45PM +0200, Maxime Coquelin wrote: Virtio-net driver control queue implementation is not safe when used with VDUSE. If the VDUSE application does not reply to control queue messages, it currently ends up hanging the kernel thread sending this command. Some work is on-going to make the control queue implementation robust with VDUSE. Until it is completed, let's disable control virtqueue and features that depend on it. Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 1271c9796517..04367a53802b 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1778,6 +1778,25 @@ static struct attribute *vduse_dev_attrs[] = { ATTRIBUTE_GROUPS(vduse_dev); +static void vduse_dev_features_fixup(struct vduse_dev_config *config) +{ + if (config->device_id == VIRTIO_ID_NET) { + /* +* Temporarily disable control virtqueue and features that +* depend on it while CVQ is being made more robust for VDUSE. +*/ + config->features &= ~((1ULL << VIRTIO_NET_F_CTRL_VQ) | + (1ULL << VIRTIO_NET_F_CTRL_RX) | + (1ULL << VIRTIO_NET_F_CTRL_VLAN) | + (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | + (1ULL << VIRTIO_NET_F_MQ) | + (1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR) | + (1ULL << VIRTIO_NET_F_RSS) | + (1ULL << VIRTIO_NET_F_HASH_REPORT) | + (1ULL << VIRTIO_NET_F_NOTF_COAL)); + } +} + This will never be exhaustive, we are adding new features. Please add an allowlist with just legal ones instead. Ok, got it! I'll post a new revision. Thanks, Maxime static int vduse_create_dev(struct vduse_dev_config *config, void *config_buf, u64 api_version) { @@ -1793,6 +1812,8 @@ static int vduse_create_dev(struct vduse_dev_config *config, if (!dev) goto err; + vduse_dev_features_fixup(config); + dev->api_version = api_version; dev->device_features = config->features; dev->device_id = config->device_id; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v3 2/3] vduse: enable Virtio-net device type
This patch adds Virtio-net device type to the supported devices types. Initialization fails if the device does not support VIRTIO_F_VERSION_1 feature, in order to guarantee the configuration space is read-only. Acked-by: Jason Wang Reviewed-by: Xie Yongji Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index ff9fdd6783fe..1271c9796517 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -142,6 +142,7 @@ static struct workqueue_struct *vduse_irq_bound_wq; static u32 allowed_device_id[] = { VIRTIO_ID_BLOCK, + VIRTIO_ID_NET, }; static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa) @@ -1672,6 +1673,10 @@ static bool features_is_valid(struct vduse_dev_config *config) (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))) return false; + if ((config->device_id == VIRTIO_ID_NET) && + !(config->features & (1ULL << VIRTIO_F_VERSION_1))) + return false; + return true; } @@ -2027,6 +2032,7 @@ static const struct vdpa_mgmtdev_ops vdpa_dev_mgmtdev_ops = { static struct virtio_device_id id_table[] = { { VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID }, + { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID }, { 0 }, }; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v3 0/3] vduse: add support for networking devices
This small series enables virtio-net device type in VDUSE. With it, basic operation have been tested, both with virtio-vdpa and vhost-vdpa using DPDK Vhost library series adding VDUSE support using split rings layout (merged in DPDK v23.07-rc1). Control queue support (and so multiqueue) has also been tested, but requires a Kernel series from Jason Wang relaxing control queue polling [1] to function reliably, so while Jason rework is done, a patch is added to disable CVQ and features that depend on it (tested also with DPDK v23.07-rc1). [1]: https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/ v2 -> v3 changes: = - Use allow list instead of deny list (Michael) v1 -> v2 changes: = - Add a patch to disable CVQ (Michael) RFC -> v1 changes: == - Fail device init if it does not support VERSION_1 (Jason) Maxime Coquelin (3): vduse: validate block features only with block devices vduse: enable Virtio-net device type vduse: Temporarily disable control queue features drivers/vdpa/vdpa_user/vduse_dev.c | 51 +++--- 1 file changed, 47 insertions(+), 4 deletions(-) -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v3 1/3] vduse: validate block features only with block devices
This patch is preliminary work to enable network device type support to VDUSE. As VIRTIO_BLK_F_CONFIG_WCE shares the same value as VIRTIO_NET_F_HOST_TSO4, we need to restrict its check to Virtio-blk device type. Acked-by: Jason Wang Reviewed-by: Xie Yongji Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index dc38ed21319d..ff9fdd6783fe 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1662,13 +1662,14 @@ static bool device_is_allowed(u32 device_id) return false; } -static bool features_is_valid(u64 features) +static bool features_is_valid(struct vduse_dev_config *config) { - if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) + if (!(config->features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) return false; /* Now we only support read-only configuration space */ - if (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)) + if ((config->device_id == VIRTIO_ID_BLOCK) && + (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))) return false; return true; @@ -1695,7 +1696,7 @@ static bool vduse_validate_config(struct vduse_dev_config *config) if (!device_is_allowed(config->device_id)) return false; - if (!features_is_valid(config->features)) + if (!features_is_valid(config)) return false; return true; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v3 3/3] vduse: Temporarily disable control queue features
Virtio-net driver control queue implementation is not safe when used with VDUSE. If the VDUSE application does not reply to control queue messages, it currently ends up hanging the kernel thread sending this command. Some work is on-going to make the control queue implementation robust with VDUSE. Until it is completed, let's filter out control virtqueue and features that depend on it by keeping only features known to be supported. Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 36 ++ 1 file changed, 36 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 1271c9796517..7345071db0a8 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -46,6 +46,30 @@ #define IRQ_UNBOUND -1 +#define VDUSE_NET_VALID_FEATURES_MASK \ + (BIT_ULL(VIRTIO_NET_F_CSUM) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) | \ +BIT_ULL(VIRTIO_NET_F_MTU) |\ +BIT_ULL(VIRTIO_NET_F_MAC) |\ +BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_ECN) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_UFO) | \ +BIT_ULL(VIRTIO_NET_F_HOST_TSO4) | \ +BIT_ULL(VIRTIO_NET_F_HOST_TSO6) | \ +BIT_ULL(VIRTIO_NET_F_HOST_ECN) | \ +BIT_ULL(VIRTIO_NET_F_HOST_UFO) | \ +BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) | \ +BIT_ULL(VIRTIO_NET_F_STATUS) | \ +BIT_ULL(VIRTIO_NET_F_HOST_USO) | \ +BIT_ULL(VIRTIO_F_ANY_LAYOUT) | \ +BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) | \ +BIT_ULL(VIRTIO_F_EVENT_IDX) | \ +BIT_ULL(VIRTIO_F_VERSION_1) | \ +BIT_ULL(VIRTIO_F_IOMMU_PLATFORM) | \ +BIT_ULL(VIRTIO_F_RING_PACKED) |\ +BIT_ULL(VIRTIO_F_IN_ORDER)) + struct vduse_virtqueue { u16 index; u16 num_max; @@ -1778,6 +1802,16 @@ static struct attribute *vduse_dev_attrs[] = { ATTRIBUTE_GROUPS(vduse_dev); +static void vduse_dev_features_filter(struct vduse_dev_config *config) +{ + /* +* Temporarily filter out virtio-net's control virtqueue and features +* that depend on it while CVQ is being made more robust for VDUSE. +*/ + if (config->device_id == VIRTIO_ID_NET) + config->features &= VDUSE_NET_VALID_FEATURES_MASK; +} + static int vduse_create_dev(struct vduse_dev_config *config, void *config_buf, u64 api_version) { @@ -1793,6 +1827,8 @@ static int vduse_create_dev(struct vduse_dev_config *config, if (!dev) goto err; + vduse_dev_features_filter(config); + dev->api_version = api_version; dev->device_features = config->features; dev->device_id = config->device_id; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] vduse: Use proper spinlock for IRQ injection
The IRQ injection work used spin_lock_irq() to protect the scheduling of the softirq, but spin_lock_bh() should be used. With spin_lock_irq(), we noticed delay of more than 6 seconds between the time a NAPI polling work is scheduled and the time it is executed. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Cc: xieyon...@bytedance.com Suggested-by: Jason Wang Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index dc38ed21319d..df7869537ef1 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -935,10 +935,10 @@ static void vduse_dev_irq_inject(struct work_struct *work) { struct vduse_dev *dev = container_of(work, struct vduse_dev, inject); - spin_lock_irq(&dev->irq_lock); + spin_lock_bh(&dev->irq_lock); if (dev->config_cb.callback) dev->config_cb.callback(dev->config_cb.private); - spin_unlock_irq(&dev->irq_lock); + spin_unlock_bh(&dev->irq_lock); } static void vduse_vq_irq_inject(struct work_struct *work) @@ -946,10 +946,10 @@ static void vduse_vq_irq_inject(struct work_struct *work) struct vduse_virtqueue *vq = container_of(work, struct vduse_virtqueue, inject); - spin_lock_irq(&vq->irq_lock); + spin_lock_bh(&vq->irq_lock); if (vq->ready && vq->cb.callback) vq->cb.callback(vq->cb.private); - spin_unlock_irq(&vq->irq_lock); + spin_unlock_bh(&vq->irq_lock); } static bool vduse_vq_signal_irqfd(struct vduse_virtqueue *vq) -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC 1/4] vduse: Add the struct to save the vq reconnect info
Hello Cindy, On 6/28/23 08:59, Cindy Lu wrote: From: Your Name this struct is to save the reconnect info struct, in this struct saved the page info that alloc to save the reconnect info Signed-off-by: Cindy Lu --- drivers/vdpa/vdpa_user/vduse_dev.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 26b7e29cb900..f845dc46b1db 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -72,6 +72,12 @@ struct vduse_umem { struct page **pages; struct mm_struct *mm; }; +struct vdpa_reconnect_info { + u32 index; + phys_addr_t addr; + unsigned long vaddr; + phys_addr_t size; +}; struct vduse_dev { struct vduse_vdpa *vdev; @@ -106,6 +112,7 @@ struct vduse_dev { u32 vq_align; struct vduse_umem *umem; struct mutex mem_lock; + struct vdpa_reconnect_info reconnect_info[64]; Why 64? Shouldn't it be part of struct vduse_virtqueue instead? }; struct vduse_dev_msg { ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC 2/4] vduse: Add file operation for mmap
On 6/28/23 08:59, Cindy Lu wrote: From: Your Name Add the operation for mmap, The user space APP will use this function to map the pages to userspace Signed-off-by: Cindy Lu --- drivers/vdpa/vdpa_user/vduse_dev.c | 49 ++ 1 file changed, 49 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index f845dc46b1db..1b833bf0ae37 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1313,6 +1313,54 @@ static struct vduse_dev *vduse_dev_get_from_minor(int minor) return dev; } + +static vm_fault_t vduse_vm_fault(struct vm_fault *vmf) +{ + struct vduse_dev *dev = vmf->vma->vm_file->private_data; + struct vm_area_struct *vma = vmf->vma; + u16 index = vma->vm_pgoff; + + struct vdpa_reconnect_info *info; + info = &dev->reconnect_info[index]; + + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + if (remap_pfn_range(vma, vmf->address & PAGE_MASK, PFN_DOWN(info->addr), + PAGE_SIZE, vma->vm_page_prot)) + return VM_FAULT_SIGBUS; + return VM_FAULT_NOPAGE; +} + +static const struct vm_operations_struct vduse_vm_ops = { + .fault = vduse_vm_fault, +}; + +static int vduse_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct vduse_dev *dev = file->private_data; + struct vdpa_reconnect_info *info; + unsigned long index = vma->vm_pgoff; + + if (vma->vm_end - vma->vm_start != PAGE_SIZE) + return -EINVAL; + if ((vma->vm_flags & VM_SHARED) == 0) + return -EINVAL; + + if (index > 65535) + return -EINVAL; You declare an array of 64 entries in patch 1, so it can overflow. + + info = &dev->reconnect_info[index]; + if (info->addr & (PAGE_SIZE - 1)) + return -EINVAL; + if (vma->vm_end - vma->vm_start != info->size) { + return -ENOTSUPP; + } + + vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP); + vma->vm_ops = &vduse_vm_ops; + + return 0; +} + static int vduse_dev_open(struct inode *inode, struct file *file) { int ret; @@ -1345,6 +1393,7 @@ static const struct file_operations vduse_dev_fops = { .unlocked_ioctl = vduse_dev_ioctl, .compat_ioctl = compat_ptr_ioctl, .llseek = noop_llseek, + .mmap = vduse_mmap, }; static struct vduse_dev *vduse_dev_create(void) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop
On 7/20/23 23:02, Michael S. Tsirkin wrote: On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote: On 7/20/23 1:38 AM, Jason Wang wrote: Adding cond_resched() to the command waiting loop for a better co-operation with the scheduler. This allows to give CPU a breath to run other task(workqueue) instead of busy looping when preemption is not allowed on a device whose CVQ might be slow. Signed-off-by: Jason Wang This still leaves hung processes, but at least it doesn't pin the CPU any more. Thanks. Reviewed-by: Shannon Nelson I'd like to see a full solution 1- block until interrupt Would it make sense to also have a timeout? And when timeout expires, set FAILED bit in device status? 2- still handle surprise removal correctly by waking in that case --- drivers/net/virtio_net.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 9f3b1d6ac33d..e7533f29b219 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, * into the hypervisor, so the request should be handled immediately. */ while (!virtqueue_get_buf(vi->cvq, &tmp) && - !virtqueue_is_broken(vi->cvq)) + !virtqueue_is_broken(vi->cvq)) { + cond_resched(); cpu_relax(); + } return vi->ctrl->status == VIRTIO_NET_OK; } -- 2.39.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop
On 7/21/23 16:45, Michael S. Tsirkin wrote: On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote: On 7/20/23 23:02, Michael S. Tsirkin wrote: On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote: On 7/20/23 1:38 AM, Jason Wang wrote: Adding cond_resched() to the command waiting loop for a better co-operation with the scheduler. This allows to give CPU a breath to run other task(workqueue) instead of busy looping when preemption is not allowed on a device whose CVQ might be slow. Signed-off-by: Jason Wang This still leaves hung processes, but at least it doesn't pin the CPU any more. Thanks. Reviewed-by: Shannon Nelson I'd like to see a full solution 1- block until interrupt Would it make sense to also have a timeout? And when timeout expires, set FAILED bit in device status? virtio spec does not set any limits on the timing of vq processing. Indeed, but I thought the driver could decide it is too long for it. The issue is we keep waiting with rtnl locked, it can quickly make the system unusable. 2- still handle surprise removal correctly by waking in that case --- drivers/net/virtio_net.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 9f3b1d6ac33d..e7533f29b219 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, * into the hypervisor, so the request should be handled immediately. */ while (!virtqueue_get_buf(vi->cvq, &tmp) && - !virtqueue_is_broken(vi->cvq)) + !virtqueue_is_broken(vi->cvq)) { + cond_resched(); cpu_relax(); + } return vi->ctrl->status == VIRTIO_NET_OK; } -- 2.39.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop
On 7/21/23 17:10, Michael S. Tsirkin wrote: On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote: On 7/21/23 16:45, Michael S. Tsirkin wrote: On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote: On 7/20/23 23:02, Michael S. Tsirkin wrote: On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote: On 7/20/23 1:38 AM, Jason Wang wrote: Adding cond_resched() to the command waiting loop for a better co-operation with the scheduler. This allows to give CPU a breath to run other task(workqueue) instead of busy looping when preemption is not allowed on a device whose CVQ might be slow. Signed-off-by: Jason Wang This still leaves hung processes, but at least it doesn't pin the CPU any more. Thanks. Reviewed-by: Shannon Nelson I'd like to see a full solution 1- block until interrupt Would it make sense to also have a timeout? And when timeout expires, set FAILED bit in device status? virtio spec does not set any limits on the timing of vq processing. Indeed, but I thought the driver could decide it is too long for it. The issue is we keep waiting with rtnl locked, it can quickly make the system unusable. if this is a problem we should find a way not to keep rtnl locked indefinitely. From the tests I have done, I think it is. With OVS, a reconfiguration is performed when the VDUSE device is added, and when a MLX5 device is in the same bridge, it ends up doing an ioctl() that tries to take the rtnl lock. In this configuration, it is not possible to kill OVS because it is stuck trying to acquire rtnl lock for mlx5 that is held by virtio- net. 2- still handle surprise removal correctly by waking in that case --- drivers/net/virtio_net.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 9f3b1d6ac33d..e7533f29b219 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, * into the hypervisor, so the request should be handled immediately. */ while (!virtqueue_get_buf(vi->cvq, &tmp) && - !virtqueue_is_broken(vi->cvq)) + !virtqueue_is_broken(vi->cvq)) { + cond_resched(); cpu_relax(); + } return vi->ctrl->status == VIRTIO_NET_OK; } -- 2.39.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 0/3] vduse: add support for networking devices
On 8/11/23 00:00, Jakub Kicinski wrote: On Thu, 10 Aug 2023 17:42:11 -0400 Michael S. Tsirkin wrote: Directly into the stack? I thought VDUSE is vDPA in user space, meaning to get to the kernel the packet has to first go thru a virtio-net instance. yes. is that a sufficient filter in your opinion? Yes, the ability to create the device feels stronger than CAP_NET_RAW, and a bit tangential to CAP_NET_ADMIN. But I don't have much practical experience with virt so no strong opinion, perhaps it does make sense for someone's deployment? Dunno.. I'm not sure CAP_NET_ADMIN should be required for creating the VDUSE devices, as the device could be attached to vhost-vDPA and so not visible to the Kernel networking stack. However, CAP_NET_ADMIN should be required to attach the VDUSE device to virtio-vdpa/virtio-net. Does that make sense? Maxime ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 0/3] vduse: add support for networking devices
On 8/29/23 19:05, Michael S. Tsirkin wrote: On Tue, Aug 29, 2023 at 03:34:06PM +0200, Maxime Coquelin wrote: On 8/11/23 00:00, Jakub Kicinski wrote: On Thu, 10 Aug 2023 17:42:11 -0400 Michael S. Tsirkin wrote: Directly into the stack? I thought VDUSE is vDPA in user space, meaning to get to the kernel the packet has to first go thru a virtio-net instance. yes. is that a sufficient filter in your opinion? Yes, the ability to create the device feels stronger than CAP_NET_RAW, and a bit tangential to CAP_NET_ADMIN. But I don't have much practical experience with virt so no strong opinion, perhaps it does make sense for someone's deployment? Dunno.. I'm not sure CAP_NET_ADMIN should be required for creating the VDUSE devices, as the device could be attached to vhost-vDPA and so not visible to the Kernel networking stack. However, CAP_NET_ADMIN should be required to attach the VDUSE device to virtio-vdpa/virtio-net. Does that make sense? Maxime OK. How are we going to enforce it? Actually, it seems already enforced for all VDPA devices types. Indeed, the VDPA_CMD_DEV_NEW Netlink command used to add the device to the VDPA bus has the GENL_ADMIN_PERM flag set, and so require CAT_NET_ADMIN. Also, we need a way for selinux to enable/disable some of these things but not others. Ok, I can do it in a patch on top. Do you have a pointer where it is done for Virtio Block devices? Maxime ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC v2 3/4] vduse: update the vq_info in ioctl
On 9/25/23 06:15, Cindy Lu wrote: On Tue, Sep 12, 2023 at 3:39 PM Jason Wang wrote: On Tue, Sep 12, 2023 at 11:00 AM Cindy Lu wrote: In VDUSE_VQ_GET_INFO, the driver will sync the last_avail_idx with reconnect info, After mapping the reconnect pages to userspace The userspace App will update the reconnect_time in struct vhost_reconnect_vring, If this is not 0 then it means this vq is reconnected and will update the last_avail_idx Signed-off-by: Cindy Lu --- drivers/vdpa/vdpa_user/vduse_dev.c | 13 + include/uapi/linux/vduse.h | 6 ++ 2 files changed, 19 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 2c69f4004a6e..680b23dbdde2 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1221,6 +1221,8 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, struct vduse_vq_info vq_info; struct vduse_virtqueue *vq; u32 index; + struct vdpa_reconnect_info *area; + struct vhost_reconnect_vring *vq_reconnect; ret = -EFAULT; if (copy_from_user(&vq_info, argp, sizeof(vq_info))) @@ -1252,6 +1254,17 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, vq_info.ready = vq->ready; + area = &vq->reconnect_info; + + vq_reconnect = (struct vhost_reconnect_vring *)area->vaddr; + /*check if the vq is reconnect, if yes then update the last_avail_idx*/ + if ((vq_reconnect->last_avail_idx != +vq_info.split.avail_index) && + (vq_reconnect->reconnect_time != 0)) { + vq_info.split.avail_index = + vq_reconnect->last_avail_idx; + } + ret = -EFAULT; if (copy_to_user(argp, &vq_info, sizeof(vq_info))) break; diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h index 11bd48c72c6c..d585425803fd 100644 --- a/include/uapi/linux/vduse.h +++ b/include/uapi/linux/vduse.h @@ -350,4 +350,10 @@ struct vduse_dev_response { }; }; +struct vhost_reconnect_vring { + __u16 reconnect_time; + __u16 last_avail_idx; + _Bool avail_wrap_counter; Please add a comment for each field. Sure will do And I never saw _Bool is used in uapi before, maybe it's better to pack it with last_avail_idx into a __u32. Thanks will fix this Btw, do we need to track inflight descriptors as well? I will check this For existing networking implemenation, this is not necessary. But it should be for block devices. Maxime Thanks cindy Thanks +}; + #endif /* _UAPI_VDUSE_H_ */ -- 2.34.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC v2 4/4] vduse: Add new ioctl VDUSE_GET_RECONNECT_INFO
On 9/25/23 04:57, Jason Wang wrote: On Thu, Sep 21, 2023 at 10:07 PM Cindy Lu wrote: On Mon, Sep 18, 2023 at 4:49 PM Jason Wang wrote: On Tue, Sep 12, 2023 at 11:01 AM Cindy Lu wrote: In VDUSE_GET_RECONNECT_INFO, the Userspace App can get the map size and The number of mapping memory pages from the kernel. The userspace App can use this information to map the pages. Signed-off-by: Cindy Lu --- drivers/vdpa/vdpa_user/vduse_dev.c | 15 +++ include/uapi/linux/vduse.h | 15 +++ 2 files changed, 30 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 680b23dbdde2..c99f99892b5c 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1368,6 +1368,21 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, ret = 0; break; } + case VDUSE_GET_RECONNECT_INFO: { + struct vduse_reconnect_mmap_info info; + + ret = -EFAULT; + if (copy_from_user(&info, argp, sizeof(info))) + break; + + info.size = PAGE_SIZE; + info.max_index = dev->vq_num + 1; + + if (copy_to_user(argp, &info, sizeof(info))) + break; + ret = 0; + break; + } default: ret = -ENOIOCTLCMD; break; diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h index d585425803fd..ce55e34f63d7 100644 --- a/include/uapi/linux/vduse.h +++ b/include/uapi/linux/vduse.h @@ -356,4 +356,19 @@ struct vhost_reconnect_vring { _Bool avail_wrap_counter; }; +/** + * struct vduse_reconnect_mmap_info + * @size: mapping memory size, always page_size here + * @max_index: the number of pages allocated in kernel,just + * use for check + */ + +struct vduse_reconnect_mmap_info { + __u32 size; + __u32 max_index; +}; One thing I didn't understand is that, aren't the things we used to store connection info belong to uAPI? If not, how can we make sure the connections work across different vendors/implementations. If yes, where? Thanks The process for this reconnecttion is A.The first-time connection 1> The userland app checks if the device exists 2> use the ioctl to create the vduse device 3> Mapping the kernel page to userland and save the App-version/features/other information to this page 4> if the Userland app needs to exit, then the Userland app will only unmap the page and then exit B, the re-connection 1> the userland app finds the device is existing 2> Mapping the kernel page to userland 3> check if the information in shared memory is satisfied to reconnect,if ok then continue to reconnect 4> continue working For now these information are all from userland,So here the page will be maintained by the userland App in the previous code we only saved the api-version by uAPI . if we need to support reconnection maybe we need to add 2 new uAPI for this, one of the uAPI is to save the reconnect information and another is to get the information maybe something like struct vhost_reconnect_data { uint32_t version; uint64_t features; uint8_t status; struct virtio_net_config config; uint32_t nr_vrings; }; Probably, then we can make sure the re-connection works across different vduse-daemon implementations. +1, we need to have this defined in the uAPI to support interoperability across different VDUSE userspace implementations. #define VDUSE_GET_RECONNECT_INFO _IOR (VDUSE_BASE, 0x1c, struct vhost_reconnect_data) #define VDUSE_SET_RECONNECT_INFO _IOWR(VDUSE_BASE, 0x1d, struct vhost_reconnect_data) Not sure I get this, but the idea is to map those pages to user space, any reason we need this uAPI? It should not be necessary if the mmapped layout is properly defined. Thanks, Maxime Thanks Thanks Cindy + +#define VDUSE_GET_RECONNECT_INFO \ + _IOWR(VDUSE_BASE, 0x1b, struct vduse_reconnect_mmap_info) + #endif /* _UAPI_VDUSE_H_ */ -- 2.34.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC v2 3/4] vduse: update the vq_info in ioctl
On 9/12/23 09:39, Jason Wang wrote: On Tue, Sep 12, 2023 at 11:00 AM Cindy Lu wrote: In VDUSE_VQ_GET_INFO, the driver will sync the last_avail_idx with reconnect info, After mapping the reconnect pages to userspace The userspace App will update the reconnect_time in struct vhost_reconnect_vring, If this is not 0 then it means this vq is reconnected and will update the last_avail_idx Signed-off-by: Cindy Lu --- drivers/vdpa/vdpa_user/vduse_dev.c | 13 + include/uapi/linux/vduse.h | 6 ++ 2 files changed, 19 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 2c69f4004a6e..680b23dbdde2 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1221,6 +1221,8 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, struct vduse_vq_info vq_info; struct vduse_virtqueue *vq; u32 index; + struct vdpa_reconnect_info *area; + struct vhost_reconnect_vring *vq_reconnect; ret = -EFAULT; if (copy_from_user(&vq_info, argp, sizeof(vq_info))) @@ -1252,6 +1254,17 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, vq_info.ready = vq->ready; + area = &vq->reconnect_info; + + vq_reconnect = (struct vhost_reconnect_vring *)area->vaddr; + /*check if the vq is reconnect, if yes then update the last_avail_idx*/ + if ((vq_reconnect->last_avail_idx != +vq_info.split.avail_index) && + (vq_reconnect->reconnect_time != 0)) { + vq_info.split.avail_index = + vq_reconnect->last_avail_idx; + } + ret = -EFAULT; if (copy_to_user(argp, &vq_info, sizeof(vq_info))) break; diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h index 11bd48c72c6c..d585425803fd 100644 --- a/include/uapi/linux/vduse.h +++ b/include/uapi/linux/vduse.h @@ -350,4 +350,10 @@ struct vduse_dev_response { }; }; +struct vhost_reconnect_vring { + __u16 reconnect_time; + __u16 last_avail_idx; + _Bool avail_wrap_counter; Please add a comment for each field. And I never saw _Bool is used in uapi before, maybe it's better to pack it with last_avail_idx into a __u32. Better as two distincts __u16 IMHO. Thanks, Maxime Btw, do we need to track inflight descriptors as well? Thanks +}; + #endif /* _UAPI_VDUSE_H_ */ -- 2.34.3 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v4 3/4] vduse: Temporarily disable control queue features
Virtio-net driver control queue implementation is not safe when used with VDUSE. If the VDUSE application does not reply to control queue messages, it currently ends up hanging the kernel thread sending this command. Some work is on-going to make the control queue implementation robust with VDUSE. Until it is completed, let's disable control virtqueue and features that depend on it. Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 37 ++ 1 file changed, 37 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 73ad3b7efd8e..0243dee9cf0e 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include "iova_domain.h" @@ -46,6 +47,30 @@ #define IRQ_UNBOUND -1 +#define VDUSE_NET_VALID_FEATURES_MASK \ + (BIT_ULL(VIRTIO_NET_F_CSUM) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) | \ +BIT_ULL(VIRTIO_NET_F_MTU) |\ +BIT_ULL(VIRTIO_NET_F_MAC) |\ +BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_ECN) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_UFO) | \ +BIT_ULL(VIRTIO_NET_F_HOST_TSO4) | \ +BIT_ULL(VIRTIO_NET_F_HOST_TSO6) | \ +BIT_ULL(VIRTIO_NET_F_HOST_ECN) | \ +BIT_ULL(VIRTIO_NET_F_HOST_UFO) | \ +BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) | \ +BIT_ULL(VIRTIO_NET_F_STATUS) | \ +BIT_ULL(VIRTIO_NET_F_HOST_USO) | \ +BIT_ULL(VIRTIO_F_ANY_LAYOUT) | \ +BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) | \ +BIT_ULL(VIRTIO_RING_F_EVENT_IDX) | \ +BIT_ULL(VIRTIO_F_VERSION_1) | \ +BIT_ULL(VIRTIO_F_ACCESS_PLATFORM) | \ +BIT_ULL(VIRTIO_F_RING_PACKED) |\ +BIT_ULL(VIRTIO_F_IN_ORDER)) + struct vduse_virtqueue { u16 index; u16 num_max; @@ -1778,6 +1803,16 @@ static struct attribute *vduse_dev_attrs[] = { ATTRIBUTE_GROUPS(vduse_dev); +static void vduse_dev_features_filter(struct vduse_dev_config *config) +{ + /* +* Temporarily filter out virtio-net's control virtqueue and features +* that depend on it while CVQ is being made more robust for VDUSE. +*/ + if (config->device_id == VIRTIO_ID_NET) + config->features &= VDUSE_NET_VALID_FEATURES_MASK; +} + static int vduse_create_dev(struct vduse_dev_config *config, void *config_buf, u64 api_version) { @@ -1793,6 +1828,8 @@ static int vduse_create_dev(struct vduse_dev_config *config, if (!dev) goto err; + vduse_dev_features_filter(config); + dev->api_version = api_version; dev->device_features = config->features; dev->device_id = config->device_id; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v4 2/4] vduse: enable Virtio-net device type
This patch adds Virtio-net device type to the supported devices types. Initialization fails if the device does not support VIRTIO_F_VERSION_1 feature, in order to guarantee the configuration space is read-only. Acked-by: Jason Wang Reviewed-by: Xie Yongji Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 5b3879976b3d..73ad3b7efd8e 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -142,6 +142,7 @@ static struct workqueue_struct *vduse_irq_bound_wq; static u32 allowed_device_id[] = { VIRTIO_ID_BLOCK, + VIRTIO_ID_NET, }; static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa) @@ -1672,6 +1673,10 @@ static bool features_is_valid(struct vduse_dev_config *config) (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))) return false; + if ((config->device_id == VIRTIO_ID_NET) && + !(config->features & (1ULL << VIRTIO_F_VERSION_1))) + return false; + return true; } @@ -2027,6 +2032,7 @@ static const struct vdpa_mgmtdev_ops vdpa_dev_mgmtdev_ops = { static struct virtio_device_id id_table[] = { { VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID }, + { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID }, { 0 }, }; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v4 0/4] vduse: add support for networking devices
This small series enables virtio-net device type in VDUSE. With it, basic operation have been tested, both with virtio-vdpa and vhost-vdpa using DPDK Vhost library series adding VDUSE support using split rings layout (merged in DPDK v23.07-rc1). Control queue support (and so multiqueue) has also been tested, but requires a Kernel series from Jason Wang relaxing control queue polling [1] to function reliably, so while Jason rework is done, a patch is added to disable CVQ and features that depend on it (tested also with DPDK v23.07-rc1). In this v4, LSM hooks are added to allow/deny application to create/destroy/open devices based on their type (Net, Block). [1]: https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/ v3->v4 changes: === - Add LSM hooks (Michael) - Rebase v2 -> v3 changes: = - Use allow list instead of deny list (Michael) v1 -> v2 changes: = - Add a patch to disable CVQ (Michael) RFC -> v1 changes: == - Fail device init if it does not support VERSION_1 (Jason) Maxime Coquelin (4): vduse: validate block features only with block devices vduse: enable Virtio-net device type vduse: Temporarily disable control queue features vduse: Add LSM hooks to check Virtio device type drivers/vdpa/vdpa_user/vduse_dev.c | 64 +++-- include/linux/lsm_hook_defs.h | 4 ++ include/linux/security.h| 15 +++ security/security.c | 42 +++ security/selinux/hooks.c| 55 + security/selinux/include/classmap.h | 2 + 6 files changed, 178 insertions(+), 4 deletions(-) -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v4 4/4] vduse: Add LSM hooks to check Virtio device type
This patch introduces LSM hooks for devices creation, destruction and opening operations, checking the application is allowed to perform these operations for the Virtio device type. Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 12 +++ include/linux/lsm_hook_defs.h | 4 +++ include/linux/security.h| 15 security/security.c | 42 ++ security/selinux/hooks.c| 55 + security/selinux/include/classmap.h | 2 ++ 6 files changed, 130 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 0243dee9cf0e..ca64eac11ddb 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -8,6 +8,7 @@ * */ +#include "linux/security.h" #include #include #include @@ -1443,6 +1444,10 @@ static int vduse_dev_open(struct inode *inode, struct file *file) if (dev->connected) goto unlock; + ret = -EPERM; + if (security_vduse_dev_open(dev->device_id)) + goto unlock; + ret = 0; dev->connected = true; file->private_data = dev; @@ -1655,6 +1660,9 @@ static int vduse_destroy_dev(char *name) if (!dev) return -EINVAL; + if (security_vduse_dev_destroy(dev->device_id)) + return -EPERM; + mutex_lock(&dev->lock); if (dev->vdev || dev->connected) { mutex_unlock(&dev->lock); @@ -1819,6 +1827,10 @@ static int vduse_create_dev(struct vduse_dev_config *config, int ret; struct vduse_dev *dev; + ret = -EPERM; + if (security_vduse_dev_create(config->device_id)) + goto err; + ret = -EEXIST; if (vduse_find_dev(config->name)) goto err; diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index ac962c4cb44b..0b3999ab3264 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -419,3 +419,7 @@ LSM_HOOK(int, 0, uring_override_creds, const struct cred *new) LSM_HOOK(int, 0, uring_sqpoll, void) LSM_HOOK(int, 0, uring_cmd, struct io_uring_cmd *ioucmd) #endif /* CONFIG_IO_URING */ + +LSM_HOOK(int, 0, vduse_dev_create, u32 device_id) +LSM_HOOK(int, 0, vduse_dev_destroy, u32 device_id) +LSM_HOOK(int, 0, vduse_dev_open, u32 device_id) diff --git a/include/linux/security.h b/include/linux/security.h index 5f16eecde00b..a650c500f841 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -484,6 +484,9 @@ int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen); int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen); int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen); int security_locked_down(enum lockdown_reason what); +int security_vduse_dev_create(u32 device_id); +int security_vduse_dev_destroy(u32 device_id); +int security_vduse_dev_open(u32 device_id); #else /* CONFIG_SECURITY */ static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data) @@ -1395,6 +1398,18 @@ static inline int security_locked_down(enum lockdown_reason what) { return 0; } +static inline int security_vduse_dev_create(u32 device_id) +{ + return 0; +} +static inline int security_vduse_dev_destroy(u32 device_id) +{ + return 0; +} +static inline int security_vduse_dev_open(u32 device_id) +{ + return 0; +} #endif /* CONFIG_SECURITY */ #if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE) diff --git a/security/security.c b/security/security.c index 23b129d482a7..8d7d4d2eca0b 100644 --- a/security/security.c +++ b/security/security.c @@ -5337,3 +5337,45 @@ int security_uring_cmd(struct io_uring_cmd *ioucmd) return call_int_hook(uring_cmd, 0, ioucmd); } #endif /* CONFIG_IO_URING */ + +/** + * security_vduse_dev_create() - Check if a VDUSE device type creation is allowed + * @device_id: the Virtio device ID + * + * Check whether the Virtio device creation is allowed + * + * Return: Returns 0 if permission is granted. + */ +int security_vduse_dev_create(u32 device_id) +{ + return call_int_hook(vduse_dev_create, 0, device_id); +} +EXPORT_SYMBOL(security_vduse_dev_create); + +/** + * security_vduse_dev_destroy() - Check if a VDUSE device type destruction is allowed + * @device_id: the Virtio device ID + * + * Check whether the Virtio device destruction is allowed + * + * Return: Returns 0 if permission is granted. + */ +int security_vduse_dev_destroy(u32 device_id) +{ + return call_int_hook(vduse_dev_destroy, 0, device_id); +} +EXPORT_SYMBOL(security_vduse_dev_destroy); + +/** + * security_vduse_dev_open() - Check if a VDUSE device type opening is allowed + * @device_id: the Virtio device ID + * + * Check whether the Virtio device opening is allowed + * + * Return: Returns 0
[PATCH v4 1/4] vduse: validate block features only with block devices
This patch is preliminary work to enable network device type support to VDUSE. As VIRTIO_BLK_F_CONFIG_WCE shares the same value as VIRTIO_NET_F_HOST_TSO4, we need to restrict its check to Virtio-blk device type. Acked-by: Jason Wang Reviewed-by: Xie Yongji Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index df7869537ef1..5b3879976b3d 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1662,13 +1662,14 @@ static bool device_is_allowed(u32 device_id) return false; } -static bool features_is_valid(u64 features) +static bool features_is_valid(struct vduse_dev_config *config) { - if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) + if (!(config->features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) return false; /* Now we only support read-only configuration space */ - if (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)) + if ((config->device_id == VIRTIO_ID_BLOCK) && + (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))) return false; return true; @@ -1695,7 +1696,7 @@ static bool vduse_validate_config(struct vduse_dev_config *config) if (!device_is_allowed(config->device_id)) return false; - if (!features_is_valid(config->features)) + if (!features_is_valid(config)) return false; return true; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 4/4] vduse: Add LSM hooks to check Virtio device type
On 10/21/23 00:20, Casey Schaufler wrote: On 10/20/2023 8:58 AM, Maxime Coquelin wrote: This patch introduces LSM hooks for devices creation, destruction and opening operations, checking the application is allowed to perform these operations for the Virtio device type. Why do you think that there needs to be a special LSM check for virtio devices? What can't existing device attributes be used? Michael asked for a way for SELinux to allow/prevent the creation of some types of devices [0]. A device is created using ioctl() on VDUSE control chardev. Its type is specified via a field in the structure passed in argument. I didn't see other way than adding dedicated LSM hooks to achieve this, but it is possible that their is a better way to do it? Thanks, Maxime [0]: https://lore.kernel.org/all/20230829130430-mutt-send-email-...@kernel.org/ ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 1/4] vduse: validate block features only with block devices
On 10/21/23 00:07, Casey Schaufler wrote: On 10/20/2023 8:58 AM, Maxime Coquelin wrote: This patch is preliminary work to enable network device type support to VDUSE. As VIRTIO_BLK_F_CONFIG_WCE shares the same value as VIRTIO_NET_F_HOST_TSO4, we need to restrict its check to Virtio-blk device type. Acked-by: Jason Wang Reviewed-by: Xie Yongji Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index df7869537ef1..5b3879976b3d 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1662,13 +1662,14 @@ static bool device_is_allowed(u32 device_id) return false; } -static bool features_is_valid(u64 features) +static bool features_is_valid(struct vduse_dev_config *config) This should either be features_are_valid() or feature_is_valid(). Correct pluralization is important in the English language. Indeed, I will change to features_are_valid() in next revision. Thanks, Maxime { - if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) + if (!(config->features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) return false; /* Now we only support read-only configuration space */ - if (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)) + if ((config->device_id == VIRTIO_ID_BLOCK) && + (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))) return false; return true; @@ -1695,7 +1696,7 @@ static bool vduse_validate_config(struct vduse_dev_config *config) if (!device_is_allowed(config->device_id)) return false; - if (!features_is_valid(config->features)) + if (!features_is_valid(config)) return false; return true; ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 3/4] vduse: Temporarily disable control queue features
On 10/23/23 05:08, Jason Wang wrote: On Fri, Oct 20, 2023 at 11:58 PM Maxime Coquelin wrote: Virtio-net driver control queue implementation is not safe when used with VDUSE. If the VDUSE application does not reply to control queue messages, it currently ends up hanging the kernel thread sending this command. Some work is on-going to make the control queue implementation robust with VDUSE. Until it is completed, let's disable control virtqueue and features that depend on it. Signed-off-by: Maxime Coquelin I wonder if it's better to do this with patch 2 or before patch 2 to unbreak the bisection? I think it would be better to keep it in a dedicated patch to ease the revert later when your work will have been accepted, so before patch 2. Thanks, Maxime Thanks --- drivers/vdpa/vdpa_user/vduse_dev.c | 37 ++ 1 file changed, 37 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 73ad3b7efd8e..0243dee9cf0e 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include "iova_domain.h" @@ -46,6 +47,30 @@ #define IRQ_UNBOUND -1 +#define VDUSE_NET_VALID_FEATURES_MASK \ + (BIT_ULL(VIRTIO_NET_F_CSUM) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) | \ +BIT_ULL(VIRTIO_NET_F_MTU) |\ +BIT_ULL(VIRTIO_NET_F_MAC) |\ +BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_ECN) | \ +BIT_ULL(VIRTIO_NET_F_GUEST_UFO) | \ +BIT_ULL(VIRTIO_NET_F_HOST_TSO4) | \ +BIT_ULL(VIRTIO_NET_F_HOST_TSO6) | \ +BIT_ULL(VIRTIO_NET_F_HOST_ECN) | \ +BIT_ULL(VIRTIO_NET_F_HOST_UFO) | \ +BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) | \ +BIT_ULL(VIRTIO_NET_F_STATUS) | \ +BIT_ULL(VIRTIO_NET_F_HOST_USO) | \ +BIT_ULL(VIRTIO_F_ANY_LAYOUT) | \ +BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) | \ +BIT_ULL(VIRTIO_RING_F_EVENT_IDX) | \ +BIT_ULL(VIRTIO_F_VERSION_1) | \ +BIT_ULL(VIRTIO_F_ACCESS_PLATFORM) | \ +BIT_ULL(VIRTIO_F_RING_PACKED) |\ +BIT_ULL(VIRTIO_F_IN_ORDER)) + struct vduse_virtqueue { u16 index; u16 num_max; @@ -1778,6 +1803,16 @@ static struct attribute *vduse_dev_attrs[] = { ATTRIBUTE_GROUPS(vduse_dev); +static void vduse_dev_features_filter(struct vduse_dev_config *config) +{ + /* +* Temporarily filter out virtio-net's control virtqueue and features +* that depend on it while CVQ is being made more robust for VDUSE. +*/ + if (config->device_id == VIRTIO_ID_NET) + config->features &= VDUSE_NET_VALID_FEATURES_MASK; +} + static int vduse_create_dev(struct vduse_dev_config *config, void *config_buf, u64 api_version) { @@ -1793,6 +1828,8 @@ static int vduse_create_dev(struct vduse_dev_config *config, if (!dev) goto err; + vduse_dev_features_filter(config); + dev->api_version = api_version; dev->device_features = config->features; dev->device_id = config->device_id; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 4/4] vduse: Add LSM hooks to check Virtio device type
On 10/23/23 17:13, Casey Schaufler wrote: On 10/23/2023 12:28 AM, Maxime Coquelin wrote: On 10/21/23 00:20, Casey Schaufler wrote: On 10/20/2023 8:58 AM, Maxime Coquelin wrote: This patch introduces LSM hooks for devices creation, destruction and opening operations, checking the application is allowed to perform these operations for the Virtio device type. Why do you think that there needs to be a special LSM check for virtio devices? What can't existing device attributes be used? Michael asked for a way for SELinux to allow/prevent the creation of some types of devices [0]. A device is created using ioctl() on VDUSE control chardev. Its type is specified via a field in the structure passed in argument. I didn't see other way than adding dedicated LSM hooks to achieve this, but it is possible that their is a better way to do it? At the very least the hook should be made more general, and I'd have to see a proposal before commenting on that. security_dev_destroy(dev) might be a better approach. If there's reason to control destruction of vduse devices it's reasonable to assume that there are other devices with the same or similar properties. VDUSE is different from other devices as the device is actually implemented by the user-space application, so this is very specific in my opinion. Since SELinux is your target use case, can you explain why you can't create SELinux policy to enforce the restrictions you're after? I believe (but can be proven wrong, of course) that SELinux has mechanism for dealing with controls on ioctls. I am not aware of such mechanism to deal with ioctl(), if you have a pointer that would be welcome. Thanks, Maxime Thanks, Maxime [0]: https://lore.kernel.org/all/20230829130430-mutt-send-email-...@kernel.org/ ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 4/4] vduse: Add LSM hooks to check Virtio device type
On 10/24/23 17:30, Casey Schaufler wrote: On 10/24/2023 2:49 AM, Maxime Coquelin wrote: On 10/23/23 17:13, Casey Schaufler wrote: On 10/23/2023 12:28 AM, Maxime Coquelin wrote: On 10/21/23 00:20, Casey Schaufler wrote: On 10/20/2023 8:58 AM, Maxime Coquelin wrote: This patch introduces LSM hooks for devices creation, destruction and opening operations, checking the application is allowed to perform these operations for the Virtio device type. Why do you think that there needs to be a special LSM check for virtio devices? What can't existing device attributes be used? Michael asked for a way for SELinux to allow/prevent the creation of some types of devices [0]. A device is created using ioctl() on VDUSE control chardev. Its type is specified via a field in the structure passed in argument. I didn't see other way than adding dedicated LSM hooks to achieve this, but it is possible that their is a better way to do it? At the very least the hook should be made more general, and I'd have to see a proposal before commenting on that. security_dev_destroy(dev) might be a better approach. If there's reason to control destruction of vduse devices it's reasonable to assume that there are other devices with the same or similar properties. VDUSE is different from other devices as the device is actually implemented by the user-space application, so this is very specific in my opinion. This is hardly unique. If you're implementing the device in user-space you may well be able to implement the desired controls there. Since SELinux is your target use case, can you explain why you can't create SELinux policy to enforce the restrictions you're after? I believe (but can be proven wrong, of course) that SELinux has mechanism for dealing with controls on ioctls. I am not aware of such mechanism to deal with ioctl(), if you have a pointer that would be welcome. security/selinux/hooks.c We might be able to extend selinux_file_ioctl(), but that will only covers the ioctl for the control file, this patch also adds hook for the device file opening that would need dedicated hook as the device type information is stored in the device's private data. Michael, before going further, I would be interested in your feedback. Was this patch what you had in mind when requesting for a way to allow/deny devices types for a given application? Regards, Maxime Thanks, Maxime Thanks, Maxime [0]: https://lore.kernel.org/all/20230829130430-mutt-send-email-...@kernel.org/ ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v4 4/4] vduse: Add LSM hooks to check Virtio device type
On 11/2/23 19:59, Michael S. Tsirkin wrote: On Thu, Nov 02, 2023 at 06:56:59PM +0100, Maxime Coquelin wrote: On 10/24/23 17:30, Casey Schaufler wrote: On 10/24/2023 2:49 AM, Maxime Coquelin wrote: On 10/23/23 17:13, Casey Schaufler wrote: On 10/23/2023 12:28 AM, Maxime Coquelin wrote: On 10/21/23 00:20, Casey Schaufler wrote: On 10/20/2023 8:58 AM, Maxime Coquelin wrote: This patch introduces LSM hooks for devices creation, destruction and opening operations, checking the application is allowed to perform these operations for the Virtio device type. Why do you think that there needs to be a special LSM check for virtio devices? What can't existing device attributes be used? Michael asked for a way for SELinux to allow/prevent the creation of some types of devices [0]. A device is created using ioctl() on VDUSE control chardev. Its type is specified via a field in the structure passed in argument. I didn't see other way than adding dedicated LSM hooks to achieve this, but it is possible that their is a better way to do it? At the very least the hook should be made more general, and I'd have to see a proposal before commenting on that. security_dev_destroy(dev) might be a better approach. If there's reason to control destruction of vduse devices it's reasonable to assume that there are other devices with the same or similar properties. VDUSE is different from other devices as the device is actually implemented by the user-space application, so this is very specific in my opinion. This is hardly unique. If you're implementing the device in user-space you may well be able to implement the desired controls there. Since SELinux is your target use case, can you explain why you can't create SELinux policy to enforce the restrictions you're after? I believe (but can be proven wrong, of course) that SELinux has mechanism for dealing with controls on ioctls. I am not aware of such mechanism to deal with ioctl(), if you have a pointer that would be welcome. security/selinux/hooks.c We might be able to extend selinux_file_ioctl(), but that will only covers the ioctl for the control file, this patch also adds hook for the device file opening that would need dedicated hook as the device type information is stored in the device's private data. Michael, before going further, I would be interested in your feedback. Was this patch what you had in mind when requesting for a way to allow/deny devices types for a given application? Regards, Maxime Yes, this is more or less what I had in mind. Great. Do you think we need to cover both ioctl() on the control file and open() on the device file, or only ioctl() is enough? If the former, we will need VDUSE-specific hooks. I may be able to improve my patch to have a single hook instead of 3 by passing the type of operation as an extra argument (create/destroy/open). If the latter, we may be able to extend the generic ioctl hook. Personally, I think it would make sense to also ensure a given application can only open existing VDUSE devices it supports. For example, openvswitch should only be allowed to open networking VDUSE devices. Thanks, Maxime Thanks, Maxime Thanks, Maxime [0]: https://lore.kernel.org/all/20230829130430-mutt-send-email-...@kernel.org/ ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next V2 0/2] virtio-net: don't busy poll for cvq command
Hi Jason, On 4/13/23 08:40, Jason Wang wrote: Hi all: The code used to busy poll for cvq command which turns out to have several side effects: 1) infinite poll for buggy devices 2) bad interaction with scheduler So this series tries to use sleep instead of busy polling. In this version, I take a step back: the hardening part is not implemented and leave for future investigation. We use to aggree to use interruptible sleep but it doesn't work for a general workqueue. Please review. Thanks for working on this. My DPDK VDUSE RFC missed to set the interrupt, as Xuan Zhou highlighted it makes the vdpa dev add/del commands to freeze: [<0>] device_del+0x37/0x3d0 [<0>] device_unregister+0x13/0x60 [<0>] unregister_virtio_device+0x11/0x20 [<0>] device_release_driver_internal+0x193/0x200 [<0>] bus_remove_device+0xbf/0x130 [<0>] device_del+0x174/0x3d0 [<0>] device_unregister+0x13/0x60 [<0>] vdpa_nl_cmd_dev_del_set_doit+0x66/0xe0 [vdpa] [<0>] genl_family_rcv_msg_doit.isra.0+0xb8/0x100 [<0>] genl_rcv_msg+0x151/0x290 [<0>] netlink_rcv_skb+0x54/0x100 [<0>] genl_rcv+0x24/0x40 [<0>] netlink_unicast+0x217/0x340 [<0>] netlink_sendmsg+0x23e/0x4a0 [<0>] sock_sendmsg+0x8f/0xa0 [<0>] __sys_sendto+0xfc/0x170 [<0>] __x64_sys_sendto+0x20/0x30 [<0>] do_syscall_64+0x59/0x90 [<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc Once fixed on DPDK side (you can use my vduse_v1 branch [0] for testing), it works fine: Tested-by: Maxime Coquelin For the potential missing interrupt with non-compliant devices, I guess it could be handled with the hardening work as same thing could happen if the VDUSE application crashed for example. Regards, Maxime [0]: Thanks Changes since V1: - use RTNL to synchronize rx mode worker - use completion for simplicity - don't try to harden CVQ command Changes since RFC: - switch to use BAD_RING in virtio_break_device() - check virtqueue_is_broken() after being woken up - use more_used() instead of virtqueue_get_buf() to allow caller to get buffers afterwards - break the virtio-net device when timeout - get buffer manually since the virtio core check more_used() instead Jason Wang (2): virtio-net: convert rx mode setting to use workqueue virtio-net: sleep instead of busy waiting for cvq command drivers/net/virtio_net.c | 76 ++-- 1 file changed, 66 insertions(+), 10 deletions(-) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next V2 0/2] virtio-net: don't busy poll for cvq command
On 4/13/23 15:02, Maxime Coquelin wrote: Hi Jason, On 4/13/23 08:40, Jason Wang wrote: Hi all: The code used to busy poll for cvq command which turns out to have several side effects: 1) infinite poll for buggy devices 2) bad interaction with scheduler So this series tries to use sleep instead of busy polling. In this version, I take a step back: the hardening part is not implemented and leave for future investigation. We use to aggree to use interruptible sleep but it doesn't work for a general workqueue. Please review. Thanks for working on this. My DPDK VDUSE RFC missed to set the interrupt, as Xuan Zhou highlighted it makes the vdpa dev add/del commands to freeze: [<0>] device_del+0x37/0x3d0 [<0>] device_unregister+0x13/0x60 [<0>] unregister_virtio_device+0x11/0x20 [<0>] device_release_driver_internal+0x193/0x200 [<0>] bus_remove_device+0xbf/0x130 [<0>] device_del+0x174/0x3d0 [<0>] device_unregister+0x13/0x60 [<0>] vdpa_nl_cmd_dev_del_set_doit+0x66/0xe0 [vdpa] [<0>] genl_family_rcv_msg_doit.isra.0+0xb8/0x100 [<0>] genl_rcv_msg+0x151/0x290 [<0>] netlink_rcv_skb+0x54/0x100 [<0>] genl_rcv+0x24/0x40 [<0>] netlink_unicast+0x217/0x340 [<0>] netlink_sendmsg+0x23e/0x4a0 [<0>] sock_sendmsg+0x8f/0xa0 [<0>] __sys_sendto+0xfc/0x170 [<0>] __x64_sys_sendto+0x20/0x30 [<0>] do_syscall_64+0x59/0x90 [<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc Once fixed on DPDK side (you can use my vduse_v1 branch [0] for testing), it works fine: Tested-by: Maxime Coquelin For the potential missing interrupt with non-compliant devices, I guess it could be handled with the hardening work as same thing could happen if the VDUSE application crashed for example. Regards, Maxime [0]: Better with the link... [0]: https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commits/vduse_v1/ Thanks Changes since V1: - use RTNL to synchronize rx mode worker - use completion for simplicity - don't try to harden CVQ command Changes since RFC: - switch to use BAD_RING in virtio_break_device() - check virtqueue_is_broken() after being woken up - use more_used() instead of virtqueue_get_buf() to allow caller to get buffers afterwards - break the virtio-net device when timeout - get buffer manually since the virtio core check more_used() instead Jason Wang (2): virtio-net: convert rx mode setting to use workqueue virtio-net: sleep instead of busy waiting for cvq command drivers/net/virtio_net.c | 76 ++-- 1 file changed, 66 insertions(+), 10 deletions(-) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC 2/2] vduse: enable Virtio-net device type
This patch adds Virtio-net device type to the supported devices types. Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 6fa598a03d8e..26b7e29cb900 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -131,6 +131,7 @@ static struct workqueue_struct *vduse_irq_wq; static u32 allowed_device_id[] = { VIRTIO_ID_BLOCK, + VIRTIO_ID_NET, }; static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa) @@ -1738,6 +1739,7 @@ static const struct vdpa_mgmtdev_ops vdpa_dev_mgmtdev_ops = { static struct virtio_device_id id_table[] = { { VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID }, + { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID }, { 0 }, }; -- 2.39.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC 0/2] vduse: add support for networking devices
This small series enables virtio-net device type in VDUSE. With it, basic operation have been tested, both with virtio-vdpa and vhost-vdpa using DPDK Vhost library series adding VDUSE support [0] using split rings layout. Control queue support (and so multiqueue) has also been tested, but require a Kernel series from Jason Wang relaxing control queue polling [1] to function reliably. Other than that, we have identified a few gaps: 1. Reconnection: a. VDUSE_VQ_GET_INFO ioctl() returns always 0 for avail index, even after the virtqueue has already been processed. Is that expected? I have tried instead to get the driver's avail index directly from the avail ring, but it does not seem reliable as I sometimes get "id %u is not a head!\n" warnings. Also such solution would not be possible with packed ring, as we need to know the wrap counters values. b. Missing IOCTLs: it would be handy to have new IOCTLs to query Virtio device status, and retrieve the config space set at VDUSE_CREATE_DEV time. 2. VDUSE application as non-root: We need to run the VDUSE application as non-root. There is some race between the time the UDEV rule is applied and the time the device starts being used. Discussing with Jason, he suggested we may have a VDUSE daemon run as root that would create the VDUSE device, manages its rights and then pass its file descriptor to the VDUSE app. However, with current IOCTLs, it means the VDUSE daemon would need to know several information that belongs to the VDUSE app implementing the device such as supported Virtio features, config space, etc... If we go that route, maybe we should have a control IOCTL to create the device which would just pass the device type. Then another device IOCTL to perform the initialization. Would that make sense? 3. Coredump: In order to be able to perform post-mortem analysis, DPDK Vhost library marks pages used for vrings and descriptors buffers as MADV_DODUMP using madvise(). However with VDUSE it fails with -EINVAL. My understanding is that we set VM_DONTEXPAND flag to the VMAs and madvise's MADV_DODUMP fails if it is present. I'm not sure to understand why madvise would prevent MADV_DODUMP if VM_DONTEXPAND is set. Any thoughts? [0]: https://patchwork.dpdk.org/project/dpdk/list/?series=27594&state=%2A&archive=both [1]: https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/ Maxime Coquelin (2): vduse: validate block features only with block devices vduse: enable Virtio-net device type drivers/vdpa/vdpa_user/vduse_dev.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) -- 2.39.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC 1/2] vduse: validate block features only with block devices
This patch is preliminary work to enable network device type support to VDUSE. As VIRTIO_BLK_F_CONFIG_WCE shares the same value as VIRTIO_NET_F_HOST_TSO4, we need to restrict its check to Virtio-blk device type. Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 0c3b48616a9f..6fa598a03d8e 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1416,13 +1416,14 @@ static bool device_is_allowed(u32 device_id) return false; } -static bool features_is_valid(u64 features) +static bool features_is_valid(struct vduse_dev_config *config) { - if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) + if (!(config->features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) return false; /* Now we only support read-only configuration space */ - if (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)) + if ((config->device_id == VIRTIO_ID_BLOCK) && + (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))) return false; return true; @@ -1446,7 +1447,7 @@ static bool vduse_validate_config(struct vduse_dev_config *config) if (!device_is_allowed(config->device_id)) return false; - if (!features_is_valid(config->features)) + if (!features_is_valid(config)) return false; return true; -- 2.39.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC 1/2] vduse: validate block features only with block devices
On 4/20/23 06:06, Jason Wang wrote: On Wed, Apr 19, 2023 at 9:43 PM Maxime Coquelin wrote: This patch is preliminary work to enable network device type support to VDUSE. As VIRTIO_BLK_F_CONFIG_WCE shares the same value as VIRTIO_NET_F_HOST_TSO4, we need to restrict its check to Virtio-blk device type. Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 0c3b48616a9f..6fa598a03d8e 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -1416,13 +1416,14 @@ static bool device_is_allowed(u32 device_id) return false; } -static bool features_is_valid(u64 features) +static bool features_is_valid(struct vduse_dev_config *config) { - if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) + if (!(config->features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) return false; /* Now we only support read-only configuration space */ - if (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)) + if ((config->device_id == VIRTIO_ID_BLOCK) && + (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))) The reason we filter WCE out is to avoid writable config space which might block the driver with a buggy userspace. For networking, I guess we should fail if VERSION_1 is not negotiated, then we can avoid setting mac addresses via the config space. Ok, I will add it to patch 2 in V1. Thanks, Maxime Thanks return false; return true; @@ -1446,7 +1447,7 @@ static bool vduse_validate_config(struct vduse_dev_config *config) if (!device_is_allowed(config->device_id)) return false; - if (!features_is_valid(config->features)) + if (!features_is_valid(config)) return false; return true; -- 2.39.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC 0/2] vduse: add support for networking devices
On 4/20/23 06:34, Jason Wang wrote: On Wed, Apr 19, 2023 at 9:43 PM Maxime Coquelin wrote: This small series enables virtio-net device type in VDUSE. With it, basic operation have been tested, both with virtio-vdpa and vhost-vdpa using DPDK Vhost library series adding VDUSE support [0] using split rings layout. Control queue support (and so multiqueue) has also been tested, but require a Kernel series from Jason Wang relaxing control queue polling [1] to function reliably. Other than that, we have identified a few gaps: 1. Reconnection: a. VDUSE_VQ_GET_INFO ioctl() returns always 0 for avail index, even after the virtqueue has already been processed. Is that expected? I have tried instead to get the driver's avail index directly from the avail ring, but it does not seem reliable as I sometimes get "id %u is not a head!\n" warnings. Also such solution would not be possible with packed ring, as we need to know the wrap counters values. Looking at the codes, it only returns the value that is set via set_vq_state(). I think it is expected to be called before the datapath runs. So when bound to virtio-vdpa, it is expected to return 0. But we need to fix the packed virtqueue case, I wonder if we need to call set_vq_state() explicitly in virtio-vdpa before starting the device. When bound to vhost-vdpa, Qemu will call VHOST_SET_VRING_BASE which will end up a call to set_vq_state(). Unfortunately, it doesn't support packed ring which needs some extension. b. Missing IOCTLs: it would be handy to have new IOCTLs to query Virtio device status, What's the use case of this ioctl? It looks to me userspace is notified on each status change now: static int vduse_dev_set_status(struct vduse_dev *dev, u8 status) { struct vduse_dev_msg msg = { 0 }; msg.req.type = VDUSE_SET_STATUS; msg.req.s.status = status; return vduse_dev_msg_sync(dev, &msg); } The idea was to be able to query the status at reconnect time, and neither having to assume its value nor having to store its value in a file (the status could change while the VDUSE application is stopped, but maybe it would receive the notification at reconnect). I will prototype using a tmpfs file to save needed information, and see if it works. and retrieve the config space set at VDUSE_CREATE_DEV time. In order to be safe, VDUSE avoids writable config space. Otherwise drivers could block on config writing forever. That's why we don't do it now. The idea was not to make the config space writable, but just to be able to fetch what was filled at VDUSE_CREATE_DEV time. With the tmpfs file, we can avoid doing that and just save the config space there. We need to harden the config write before we can proceed to this I think. 2. VDUSE application as non-root: We need to run the VDUSE application as non-root. There is some race between the time the UDEV rule is applied and the time the device starts being used. Discussing with Jason, he suggested we may have a VDUSE daemon run as root that would create the VDUSE device, manages its rights and then pass its file descriptor to the VDUSE app. However, with current IOCTLs, it means the VDUSE daemon would need to know several information that belongs to the VDUSE app implementing the device such as supported Virtio features, config space, etc... If we go that route, maybe we should have a control IOCTL to create the device which would just pass the device type. Then another device IOCTL to perform the initialization. Would that make sense? I think so. We can hear from others. 3. Coredump: In order to be able to perform post-mortem analysis, DPDK Vhost library marks pages used for vrings and descriptors buffers as MADV_DODUMP using madvise(). However with VDUSE it fails with -EINVAL. My understanding is that we set VM_DONTEXPAND flag to the VMAs and madvise's MADV_DODUMP fails if it is present. I'm not sure to understand why madvise would prevent MADV_DODUMP if VM_DONTEXPAND is set. Any thoughts? Adding Peter who may know the answer. Thanks! Maxime Thanks [0]: https://patchwork.dpdk.org/project/dpdk/list/?series=27594&state=%2A&archive=both [1]: https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/ Maxime Coquelin (2): vduse: validate block features only with block devices vduse: enable Virtio-net device type drivers/vdpa/vdpa_user/vduse_dev.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) -- 2.39.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC 0/2] vduse: add support for networking devices
On 4/20/23 10:13, Yongji Xie wrote: On Wed, Apr 19, 2023 at 9:44 PM Maxime Coquelin wrote: This small series enables virtio-net device type in VDUSE. With it, basic operation have been tested, both with virtio-vdpa and vhost-vdpa using DPDK Vhost library series adding VDUSE support [0] using split rings layout. Control queue support (and so multiqueue) has also been tested, but require a Kernel series from Jason Wang relaxing control queue polling [1] to function reliably. Other than that, we have identified a few gaps: 1. Reconnection: a. VDUSE_VQ_GET_INFO ioctl() returns always 0 for avail index, even after the virtqueue has already been processed. Is that expected? I have tried instead to get the driver's avail index directly from the avail ring, but it does not seem reliable as I sometimes get "id %u is not a head!\n" warnings. Also such solution would not be possible with packed ring, as we need to know the wrap counters values. I'm not sure how to handle the reconnection in the vhost-user-net case. Can we use a tmpfs file to track inflight I/O like this [1] We don't have inflight IOs with DPDK Vhsot library for net devices. But yes, a solution is to have a tmpfs file to save needed data. Advantage of this solution is it makes it possible to reconnect with packed ring in case of application crash, as the wrap counter values would not be lost. [1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#inflight-i-o-tracking b. Missing IOCTLs: it would be handy to have new IOCTLs to query Virtio device status, and retrieve the config space set at VDUSE_CREATE_DEV time. VDUSE_GET_STATUS ioctl might be needed. Or can we use a tmpfs file to save/restore that info. 2. VDUSE application as non-root: We need to run the VDUSE application as non-root. There is some race between the time the UDEV rule is applied and the time the device starts being used. Discussing with Jason, he suggested we may have a VDUSE daemon run as root that would create the VDUSE device, manages its rights and then pass its file descriptor to the VDUSE app. However, with current IOCTLs, it means the VDUSE daemon would need to know several information that belongs to the VDUSE app implementing the device such as supported Virtio features, config space, etc... If we go that route, maybe we should have a control IOCTL to create the device which would just pass the device type. Then another device IOCTL to perform the initialization. Would that make sense? I think we can reuse the VDUSE_CREATE_DEV ioctl (just use name, device_id and vendor_id) for control device here, and add a new ioctl VDUSE_DEV_SETUP to do device initialization. OK. If we do that, could we also provide the possibility to pass an UUID at VDUSE_DEV_SETUP time? It could be useful if we save information in a tmpfs file, in order to be able at reconnect time to ensure the tmpfs file UUID matches with the VDUSE device UUID, and so avoid restoring a leftover tmpfs file of a previously detroyed then re-created VDUSE device. Would that make sense? Regards, Maxime Thanks, Yongji ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC 0/2] vduse: add support for networking devices
On 4/21/23 07:51, Jason Wang wrote: On Thu, Apr 20, 2023 at 10:16 PM Maxime Coquelin wrote: On 4/20/23 06:34, Jason Wang wrote: On Wed, Apr 19, 2023 at 9:43 PM Maxime Coquelin wrote: This small series enables virtio-net device type in VDUSE. With it, basic operation have been tested, both with virtio-vdpa and vhost-vdpa using DPDK Vhost library series adding VDUSE support [0] using split rings layout. Control queue support (and so multiqueue) has also been tested, but require a Kernel series from Jason Wang relaxing control queue polling [1] to function reliably. Other than that, we have identified a few gaps: 1. Reconnection: a. VDUSE_VQ_GET_INFO ioctl() returns always 0 for avail index, even after the virtqueue has already been processed. Is that expected? I have tried instead to get the driver's avail index directly from the avail ring, but it does not seem reliable as I sometimes get "id %u is not a head!\n" warnings. Also such solution would not be possible with packed ring, as we need to know the wrap counters values. Looking at the codes, it only returns the value that is set via set_vq_state(). I think it is expected to be called before the datapath runs. So when bound to virtio-vdpa, it is expected to return 0. But we need to fix the packed virtqueue case, I wonder if we need to call set_vq_state() explicitly in virtio-vdpa before starting the device. When bound to vhost-vdpa, Qemu will call VHOST_SET_VRING_BASE which will end up a call to set_vq_state(). Unfortunately, it doesn't support packed ring which needs some extension. b. Missing IOCTLs: it would be handy to have new IOCTLs to query Virtio device status, What's the use case of this ioctl? It looks to me userspace is notified on each status change now: static int vduse_dev_set_status(struct vduse_dev *dev, u8 status) { struct vduse_dev_msg msg = { 0 }; msg.req.type = VDUSE_SET_STATUS; msg.req.s.status = status; return vduse_dev_msg_sync(dev, &msg); } The idea was to be able to query the status at reconnect time, and neither having to assume its value nor having to store its value in a file (the status could change while the VDUSE application is stopped, but maybe it would receive the notification at reconnect). I see. I will prototype using a tmpfs file to save needed information, and see if it works. It might work but then the API is not self contained. Maybe it's better to have a dedicated ioctl. and retrieve the config space set at VDUSE_CREATE_DEV time. In order to be safe, VDUSE avoids writable config space. Otherwise drivers could block on config writing forever. That's why we don't do it now. The idea was not to make the config space writable, but just to be able to fetch what was filled at VDUSE_CREATE_DEV time. With the tmpfs file, we can avoid doing that and just save the config space there. Same as the case for status. I have cooked a DPDK patch to support reconnect with a tmpfs file as suggested by Yongji: https://gitlab.com/mcoquelin/dpdk-next-virtio/-/commit/53913f2b1155b02c44d5d3d298aafd357e7a8c48 That's still rough around the edges, but it seems to work reliably for the testing I have done so far. We'll certainly want to use the tmpfs memory to directly store available indexes and wrap counters to avoid introducing overhead in the datapath. The tricky part will be to manage NUMA affinity. Regards, Maxime Thanks We need to harden the config write before we can proceed to this I think. 2. VDUSE application as non-root: We need to run the VDUSE application as non-root. There is some race between the time the UDEV rule is applied and the time the device starts being used. Discussing with Jason, he suggested we may have a VDUSE daemon run as root that would create the VDUSE device, manages its rights and then pass its file descriptor to the VDUSE app. However, with current IOCTLs, it means the VDUSE daemon would need to know several information that belongs to the VDUSE app implementing the device such as supported Virtio features, config space, etc... If we go that route, maybe we should have a control IOCTL to create the device which would just pass the device type. Then another device IOCTL to perform the initialization. Would that make sense? I think so. We can hear from others. 3. Coredump: In order to be able to perform post-mortem analysis, DPDK Vhost library marks pages used for vrings and descriptors buffers as MADV_DODUMP using madvise(). However with VDUSE it fails with -EINVAL. My understanding is that we set VM_DONTEXPAND flag to the VMAs and madvise's MADV_DODUMP fails if it is present. I'm not sure to understand why madvise would prevent MADV_DODUMP if VM_DONTEXPAND is set. Any thoughts? Addin
[PATCH] vduse: prevent uninitialized memory accesses
If the VDUSE application provides a smaller config space than the driver expects, the driver may use uninitialized memory from the stack. This patch prevents it by initializing the buffer passed by the driver to store the config value. Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 41c0b29739f1..35dceee3ed56 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -673,10 +673,15 @@ static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int offset, { struct vduse_dev *dev = vdpa_to_vduse(vdpa); - if (offset > dev->config_size || - len > dev->config_size - offset) + /* Initialize the buffer in case of partial copy. */ + memset(buf, 0, len); + + if (offset > dev->config_size) return; + if (len > dev->config_size - offset) + len = dev->config_size - offset; + memcpy(buf, dev->config + offset, len); } -- 2.37.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] vduse: prevent uninitialized memory accesses
On 8/27/22 08:54, Dan Carpenter wrote: On Fri, Aug 26, 2022 at 06:16:05PM +0200, Maxime Coquelin wrote: If the VDUSE application provides a smaller config space than the driver expects, the driver may use uninitialized memory from the stack. This patch prevents it by initializing the buffer passed by the driver to store the config value. Signed-off-by: Maxime Coquelin This sounds like it needs a Fixes tag? Yes, I actually did it, but somehow forgot to generate the patch bedore posting. I'll post a v2 soon. Thanks, Maxime regards, dan carpenter ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2] vduse: prevent uninitialized memory accesses
If the VDUSE application provides a smaller config space than the driver expects, the driver may use uninitialized memory from the stack. This patch prevents it by initializing the buffer passed by the driver to store the config value. This fix addresses CVE-2022-2308. Cc: xieyon...@bytedance.com Cc: sta...@vger.kernel.org # v5.15+ Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Acked-by: Jason Wang Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 41c0b29739f1..35dceee3ed56 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -673,10 +673,15 @@ static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int offset, { struct vduse_dev *dev = vdpa_to_vduse(vdpa); - if (offset > dev->config_size || - len > dev->config_size - offset) + /* Initialize the buffer in case of partial copy. */ + memset(buf, 0, len); + + if (offset > dev->config_size) return; + if (len > dev->config_size - offset) + len = dev->config_size - offset; + memcpy(buf, dev->config + offset, len); } -- 2.37.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] vduse: prevent uninitialized memory accesses
On 8/29/22 09:48, Greg KH wrote: On Mon, Aug 29, 2022 at 09:34:24AM +0200, Maxime Coquelin wrote: If the VDUSE application provides a smaller config space than the driver expects, the driver may use uninitialized memory from the stack. This patch prevents it by initializing the buffer passed by the driver to store the config value. This fix addresses CVE-2022-2308. Cc: xieyon...@bytedance.com Cc: sta...@vger.kernel.org # v5.15+ Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Acked-by: Jason Wang Signed-off-by: Maxime Coquelin Please no blank line above the Acked-by: line here if possible. Sure. Jason, do you prefer I post a new revision with this single change or you will handle it while applying? Either way is fine to me. Thanks, Maxime thanks, greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] vduse: prevent uninitialized memory accesses
On 8/31/22 17:46, Michael S. Tsirkin wrote: On Wed, Aug 31, 2022 at 05:01:11PM +0200, Maxime Coquelin wrote: On 8/29/22 09:48, Greg KH wrote: On Mon, Aug 29, 2022 at 09:34:24AM +0200, Maxime Coquelin wrote: If the VDUSE application provides a smaller config space than the driver expects, the driver may use uninitialized memory from the stack. This patch prevents it by initializing the buffer passed by the driver to store the config value. This fix addresses CVE-2022-2308. Cc: xieyon...@bytedance.com Cc: sta...@vger.kernel.org # v5.15+ Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Acked-by: Jason Wang Signed-off-by: Maxime Coquelin Please no blank line above the Acked-by: line here if possible. Sure. Jason, do you prefer I post a new revision with this single change or you will handle it while applying? Either way is fine to me. Thanks, Maxime I queue these normally. Ok, I'm preparing the V2. Thanks, Maxime thanks, greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v3] vduse: prevent uninitialized memory accesses
If the VDUSE application provides a smaller config space than the driver expects, the driver may use uninitialized memory from the stack. This patch prevents it by initializing the buffer passed by the driver to store the config value. This fix addresses CVE-2022-2308. Cc: sta...@vger.kernel.org # v5.15+ Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Reviewed-by: Xie Yongji Acked-by: Jason Wang Signed-off-by: Maxime Coquelin --- drivers/vdpa/vdpa_user/vduse_dev.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 41c0b29739f1..35dceee3ed56 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -673,10 +673,15 @@ static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int offset, { struct vduse_dev *dev = vdpa_to_vduse(vdpa); - if (offset > dev->config_size || - len > dev->config_size - offset) + /* Initialize the buffer in case of partial copy. */ + memset(buf, 0, len); + + if (offset > dev->config_size) return; + if (len > dev->config_size - offset) + len = dev->config_size - offset; + memcpy(buf, dev->config + offset, len); } -- 2.37.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/2] vdpa: ifcvf: return err when fail to request config irq
Hi, On 7/23/20 11:12 AM, Jason Wang wrote: > We ignore the err of requesting config interrupt, fix this. > > Fixes: e7991f376a4d ("ifcvf: implement config interrupt in IFCVF") > Cc: Zhu Lingshan > Signed-off-by: Jason Wang > --- > drivers/vdpa/ifcvf/ifcvf_main.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c > index f5a60c14b979..ae7110955a44 100644 > --- a/drivers/vdpa/ifcvf/ifcvf_main.c > +++ b/drivers/vdpa/ifcvf/ifcvf_main.c > @@ -76,6 +76,10 @@ static int ifcvf_request_irq(struct ifcvf_adapter *adapter) > ret = devm_request_irq(&pdev->dev, irq, > ifcvf_config_changed, 0, > vf->config_msix_name, vf); > + if (ret) { > + IFCVF_ERR(pdev, "Failed to request config irq\n"); > + return ret; > + } > > for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) { > snprintf(vf->vring[i].msix_name, 256, "ifcvf[%s]-%d\n", > This series fixes below Kernel BUG I faced while doing some experiments: [ 1398.695362] kernel BUG at drivers/pci/msi.c:375! [ 1398.700561] invalid opcode: [#1] SMP PTI [ 1398.705423] CPU: 0 PID: 25110 Comm: dpdk-testpmd Not tainted 5.8.0-amorenoz-vdpa+ #2 [ 1398.714063] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017 [ 1398.722415] RIP: 0010:free_msi_irqs+0x189/0x1c0 [ 1398.727470] Code: 14 85 c0 0f 84 cc fe ff ff 31 ed eb 0f 83 c5 01 39 6b 14 0f 86 bc fe ff ff 8b 7b 10 01 ef e8 7e 94 b9 ff 48 83 78 70 00d [ 1398.748426] RSP: 0018:b48ac5dd3db8 EFLAGS: 00010286 [ 1398.754257] RAX: 9ab298b85400 RBX: 9ab285d97100 RCX: [ 1398.762219] RDX: RSI: 0073 RDI: ac65e8a0 [ 1398.770182] RBP: R08: 9ab298b85400 R09: 9ab74a8c3d98 [ 1398.778145] R10: 9ab74a8c3f58 R11: R12: 9ab719fd82e0 [ 1398.786107] R13: 9ab719fd8000 R14: 9ab719fd8000 R15: 9ab719fd80b0 [ 1398.794069] FS: 7efc5dea9000() GS:9ab75fc0() knlGS: [ 1398.803099] CS: 0010 DS: ES: CR0: 80050033 [ 1398.809508] CR2: 00c79ff8 CR3: 00038283e005 CR4: 003606f0 [ 1398.817471] DR0: DR1: DR2: [ 1398.825434] DR3: DR6: fffe0ff0 DR7: 0400 [ 1398.833394] Call Trace: [ 1398.836127] pci_disable_msix+0xf7/0x120 [ 1398.840504] pci_free_irq_vectors+0xe/0x20 [ 1398.845074] ifcvf_vdpa_set_status+0xda/0x301 [ 1398.849938] vhost_vdpa_unlocked_ioctl+0x61d/0x790 [ 1398.855277] ? vhost_vdpa_process_iotlb_msg+0x2f0/0x2f0 [ 1398.861109] ksys_ioctl+0x87/0xc0 [ 1398.864808] __x64_sys_ioctl+0x16/0x20 [ 1398.868992] do_syscall_64+0x52/0x90 [ 1398.872982] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1398.878610] RIP: 0033:0x7efc5df9ff9b [ 1398.882598] Code: 0f 1e fa 48 8b 05 ed ce 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 008 [ 1398.903551] RSP: 002b:7ffd0948e378 EFLAGS: 0283 ORIG_RAX: 0010 [ 1398.911998] RAX: ffda RBX: RCX: 7efc5df9ff9b [ 1398.919960] RDX: 7ffd0948e3d4 RSI: 4001af72 RDI: 002c [ 1398.927921] RBP: 7ffd0948e3c0 R08: 02651bf8 R09: [ 1398.935883] R10: 7ffd0948e417 R11: 0283 R12: 00408950 [ 1398.943845] R13: 7ffd0948e6a0 R14: R15: [ 1398.951809] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_comment xt_mark nf_tables xt_nat vetht [ 1398.951847] ghash_clmulni_intel iTCO_vendor_support mlx5_core dcdbas rapl intel_cstate intel_uncore ipmi_ssif pcspkr mxm_wmi mlxfw virtii Tested-by: Maxime Coquelin Thanks, Maxime ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization