Re: [PATCH 09/16] vdpa/mlx5: Allow creation/deletion of any given mr struct
On Mon, 2023-10-09 at 14:39 +0800, Jason Wang wrote: > On Sun, Oct 8, 2023 at 8:05 PM Dragos Tatulea wrote: > > > > On Sun, 2023-10-08 at 12:25 +0800, Jason Wang wrote: > > > On Tue, Sep 26, 2023 at 3:21 PM Dragos Tatulea > > > wrote: > > > > > > > > On Tue, 2023-09-26 at 12:44 +0800, Jason Wang wrote: > > > > > On Tue, Sep 12, 2023 at 9:02 PM Dragos Tatulea > > > > > wrote: > > > > > > > > > > > > This patch adapts the mr creation/deletion code to be able to work > > > > > > with > > > > > > any given mr struct pointer. All the APIs are adapted to take an > > > > > > extra > > > > > > parameter for the mr. > > > > > > > > > > > > mlx5_vdpa_create/delete_mr doesn't need a ASID parameter anymore. > > > > > > The > > > > > > check is done in the caller instead (mlx5_set_map). > > > > > > > > > > > > This change is needed for a followup patch which will introduce an > > > > > > additional mr for the vq descriptor data. > > > > > > > > > > > > Signed-off-by: Dragos Tatulea > > > > > > --- > > > > > > > > > > Thinking of this decoupling I think I have a question. > > > > > > > > > > We advertise 2 address spaces and 2 groups. So we actually don't know > > > > > for example which address spaces will be used by dvq. > > > > > > > > > > And actually we allow the user space to do something like > > > > > > > > > > set_group_asid(dvq_group, 0) > > > > > set_map(0) > > > > > set_group_asid(dvq_group, 1) > > > > > set_map(1) > > > > > > > > > > I wonder if the decoupling like this patch can work and why. > > > > > > > > > This scenario could indeed work. Especially if you look at the 13'th > > > > patch > > > > [0] > > > > where hw support is added. Are you wondering if this should work at all > > > > or > > > > if it > > > > should be blocked? > > > > > > It would be great if it can work with the following patches. But at > > > least for this patch, it seems not: > > > > > > For example, what happens if we switch back to group 0 for dvq? > > > > > > set_group_asid(dvq_group, 0) > > > set_map(0) > > > set_group_asid(dvq_group, 1) > > > set_map(1) > > > // here we destroy the mr created for asid 0 > > > set_group_asid(dvq_group, 0) > > > > > If by destroy you mean .reset, > > It's not rest. During the second map, the mr is destroyed by > mlx5_vdpa_change_map(). > Oh, now I understand what you mean. This is not the case anymore. This patch series introduces one mr per asid. mlx5_vdpa_change_map will only destroy the old mr in the given asid. Before, there was one mr for all asids. > I think it works: During .reset the mapping in > > ASID 0 is reset back to the DMA/pysical map (mlx5_vdpa_create_dma_mr). Am I > > missing something? > > > > > Btw, if this is a new issue, I haven't checked whether or not it > > > exists before this series (if yes, we can fix on top). > > > > > > > > > > > It looks to me the most easy way is to let each AS be backed by an MR. > > > > > Then we don't even need to care about the dvq, cvq. > > > > That's what this patch series dowes. > > > > > > Good to know this, I will review the series. > > > > > I was planning to spin a v3 with Eugenio's suggestions. Should I wait for > > your > > feedback before doing that? > > You can post v3 and we can move the discussion there if you wish. > Ack. Thanks, Dragos ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki wrote: > > On 2023/10/09 5:08, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 4:07, Willem de Bruijn wrote: > >>> On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki > >>> wrote: > > virtio-net have two usage of hashes: one is RSS and another is hash > reporting. Conventionally the hash calculation was done by the VMM. > However, computing the hash after the queue was chosen defeats the > purpose of RSS. > > Another approach is to use eBPF steering program. This approach has > another downside: it cannot report the calculated hash due to the > restrictive nature of eBPF. > > Introduce the code to compute hashes to the kernel in order to overcome > thse challenges. An alternative solution is to extend the eBPF steering > program so that it will be able to report to the userspace, but it makes > little sense to allow to implement different hashing algorithms with > eBPF since the hash value reported by virtio-net is strictly defined by > the specification. > > The hash value already stored in sk_buff is not used and computed > independently since it may have been computed in a way not conformant > with the specification. > > Signed-off-by: Akihiko Odaki > --- > >>> > +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = { > + .max_indirection_table_length = > + TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH, > + > + .types = VIRTIO_NET_SUPPORTED_HASH_TYPES > +}; > >>> > >>> No need to have explicit capabilities exchange like this? Tun either > >>> supports all or none. > >> > >> tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX, > >> VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX. > >> > >> It is because the flow dissector does not support IPv6 extensions. The > >> specification is also vague, and does not tell how many TLVs should be > >> consumed at most when interpreting destination option header so I chose > >> to avoid adding code for these hash types to the flow dissector. I doubt > >> anyone will complain about it since nobody complains for Linux. > >> > >> I'm also adding this so that we can extend it later. > >> max_indirection_table_length may grow for systems with 128+ CPUs, or > >> types may have other bits for new protocols in the future. > >> > >>> > case TUNSETSTEERINGEBPF: > - ret = tun_set_ebpf(tun, &tun->steering_prog, argp); > + bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, argp); > + if (IS_ERR(bpf_ret)) > + ret = PTR_ERR(bpf_ret); > + else if (bpf_ret) > + tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS; > >>> > >>> Don't make one feature disable another. > >>> > >>> TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive > >>> functions. If one is enabled the other call should fail, with EBUSY > >>> for instance. > >>> > + case TUNSETVNETHASH: > + len = sizeof(vnet_hash); > + if (copy_from_user(&vnet_hash, argp, len)) { > + ret = -EFAULT; > + break; > + } > + > + if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) && > +(tun->vnet_hdr_sz < sizeof(struct > virtio_net_hdr_v1_hash) || > + !tun_is_little_endian(tun))) || > +vnet_hash.indirection_table_mask >= > +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) { > + ret = -EINVAL; > + break; > + } > + > + argp = (u8 __user *)argp + len; > + len = (vnet_hash.indirection_table_mask + 1) * 2; > + if (copy_from_user(vnet_hash_indirection_table, argp, > len)) { > + ret = -EFAULT; > + break; > + } > + > + argp = (u8 __user *)argp + len; > + len = virtio_net_hash_key_length(vnet_hash.types); > + > + if (copy_from_user(vnet_hash_key, argp, len)) { > + ret = -EFAULT; > + break; > + } > >>> > >>> Probably easier and less error-prone to define a fixed size control > >>> struct with the max indirection table size. > >> > >> I made its size variable because the indirection table and key may grow > >> in the future as I wrote above. > >> > >>> > >>> Btw: please trim the CC: list considerably on future patches. > >> > >> I'll do so in the next version with the TUNSETSTEERINGEBPF change you > >> proposed. > > > > To be clear: pleas
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki wrote: > > virtio-net have two usage of hashes: one is RSS and another is hash > reporting. Conventionally the hash calculation was done by the VMM. > However, computing the hash after the queue was chosen defeats the > purpose of RSS. > > Another approach is to use eBPF steering program. This approach has > another downside: it cannot report the calculated hash due to the > restrictive nature of eBPF. > > Introduce the code to compute hashes to the kernel in order to overcome > thse challenges. An alternative solution is to extend the eBPF steering > program so that it will be able to report to the userspace, but it makes > little sense to allow to implement different hashing algorithms with > eBPF since the hash value reported by virtio-net is strictly defined by > the specification. > > The hash value already stored in sk_buff is not used and computed > independently since it may have been computed in a way not conformant > with the specification. > > Signed-off-by: Akihiko Odaki > @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct *tun, > } > > if (vnet_hdr_sz) { > - struct virtio_net_hdr gso; > + union { > + struct virtio_net_hdr hdr; > + struct virtio_net_hdr_v1_hash v1_hash_hdr; > + } hdr; > + int ret; > > if (iov_iter_count(iter) < vnet_hdr_sz) > return -EINVAL; > > - if (virtio_net_hdr_from_skb(skb, &gso, > - tun_is_little_endian(tun), true, > - vlan_hlen)) { > + if ((READ_ONCE(tun->vnet_hash.flags) & TUN_VNET_HASH_REPORT) > && > + vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) && > + skb->tun_vnet_hash) { Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of the set hash ioctl failing otherwise? Such checks should be limited to control path where possible > + vnet_hdr_content_sz = sizeof(hdr.v1_hash_hdr); > + ret = virtio_net_hdr_v1_hash_from_skb(skb, > + > &hdr.v1_hash_hdr, > + true, > + vlan_hlen, > + &vnet_hash); > + } else { > + vnet_hdr_content_sz = sizeof(hdr.hdr); > + ret = virtio_net_hdr_from_skb(skb, &hdr.hdr, > + > tun_is_little_endian(tun), > + true, vlan_hlen); > + } > + ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
On Mon, Oct 9, 2023 at 3:44 AM Akihiko Odaki wrote: > > On 2023/10/09 17:13, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki > > wrote: > >> > >> virtio-net have two usage of hashes: one is RSS and another is hash > >> reporting. Conventionally the hash calculation was done by the VMM. > >> However, computing the hash after the queue was chosen defeats the > >> purpose of RSS. > >> > >> Another approach is to use eBPF steering program. This approach has > >> another downside: it cannot report the calculated hash due to the > >> restrictive nature of eBPF. > >> > >> Introduce the code to compute hashes to the kernel in order to overcome > >> thse challenges. An alternative solution is to extend the eBPF steering > >> program so that it will be able to report to the userspace, but it makes > >> little sense to allow to implement different hashing algorithms with > >> eBPF since the hash value reported by virtio-net is strictly defined by > >> the specification. > >> > >> The hash value already stored in sk_buff is not used and computed > >> independently since it may have been computed in a way not conformant > >> with the specification. > >> > >> Signed-off-by: Akihiko Odaki > > > >> @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct *tun, > >> } > >> > >> if (vnet_hdr_sz) { > >> - struct virtio_net_hdr gso; > >> + union { > >> + struct virtio_net_hdr hdr; > >> + struct virtio_net_hdr_v1_hash v1_hash_hdr; > >> + } hdr; > >> + int ret; > >> > >> if (iov_iter_count(iter) < vnet_hdr_sz) > >> return -EINVAL; > >> > >> - if (virtio_net_hdr_from_skb(skb, &gso, > >> - tun_is_little_endian(tun), > >> true, > >> - vlan_hlen)) { > >> + if ((READ_ONCE(tun->vnet_hash.flags) & > >> TUN_VNET_HASH_REPORT) && > >> + vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) && > >> + skb->tun_vnet_hash) { > > > > Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of > > the set hash ioctl failing otherwise? > > > > Such checks should be limited to control path where possible > > There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are > not read at once. It should not be possible to downgrade the hdr_sz once v1 is selected. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki wrote: > > On 2023/10/09 17:04, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 5:08, Willem de Bruijn wrote: > >>> On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki > >>> wrote: > > On 2023/10/09 4:07, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki > > wrote: > >> > >> virtio-net have two usage of hashes: one is RSS and another is hash > >> reporting. Conventionally the hash calculation was done by the VMM. > >> However, computing the hash after the queue was chosen defeats the > >> purpose of RSS. > >> > >> Another approach is to use eBPF steering program. This approach has > >> another downside: it cannot report the calculated hash due to the > >> restrictive nature of eBPF. > >> > >> Introduce the code to compute hashes to the kernel in order to overcome > >> thse challenges. An alternative solution is to extend the eBPF steering > >> program so that it will be able to report to the userspace, but it > >> makes > >> little sense to allow to implement different hashing algorithms with > >> eBPF since the hash value reported by virtio-net is strictly defined by > >> the specification. > >> > >> The hash value already stored in sk_buff is not used and computed > >> independently since it may have been computed in a way not conformant > >> with the specification. > >> > >> Signed-off-by: Akihiko Odaki > >> --- > > > >> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = { > >> + .max_indirection_table_length = > >> + TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH, > >> + > >> + .types = VIRTIO_NET_SUPPORTED_HASH_TYPES > >> +}; > > > > No need to have explicit capabilities exchange like this? Tun either > > supports all or none. > > tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX, > VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX. > > It is because the flow dissector does not support IPv6 extensions. The > specification is also vague, and does not tell how many TLVs should be > consumed at most when interpreting destination option header so I chose > to avoid adding code for these hash types to the flow dissector. I doubt > anyone will complain about it since nobody complains for Linux. > > I'm also adding this so that we can extend it later. > max_indirection_table_length may grow for systems with 128+ CPUs, or > types may have other bits for new protocols in the future. > > > > >>case TUNSETSTEERINGEBPF: > >> - ret = tun_set_ebpf(tun, &tun->steering_prog, argp); > >> + bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, argp); > >> + if (IS_ERR(bpf_ret)) > >> + ret = PTR_ERR(bpf_ret); > >> + else if (bpf_ret) > >> + tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS; > > > > Don't make one feature disable another. > > > > TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive > > functions. If one is enabled the other call should fail, with EBUSY > > for instance. > > > >> + case TUNSETVNETHASH: > >> + len = sizeof(vnet_hash); > >> + if (copy_from_user(&vnet_hash, argp, len)) { > >> + ret = -EFAULT; > >> + break; > >> + } > >> + > >> + if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) && > >> +(tun->vnet_hdr_sz < sizeof(struct > >> virtio_net_hdr_v1_hash) || > >> + !tun_is_little_endian(tun))) || > >> +vnet_hash.indirection_table_mask >= > >> +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) { > >> + ret = -EINVAL; > >> + break; > >> + } > >> + > >> + argp = (u8 __user *)argp + len; > >> + len = (vnet_hash.indirection_table_mask + 1) * 2; > >> + if (copy_from_user(vnet_hash_indirection_table, argp, > >> len)) { > >> + ret = -EFAULT; > >> + break; > >> + } > >> + > >> + argp = (u8 __user *)argp + len; > >> + len = virtio_net_hash_key_length(vnet_hash.types); > >> + > >> + if (copy_from_user(vnet_hash_key, argp, len)) { > >> + ret = -EFAULT; > >> + break; > >> + } > > > > Probably easier and less error-prone to define a fixed size control > > struct with the max indirecti
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki wrote: > > On 2023/10/09 18:57, Willem de Bruijn wrote: > > On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 17:04, Willem de Bruijn wrote: > >>> On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki > >>> wrote: > > On 2023/10/09 5:08, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 4:07, Willem de Bruijn wrote: > >>> On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki > >>> wrote: > > virtio-net have two usage of hashes: one is RSS and another is hash > reporting. Conventionally the hash calculation was done by the VMM. > However, computing the hash after the queue was chosen defeats the > purpose of RSS. > > Another approach is to use eBPF steering program. This approach has > another downside: it cannot report the calculated hash due to the > restrictive nature of eBPF. > > Introduce the code to compute hashes to the kernel in order to > overcome > thse challenges. An alternative solution is to extend the eBPF > steering > program so that it will be able to report to the userspace, but it > makes > little sense to allow to implement different hashing algorithms with > eBPF since the hash value reported by virtio-net is strictly defined > by > the specification. > > The hash value already stored in sk_buff is not used and computed > independently since it may have been computed in a way not conformant > with the specification. > > Signed-off-by: Akihiko Odaki > --- > >>> > +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = { > + .max_indirection_table_length = > + TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH, > + > + .types = VIRTIO_NET_SUPPORTED_HASH_TYPES > +}; > >>> > >>> No need to have explicit capabilities exchange like this? Tun either > >>> supports all or none. > >> > >> tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX, > >> VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX. > >> > >> It is because the flow dissector does not support IPv6 extensions. The > >> specification is also vague, and does not tell how many TLVs should be > >> consumed at most when interpreting destination option header so I chose > >> to avoid adding code for these hash types to the flow dissector. I > >> doubt > >> anyone will complain about it since nobody complains for Linux. > >> > >> I'm also adding this so that we can extend it later. > >> max_indirection_table_length may grow for systems with 128+ CPUs, or > >> types may have other bits for new protocols in the future. > >> > >>> > case TUNSETSTEERINGEBPF: > - ret = tun_set_ebpf(tun, &tun->steering_prog, argp); > + bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, > argp); > + if (IS_ERR(bpf_ret)) > + ret = PTR_ERR(bpf_ret); > + else if (bpf_ret) > + tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS; > >>> > >>> Don't make one feature disable another. > >>> > >>> TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive > >>> functions. If one is enabled the other call should fail, with EBUSY > >>> for instance. > >>> > + case TUNSETVNETHASH: > + len = sizeof(vnet_hash); > + if (copy_from_user(&vnet_hash, argp, len)) { > + ret = -EFAULT; > + break; > + } > + > + if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) && > +(tun->vnet_hdr_sz < sizeof(struct > virtio_net_hdr_v1_hash) || > + !tun_is_little_endian(tun))) || > +vnet_hash.indirection_table_mask >= > +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) { > + ret = -EINVAL; > + break; > + } > + > + argp = (u8 __user *)argp + len; > + len = (vnet_hash.indirection_table_mask + 1) * 2; > + if (copy_from_user(vnet_hash_indirection_table, > argp, len)) { > + ret = -EFAULT; > + break; > + } > + > + argp = (u8 __user *)argp + len; > +
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
On Mon, Oct 9, 2023 at 3:05 AM Akihiko Odaki wrote: > > > > On 2023/10/09 18:54, Willem de Bruijn wrote: > > On Mon, Oct 9, 2023 at 3:44 AM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 17:13, Willem de Bruijn wrote: > >>> On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki > >>> wrote: > > virtio-net have two usage of hashes: one is RSS and another is hash > reporting. Conventionally the hash calculation was done by the VMM. > However, computing the hash after the queue was chosen defeats the > purpose of RSS. > > Another approach is to use eBPF steering program. This approach has > another downside: it cannot report the calculated hash due to the > restrictive nature of eBPF. > > Introduce the code to compute hashes to the kernel in order to overcome > thse challenges. An alternative solution is to extend the eBPF steering > program so that it will be able to report to the userspace, but it makes > little sense to allow to implement different hashing algorithms with > eBPF since the hash value reported by virtio-net is strictly defined by > the specification. > > The hash value already stored in sk_buff is not used and computed > independently since it may have been computed in a way not conformant > with the specification. > > Signed-off-by: Akihiko Odaki > >>> > @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct > *tun, > } > > if (vnet_hdr_sz) { > - struct virtio_net_hdr gso; > + union { > + struct virtio_net_hdr hdr; > + struct virtio_net_hdr_v1_hash v1_hash_hdr; > + } hdr; > + int ret; > > if (iov_iter_count(iter) < vnet_hdr_sz) > return -EINVAL; > > - if (virtio_net_hdr_from_skb(skb, &gso, > - tun_is_little_endian(tun), > true, > - vlan_hlen)) { > + if ((READ_ONCE(tun->vnet_hash.flags) & > TUN_VNET_HASH_REPORT) && > + vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) && > + skb->tun_vnet_hash) { > >>> > >>> Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of > >>> the set hash ioctl failing otherwise? > >>> > >>> Such checks should be limited to control path where possible > >> > >> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are > >> not read at once. > > > > It should not be possible to downgrade the hdr_sz once v1 is selected. > > I see nothing that prevents shrinking the header size. > > tun->vnet_hash.flags is read after vnet_hdr_sz so the race can happen > even for the case the header size grows though this can be fixed by > reordering the two reads. One option is to fail any control path that tries to re-negotiate header size once this hash option is enabled? There is no practical reason to allow feature re-negotiation at any arbitrary time. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
On Mon, Oct 9, 2023 at 3:11 AM Akihiko Odaki wrote: > > On 2023/10/09 19:07, Willem de Bruijn wrote: > > On Mon, Oct 9, 2023 at 3:05 AM Akihiko Odaki > > wrote: > >> > >> > >> > >> On 2023/10/09 18:54, Willem de Bruijn wrote: > >>> On Mon, Oct 9, 2023 at 3:44 AM Akihiko Odaki > >>> wrote: > > On 2023/10/09 17:13, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki > > wrote: > >> > >> virtio-net have two usage of hashes: one is RSS and another is hash > >> reporting. Conventionally the hash calculation was done by the VMM. > >> However, computing the hash after the queue was chosen defeats the > >> purpose of RSS. > >> > >> Another approach is to use eBPF steering program. This approach has > >> another downside: it cannot report the calculated hash due to the > >> restrictive nature of eBPF. > >> > >> Introduce the code to compute hashes to the kernel in order to overcome > >> thse challenges. An alternative solution is to extend the eBPF steering > >> program so that it will be able to report to the userspace, but it > >> makes > >> little sense to allow to implement different hashing algorithms with > >> eBPF since the hash value reported by virtio-net is strictly defined by > >> the specification. > >> > >> The hash value already stored in sk_buff is not used and computed > >> independently since it may have been computed in a way not conformant > >> with the specification. > >> > >> Signed-off-by: Akihiko Odaki > > > >> @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct > >> *tun, > >>} > >> > >>if (vnet_hdr_sz) { > >> - struct virtio_net_hdr gso; > >> + union { > >> + struct virtio_net_hdr hdr; > >> + struct virtio_net_hdr_v1_hash v1_hash_hdr; > >> + } hdr; > >> + int ret; > >> > >>if (iov_iter_count(iter) < vnet_hdr_sz) > >>return -EINVAL; > >> > >> - if (virtio_net_hdr_from_skb(skb, &gso, > >> - tun_is_little_endian(tun), > >> true, > >> - vlan_hlen)) { > >> + if ((READ_ONCE(tun->vnet_hash.flags) & > >> TUN_VNET_HASH_REPORT) && > >> + vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) && > >> + skb->tun_vnet_hash) { > > > > Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of > > the set hash ioctl failing otherwise? > > > > Such checks should be limited to control path where possible > > There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are > not read at once. > >>> > >>> It should not be possible to downgrade the hdr_sz once v1 is selected. > >> > >> I see nothing that prevents shrinking the header size. > >> > >> tun->vnet_hash.flags is read after vnet_hdr_sz so the race can happen > >> even for the case the header size grows though this can be fixed by > >> reordering the two reads. > > > > One option is to fail any control path that tries to re-negotiate > > header size once this hash option is enabled? > > > > There is no practical reason to allow feature re-negotiation at any > > arbitrary time. > > I think it's a bit awkward interface design since tun allows to > reconfigure any of its parameters, but it's certainly possible. If this would be the only exception to that rule, and this is the only place that needs a datapath check, then it's fine to leave as is. In general, this runtime configurability serves little purpose but to help syzbot exercise code paths no real application would attempt. But I won't ask to diverge from whatever tun already does. We just have to be more careful about the possible races it brings. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v2 1/2] virtio_pci: fix the common map size and add check for common size
On Mon, Oct 09, 2023 at 09:15:31AM +0800, Xuan Zhuo wrote: > On Sun, 8 Oct 2023 06:42:37 -0400, "Michael S. Tsirkin" > wrote: > > On Sun, Oct 08, 2023 at 05:38:41PM +0800, Xuan Zhuo wrote: > > > Now, the function vp_modern_map_capability() takes the size parameter, > > > which corresponds to the size of virtio_pci_common_cfg. As a result, > > > this indicates the size of memory area to map. > > > > > > However, if the _F_RING_RESET is negotiated, the extra items will be > > > used. Therefore, we need to use the size of virtio_pci_modern_common_cfg > > > to map more space. > > > > > > Meanwhile, this patch removes the feature(_F_RING_ERSET and > > > > typo > > > > > _F_NOTIFICATION_DATA) when the common cfg size does not match > > > the corresponding feature. > > > > > > Fixes: 0b50cece0b78 ("virtio_pci: introduce helper to get/set queue > > > reset") > > > Signed-off-by: Xuan Zhuo > > > --- > > > drivers/virtio/virtio_pci_modern.c | 20 +++- > > > drivers/virtio/virtio_pci_modern_dev.c | 4 ++-- > > > include/linux/virtio_pci_modern.h | 1 + > > > 3 files changed, 22 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c > > > b/drivers/virtio/virtio_pci_modern.c > > > index d6bb68ba84e5..c0b9d2363ddb 100644 > > > --- a/drivers/virtio/virtio_pci_modern.c > > > +++ b/drivers/virtio/virtio_pci_modern.c > > > @@ -22,8 +22,26 @@ > > > static u64 vp_get_features(struct virtio_device *vdev) > > > { > > > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > > > + u64 features = vp_modern_get_features(&vp_dev->mdev); > > > + > > > +#define check_feature(feature, field) > > > \ > > > + do { > > > \ > > > + if (features & BIT_ULL(feature)) { > > > \ > > > + u32 offset = offsetofend(struct > > > virtio_pci_modern_common_cfg, field); \ > > > + if (unlikely(vp_dev->mdev.common_len < offset)) > > > \ > > > + features &= ~BIT_ULL(feature); > > > \ > > > + } > > > \ > > > + } while (0) > > > + > > > + /* For buggy devices, if the common len does not match the feature, we > > > + * remove the feature. > > > > I don't like doing this in vp_get_features. userspace won't be able > > to see the actual device features at all. > > Also, we should print an info message at least. > > > > I am still debating with myself whether clearing feature bits > > or just failing finalize_features (and thus, probe) is best. > > For me, I think failing probe is best. > > Then the developer of the device can find that firstly. > And I think we should print an info message when we detect > this error. If you fail probe - maybe even a warning. > If we clear the feature bits, the developer of the device may > ignore this error. > > > > > > > > + */ > > > + check_feature(VIRTIO_F_NOTIFICATION_DATA, queue_notify_data); > > > + check_feature(VIRTIO_F_RING_RESET, queue_reset); > > > + > > > +#undef check_feature > > > > this macro's too scary. just pass offset and feature bit as > > parameters to an inline function. > > I should pass the features as a parameter. > > Thanks. > > > > > > > > > > > - return vp_modern_get_features(&vp_dev->mdev); > > > + return features; > > > } > > > > > > static void vp_transport_features(struct virtio_device *vdev, u64 > > > features) > > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c > > > b/drivers/virtio/virtio_pci_modern_dev.c > > > index aad7d9296e77..33f319da1558 100644 > > > --- a/drivers/virtio/virtio_pci_modern_dev.c > > > +++ b/drivers/virtio/virtio_pci_modern_dev.c > > > @@ -291,8 +291,8 @@ int vp_modern_probe(struct virtio_pci_modern_device > > > *mdev) > > > err = -EINVAL; > > > mdev->common = vp_modern_map_capability(mdev, common, > > > sizeof(struct virtio_pci_common_cfg), 4, > > > - 0, sizeof(struct virtio_pci_common_cfg), > > > - NULL, NULL); > > > + 0, sizeof(struct > > > virtio_pci_modern_common_cfg), > > > + &mdev->common_len, NULL); > > > if (!mdev->common) > > > goto err_map_common; > > > mdev->isr = vp_modern_map_capability(mdev, isr, sizeof(u8), 1, > > > diff --git a/include/linux/virtio_pci_modern.h > > > b/include/linux/virtio_pci_modern.h > > > index 067ac1d789bc..edf62bae0474 100644 > > > --- a/include/linux/virtio_pci_modern.h > > > +++ b/include/linux/virtio_pci_modern.h > > > @@ -28,6 +28,7 @@ struct virtio_pci_modern_device { > > > /* So we can sanity-check accesses. */ > > > size_t notify_len; > > > size_t device_len;
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
On Mon, Oct 9, 2023 at 3:12 AM Akihiko Odaki wrote: > > On 2023/10/09 19:06, Willem de Bruijn wrote: > > On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 18:57, Willem de Bruijn wrote: > >>> On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki > >>> wrote: > > On 2023/10/09 17:04, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 5:08, Willem de Bruijn wrote: > >>> On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki > >>> wrote: > > On 2023/10/09 4:07, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki > > wrote: > >> > >> virtio-net have two usage of hashes: one is RSS and another is hash > >> reporting. Conventionally the hash calculation was done by the VMM. > >> However, computing the hash after the queue was chosen defeats the > >> purpose of RSS. > >> > >> Another approach is to use eBPF steering program. This approach has > >> another downside: it cannot report the calculated hash due to the > >> restrictive nature of eBPF. > >> > >> Introduce the code to compute hashes to the kernel in order to > >> overcome > >> thse challenges. An alternative solution is to extend the eBPF > >> steering > >> program so that it will be able to report to the userspace, but it > >> makes > >> little sense to allow to implement different hashing algorithms > >> with > >> eBPF since the hash value reported by virtio-net is strictly > >> defined by > >> the specification. > >> > >> The hash value already stored in sk_buff is not used and computed > >> independently since it may have been computed in a way not > >> conformant > >> with the specification. > >> > >> Signed-off-by: Akihiko Odaki > >> --- > > > >> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = { > >> + .max_indirection_table_length = > >> + TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH, > >> + > >> + .types = VIRTIO_NET_SUPPORTED_HASH_TYPES > >> +}; > > > > No need to have explicit capabilities exchange like this? Tun either > > supports all or none. > > tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX, > VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX. > > It is because the flow dissector does not support IPv6 extensions. > The > specification is also vague, and does not tell how many TLVs should > be > consumed at most when interpreting destination option header so I > chose > to avoid adding code for these hash types to the flow dissector. I > doubt > anyone will complain about it since nobody complains for Linux. > > I'm also adding this so that we can extend it later. > max_indirection_table_length may grow for systems with 128+ CPUs, or > types may have other bits for new protocols in the future. > > > > >> case TUNSETSTEERINGEBPF: > >> - ret = tun_set_ebpf(tun, &tun->steering_prog, argp); > >> + bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, > >> argp); > >> + if (IS_ERR(bpf_ret)) > >> + ret = PTR_ERR(bpf_ret); > >> + else if (bpf_ret) > >> + tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS; > > > > Don't make one feature disable another. > > > > TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive > > functions. If one is enabled the other call should fail, with EBUSY > > for instance. > > > >> + case TUNSETVNETHASH: > >> + len = sizeof(vnet_hash); > >> + if (copy_from_user(&vnet_hash, argp, len)) { > >> + ret = -EFAULT; > >> + break; > >> + } > >> + > >> + if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) && > >> +(tun->vnet_hdr_sz < sizeof(struct > >> virtio_net_hdr_v1_hash) || > >> + !tun_is_little_endian(tun))) || > >> +vnet_hash.indirection_table_mask >= > >> +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) { > >> + ret = -EINVAL; > >> + break; > >> + } > >> + > >> + argp = (u8 __user *)argp + len; > >
Re: [PATCH 2/2] tools/virtio: Add hints when module is not installed
On Mon, Oct 09, 2023 at 02:44:55AM +, Liming Wu wrote: > > > > -Original Message- > > From: Jason Wang > > Sent: Sunday, October 8, 2023 12:36 PM > > To: Liming Wu > > Cc: Michael S . Tsirkin ; k...@vger.kernel.org; > > virtualization@lists.linux-foundation.org; net...@vger.kernel.org; linux- > > ker...@vger.kernel.org; 398776...@qq.com > > Subject: Re: [PATCH 2/2] tools/virtio: Add hints when module is not > > installed > > > > On Tue, Sep 26, 2023 at 1:00 PM wrote: > > > > > > From: Liming Wu > > > > > > Need to insmod vhost_test.ko before run virtio_test. > > > Give some hints to users. > > > > > > Signed-off-by: Liming Wu > > > --- > > > tools/virtio/virtio_test.c | 4 > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c > > > index 028f54e6854a..ce2c4d93d735 100644 > > > --- a/tools/virtio/virtio_test.c > > > +++ b/tools/virtio/virtio_test.c > > > @@ -135,6 +135,10 @@ static void vdev_info_init(struct vdev_info* dev, > > unsigned long long features) > > > dev->buf = malloc(dev->buf_size); > > > assert(dev->buf); > > > dev->control = open("/dev/vhost-test", O_RDWR); > > > + > > > + if (dev->control < 0) > > > + fprintf(stderr, "Install vhost_test module" \ > > > + "(./vhost_test/vhost_test.ko) firstly\n"); > > > > There should be many other reasons to fail for open(). > > > > Let's use strerror()? > Yes, Thanks for the review. > Please rechecked the code as follow: > --- a/tools/virtio/virtio_test.c > +++ b/tools/virtio/virtio_test.c > @@ -135,6 +135,11 @@ static void vdev_info_init(struct vdev_info* dev, > unsigned long long features) > dev->buf = malloc(dev->buf_size); > assert(dev->buf); > dev->control = open("/dev/vhost-test", O_RDWR); > + > + if (dev->control == NULL) ??? Why are you comparing a file descriptor to NULL? > + fprintf(stderr, > + "%s: Check whether vhost_test.ko is installed.\n", > + strerror(errno)); No, do not suggest checking unconditionally this is just wasting user's time. You would have to check the exact errno value. You will either get ENOENT or ENODEV if module is not loaded. Other errors indicate other problems. And what matters is whether it's loaded, not installed - vhost_test.ko will not get auto-loaded even if installed. > assert(dev->control >= 0); > r = ioctl(dev->control, VHOST_SET_OWNER, NULL); > assert(r >= 0); > > Thanks > In short, I am not applying this patch. If you really want to make things a bit easier in case of errors, replace all assert r >= 0 with a macro that prints out strerror(errno), that should be enough. Maybe print file/line number too while we are at it. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 00/16] vdpa: Add support for vq descriptor mappings
This patch series adds support for vq descriptor table mappings which are used to improve vdpa live migration downtime. The improvement comes from using smaller mappings which take less time to create and destroy in hw. The first part adds the vdpa core changes from Si-Wei [0]. The second part adds support in mlx5_vdpa: - Refactor the mr code to be able to cleanly add descriptor mappings. - Add hardware descriptor mr support. - Properly update iotlb for cvq during ASID switch. Changes in v3: - dup_iotlb now checks for src == dst case and returns an error. - Renamed iotlb parameter in dup_iotlb to dst. - Removed a redundant check of the asid value. - Fixed a commit message. - mx5_ifc.h patch has been applied to mlx5-vhost tree. When applying this series please pull from that tree first. Changes in v2: - The "vdpa/mlx5: Enable hw support for vq descriptor mapping" change was split off into two patches to avoid merge conflicts into the tree of Linus. The first patch contains only changes for mlx5_ifc.h. This must be applied into the mlx5-vdpa tree [1] first. Once this patch is applied on mlx5-vdpa, the change has to be pulled fom mlx5-vdpa into the vhost tree and only then the remaining patches can be applied. [0] https://lore.kernel.org/virtualization/1694248959-13369-1-git-send-email-si-wei@oracle.com [1] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost Dragos Tatulea (13): vdpa/mlx5: Expose descriptor group mkey hw capability vdpa/mlx5: Create helper function for dma mappings vdpa/mlx5: Decouple cvq iotlb handling from hw mapping code vdpa/mlx5: Take cvq iotlb lock during refresh vdpa/mlx5: Collapse "dvq" mr add/delete functions vdpa/mlx5: Rename mr destroy functions vdpa/mlx5: Allow creation/deletion of any given mr struct vdpa/mlx5: Move mr mutex out of mr struct vdpa/mlx5: Improve mr update flow vdpa/mlx5: Introduce mr for vq descriptor vdpa/mlx5: Enable hw support for vq descriptor mapping vdpa/mlx5: Make iotlb helper functions more generic vdpa/mlx5: Update cvq iotlb mapping on ASID change Si-Wei Liu (3): vdpa: introduce dedicated descriptor group for virtqueue vhost-vdpa: introduce descriptor group backend feature vhost-vdpa: uAPI to get dedicated descriptor group id drivers/vdpa/mlx5/core/mlx5_vdpa.h | 31 +++-- drivers/vdpa/mlx5/core/mr.c| 194 - drivers/vdpa/mlx5/core/resources.c | 6 +- drivers/vdpa/mlx5/net/mlx5_vnet.c | 98 ++- drivers/vhost/vdpa.c | 27 include/linux/mlx5/mlx5_ifc.h | 8 +- include/linux/mlx5/mlx5_ifc_vdpa.h | 7 +- include/linux/vdpa.h | 11 ++ include/uapi/linux/vhost.h | 8 ++ include/uapi/linux/vhost_types.h | 5 + 10 files changed, 266 insertions(+), 129 deletions(-) -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH mlx5-vhost v3 01/16] vdpa/mlx5: Expose descriptor group mkey hw capability
Necessary for improved live migration flow. Actual support will be added in a downstream patch. Reviewed-by: Gal Pressman Signed-off-by: Dragos Tatulea --- include/linux/mlx5/mlx5_ifc.h | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index dd8421d021cf..ec15330b970d 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1231,7 +1231,13 @@ struct mlx5_ifc_virtio_emulation_cap_bits { u8 max_emulated_devices[0x8]; u8 max_num_virtio_queues[0x18]; - u8 reserved_at_a0[0x60]; + u8 reserved_at_a0[0x20]; + + u8 reserved_at_c0[0x13]; + u8 desc_group_mkey_supported[0x1]; + u8 reserved_at_d4[0xc]; + + u8 reserved_at_e0[0x20]; u8 umem_1_buffer_param_a[0x20]; -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 03/16] vhost-vdpa: introduce descriptor group backend feature
From: Si-Wei Liu Userspace knows if the device has dedicated descriptor group or not by checking this feature bit. It's only exposed if the vdpa driver backend implements the .get_vq_desc_group() operation callback. Userspace trying to negotiate this feature when it or the dependent _F_IOTLB_ASID feature hasn't been exposed will result in an error. Signed-off-by: Si-Wei Liu Acked-by: Eugenio Pérez Acked-by: Jason Wang --- drivers/vhost/vdpa.c | 17 + include/uapi/linux/vhost_types.h | 5 + 2 files changed, 22 insertions(+) diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 78379ffd2336..2f21798a37ee 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -389,6 +389,14 @@ static bool vhost_vdpa_can_resume(const struct vhost_vdpa *v) return ops->resume; } +static bool vhost_vdpa_has_desc_group(const struct vhost_vdpa *v) +{ + struct vdpa_device *vdpa = v->vdpa; + const struct vdpa_config_ops *ops = vdpa->config; + + return ops->get_vq_desc_group; +} + static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep) { struct vdpa_device *vdpa = v->vdpa; @@ -690,6 +698,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, if (copy_from_user(&features, featurep, sizeof(features))) return -EFAULT; if (features & ~(VHOST_VDPA_BACKEND_FEATURES | +BIT_ULL(VHOST_BACKEND_F_DESC_ASID) | BIT_ULL(VHOST_BACKEND_F_SUSPEND) | BIT_ULL(VHOST_BACKEND_F_RESUME) | BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK))) @@ -700,6 +709,12 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, if ((features & BIT_ULL(VHOST_BACKEND_F_RESUME)) && !vhost_vdpa_can_resume(v)) return -EOPNOTSUPP; + if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) && + !(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) + return -EINVAL; + if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) && +!vhost_vdpa_has_desc_group(v)) + return -EOPNOTSUPP; vhost_set_backend_features(&v->vdev, features); return 0; } @@ -753,6 +768,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND); if (vhost_vdpa_can_resume(v)) features |= BIT_ULL(VHOST_BACKEND_F_RESUME); + if (vhost_vdpa_has_desc_group(v)) + features |= BIT_ULL(VHOST_BACKEND_F_DESC_ASID); features |= vhost_vdpa_get_backend_features(v); if (copy_to_user(featurep, &features, sizeof(features))) r = -EFAULT; diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h index 2d827d22cd99..18ad6ae7ab5c 100644 --- a/include/uapi/linux/vhost_types.h +++ b/include/uapi/linux/vhost_types.h @@ -185,5 +185,10 @@ struct vhost_vdpa_iova_range { * DRIVER_OK */ #define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK 0x6 +/* Device may expose the virtqueue's descriptor area, driver area and + * device area to a different group for ASID binding than where its + * buffers may reside. Requires VHOST_BACKEND_F_IOTLB_ASID. + */ +#define VHOST_BACKEND_F_DESC_ASID0x7 #endif -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 02/16] vdpa: introduce dedicated descriptor group for virtqueue
From: Si-Wei Liu In some cases, the access to the virtqueue's descriptor area, device and driver areas (precluding indirect descriptor table in guest memory) may have to be confined to a different address space than where its buffers reside. Without loss of simplicity and generality with already established terminology, let's fold up these 3 areas and call them as a whole as descriptor table group, or descriptor group for short. Specifically, in case of split virtqueues, descriptor group consists of regions for Descriptor Table, Available Ring and Used Ring; for packed virtqueues layout, descriptor group contains Descriptor Ring, Driver and Device Event Suppression structures. The group ID for a dedicated descriptor group can be obtained through a new .get_vq_desc_group() op. If driver implements this op, it means that the descriptor, device and driver areas of the virtqueue may reside in a dedicated group than where its buffers reside, a.k.a the default virtqueue group through the .get_vq_group() op. In principle, the descriptor group may or may not have same group ID as the default group. Even if the descriptor group has a different ID, meaning the vq's descriptor group areas can optionally move to a separate address space than where guest memory resides, the descriptor group may still start from a default address space, same as where its buffers reside. To move the descriptor group to a different address space, .set_group_asid() has to be called to change the ASID binding for the group, which is no different than what needs to be done on any other virtqueue group. On the other hand, the .reset() semantics also applies on descriptor table group, meaning the device reset will clear all ASID bindings and move all virtqueue groups including descriptor group back to the default address space, i.e. in ASID 0. QEMU's shadow virtqueue is going to utilize dedicated descriptor group to speed up map and unmap operations, yielding tremendous downtime reduction by avoiding the full and slow remap cycle in SVQ switching. Signed-off-by: Si-Wei Liu Acked-by: Eugenio Pérez Acked-by: Jason Wang --- include/linux/vdpa.h | 11 +++ 1 file changed, 11 insertions(+) diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 0e652026b776..d376309b99cf 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -204,6 +204,16 @@ struct vdpa_map_file { * @vdev: vdpa device * @idx: virtqueue index * Returns u32: group id for this virtqueue + * @get_vq_desc_group: Get the group id for the descriptor table of + * a specific virtqueue (optional) + * @vdev: vdpa device + * @idx: virtqueue index + * Returns u32: group id for the descriptor table + * portion of this virtqueue. Could be different + * than the one from @get_vq_group, in which case + * the access to the descriptor table can be + * confined to a separate asid, isolating from + * the virtqueue's buffer address access. * @get_device_features: Get virtio features supported by the device * @vdev: vdpa device * Returns the virtio features support by the @@ -360,6 +370,7 @@ struct vdpa_config_ops { /* Device ops */ u32 (*get_vq_align)(struct vdpa_device *vdev); u32 (*get_vq_group)(struct vdpa_device *vdev, u16 idx); + u32 (*get_vq_desc_group)(struct vdpa_device *vdev, u16 idx); u64 (*get_device_features)(struct vdpa_device *vdev); u64 (*get_backend_features)(const struct vdpa_device *vdev); int (*set_driver_features)(struct vdpa_device *vdev, u64 features); -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 04/16] vhost-vdpa: uAPI to get dedicated descriptor group id
From: Si-Wei Liu With _F_DESC_ASID backend feature, the device can now support the VHOST_VDPA_GET_VRING_DESC_GROUP ioctl, and it may expose the descriptor table (including avail and used ring) in a different group than the buffers it contains. This new uAPI will fetch the group ID of the descriptor table. Signed-off-by: Si-Wei Liu Acked-by: Eugenio Pérez Acked-by: Jason Wang --- drivers/vhost/vdpa.c | 10 ++ include/uapi/linux/vhost.h | 8 2 files changed, 18 insertions(+) diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 2f21798a37ee..851535f57b95 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -613,6 +613,16 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd, else if (copy_to_user(argp, &s, sizeof(s))) return -EFAULT; return 0; + case VHOST_VDPA_GET_VRING_DESC_GROUP: + if (!vhost_vdpa_has_desc_group(v)) + return -EOPNOTSUPP; + s.index = idx; + s.num = ops->get_vq_desc_group(vdpa, idx); + if (s.num >= vdpa->ngroups) + return -EIO; + else if (copy_to_user(argp, &s, sizeof(s))) + return -EFAULT; + return 0; case VHOST_VDPA_SET_GROUP_ASID: if (copy_from_user(&s, argp, sizeof(s))) return -EFAULT; diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h index f5c48b61ab62..649560c685f1 100644 --- a/include/uapi/linux/vhost.h +++ b/include/uapi/linux/vhost.h @@ -219,4 +219,12 @@ */ #define VHOST_VDPA_RESUME _IO(VHOST_VIRTIO, 0x7E) +/* Get the group for the descriptor table including driver & device areas + * of a virtqueue: read index, write group in num. + * The virtqueue index is stored in the index field of vhost_vring_state. + * The group ID of the descriptor table for this specific virtqueue + * is returned via num field of vhost_vring_state. + */ +#define VHOST_VDPA_GET_VRING_DESC_GROUP_IOWR(VHOST_VIRTIO, 0x7F, \ + struct vhost_vring_state) #endif -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 06/16] vdpa/mlx5: Decouple cvq iotlb handling from hw mapping code
The handling of the cvq iotlb is currently coupled with the creation and destruction of the hardware mkeys (mr). This patch moves cvq iotlb handling into its own function and shifts it to a scope that is not related to mr handling. As cvq handling is just a prune_iotlb + dup_iotlb cycle, put it all in the same "update" function. Finally, the destruction path is handled by directly pruning the iotlb. After this move is done the ASID mr code can be collapsed into a single function. Acked-by: Jason Wang Acked-by: Eugenio Pérez Signed-off-by: Dragos Tatulea --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 3 ++ drivers/vdpa/mlx5/core/mr.c| 57 +++--- drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 ++-- 3 files changed, 28 insertions(+), 39 deletions(-) diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index 3748f027cfe9..554899a80241 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -120,6 +120,9 @@ int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, unsigned int asid); void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev); void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid); +int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev, + struct vhost_iotlb *iotlb, + unsigned int asid); int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev); #define mlx5_vdpa_warn(__dev, format, ...) \ diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index 7bd0883b8b25..fcb6ae32e9ed 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -489,14 +489,6 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr } } -static void _mlx5_vdpa_destroy_cvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid) -{ - if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid) - return; - - prune_iotlb(mvdev); -} - static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid) { struct mlx5_vdpa_mr *mr = &mvdev->mr; @@ -522,25 +514,14 @@ void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid) mutex_lock(&mr->mkey_mtx); _mlx5_vdpa_destroy_dvq_mr(mvdev, asid); - _mlx5_vdpa_destroy_cvq_mr(mvdev, asid); mutex_unlock(&mr->mkey_mtx); } void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev) { - mlx5_vdpa_destroy_mr_asid(mvdev, mvdev->group2asid[MLX5_VDPA_CVQ_GROUP]); mlx5_vdpa_destroy_mr_asid(mvdev, mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]); -} - -static int _mlx5_vdpa_create_cvq_mr(struct mlx5_vdpa_dev *mvdev, - struct vhost_iotlb *iotlb, - unsigned int asid) -{ - if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid) - return 0; - - return dup_iotlb(mvdev, iotlb); + prune_iotlb(mvdev); } static int _mlx5_vdpa_create_dvq_mr(struct mlx5_vdpa_dev *mvdev, @@ -572,22 +553,7 @@ static int _mlx5_vdpa_create_dvq_mr(struct mlx5_vdpa_dev *mvdev, static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, unsigned int asid) { - int err; - - err = _mlx5_vdpa_create_dvq_mr(mvdev, iotlb, asid); - if (err) - return err; - - err = _mlx5_vdpa_create_cvq_mr(mvdev, iotlb, asid); - if (err) - goto out_err; - - return 0; - -out_err: - _mlx5_vdpa_destroy_dvq_mr(mvdev, asid); - - return err; + return _mlx5_vdpa_create_dvq_mr(mvdev, iotlb, asid); } int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, @@ -620,7 +586,24 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *io return err; } +int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev, + struct vhost_iotlb *iotlb, + unsigned int asid) +{ + if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid) + return 0; + + prune_iotlb(mvdev); + return dup_iotlb(mvdev, iotlb); +} + int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev) { - return mlx5_vdpa_create_mr(mvdev, NULL, 0); + int err; + + err = mlx5_vdpa_create_mr(mvdev, NULL, 0); + if (err) + return err; + + return mlx5_vdpa_update_cvq_iotlb(mvdev, NULL, 0); } diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 65b6a54ad344..aa4896662699 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -2884,10 +2884,13 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, return err; } -
[PATCH vhost v3 07/16] vdpa/mlx5: Take cvq iotlb lock during refresh
The reslock is taken while refresh is called but iommu_lock is more specific to this resource. So take the iommu_lock during cvq iotlb refresh. Based on Eugenio's patch [0]. [0] https://lore.kernel.org/lkml/20230112142218.725622-4-epere...@redhat.com/ Acked-by: Jason Wang Suggested-by: Eugenio Pérez Reviewed-by: Eugenio Pérez Signed-off-by: Dragos Tatulea --- drivers/vdpa/mlx5/core/mr.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index fcb6ae32e9ed..587300e7c18e 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -590,11 +590,19 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, unsigned int asid) { + int err; + if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid) return 0; + spin_lock(&mvdev->cvq.iommu_lock); + prune_iotlb(mvdev); - return dup_iotlb(mvdev, iotlb); + err = dup_iotlb(mvdev, iotlb); + + spin_unlock(&mvdev->cvq.iommu_lock); + + return err; } int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev) -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 05/16] vdpa/mlx5: Create helper function for dma mappings
Necessary for upcoming cvq separation from mr allocation. Acked-by: Jason Wang Signed-off-by: Dragos Tatulea --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 + drivers/vdpa/mlx5/core/mr.c| 5 + drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++-- 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index ca56242972b3..3748f027cfe9 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -120,6 +120,7 @@ int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, unsigned int asid); void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev); void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid); +int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev); #define mlx5_vdpa_warn(__dev, format, ...) \ dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__, \ diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index 5a1971fcd87b..7bd0883b8b25 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -619,3 +619,8 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *io return err; } + +int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev) +{ + return mlx5_vdpa_create_mr(mvdev, NULL, 0); +} diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 40a03b08d7cf..65b6a54ad344 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -2836,7 +2836,7 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev) ++mvdev->generation; if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) { - if (mlx5_vdpa_create_mr(mvdev, NULL, 0)) + if (mlx5_vdpa_create_dma_mr(mvdev)) mlx5_vdpa_warn(mvdev, "create MR failed\n"); } up_write(&ndev->reslock); @@ -3441,7 +3441,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name, goto err_mpfs; if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) { - err = mlx5_vdpa_create_mr(mvdev, NULL, 0); + err = mlx5_vdpa_create_dma_mr(mvdev); if (err) goto err_res; } -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 09/16] vdpa/mlx5: Rename mr destroy functions
Make mlx5_destroy_mr symmetric to mlx5_create_mr. Acked-by: Jason Wang Acked-by: Eugenio Pérez Signed-off-by: Dragos Tatulea --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 4 ++-- drivers/vdpa/mlx5/core/mr.c| 6 +++--- drivers/vdpa/mlx5/net/mlx5_vnet.c | 12 ++-- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index 554899a80241..e1e6e7aba50e 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -118,8 +118,8 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *io bool *change_map, unsigned int asid); int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, unsigned int asid); -void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev); -void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid); +void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev); +void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid); int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, unsigned int asid); diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index fde00497f4ad..00dcce190a1f 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -507,7 +507,7 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid mr->initialized = false; } -void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid) +void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid) { struct mlx5_vdpa_mr *mr = &mvdev->mr; @@ -518,9 +518,9 @@ void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid) mutex_unlock(&mr->mkey_mtx); } -void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev) +void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev) { - mlx5_vdpa_destroy_mr_asid(mvdev, mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]); + mlx5_vdpa_destroy_mr(mvdev, mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]); prune_iotlb(mvdev); } diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index aa4896662699..ab196c43694c 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -2644,7 +2644,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev, goto err_mr; teardown_driver(ndev); - mlx5_vdpa_destroy_mr_asid(mvdev, asid); + mlx5_vdpa_destroy_mr(mvdev, asid); err = mlx5_vdpa_create_mr(mvdev, iotlb, asid); if (err) goto err_mr; @@ -2660,7 +2660,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev, return 0; err_setup: - mlx5_vdpa_destroy_mr_asid(mvdev, asid); + mlx5_vdpa_destroy_mr(mvdev, asid); err_mr: return err; } @@ -2797,7 +2797,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status) err_driver: unregister_link_notifier(ndev); err_setup: - mlx5_vdpa_destroy_mr(&ndev->mvdev); + mlx5_vdpa_destroy_mr_resources(&ndev->mvdev); ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED; err_clear: up_write(&ndev->reslock); @@ -2824,7 +2824,7 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev) unregister_link_notifier(ndev); teardown_driver(ndev); clear_vqs_ready(ndev); - mlx5_vdpa_destroy_mr(&ndev->mvdev); + mlx5_vdpa_destroy_mr_resources(&ndev->mvdev); ndev->mvdev.status = 0; ndev->mvdev.suspended = false; ndev->cur_num_vqs = 0; @@ -2944,7 +2944,7 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev) ndev = to_mlx5_vdpa_ndev(mvdev); free_resources(ndev); - mlx5_vdpa_destroy_mr(mvdev); + mlx5_vdpa_destroy_mr_resources(mvdev); if (!is_zero_ether_addr(ndev->config.mac)) { pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev)); mlx5_mpfs_del_mac(pfmdev, ndev->config.mac); @@ -3474,7 +3474,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name, err_res2: free_resources(ndev); err_mr: - mlx5_vdpa_destroy_mr(mvdev); + mlx5_vdpa_destroy_mr_resources(mvdev); err_res: mlx5_vdpa_free_resources(&ndev->mvdev); err_mpfs: -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 08/16] vdpa/mlx5: Collapse "dvq" mr add/delete functions
Now that the cvq code is out of mlx5_vdpa_create/destroy_mr, the "dvq" functions can be folded into their callers. Having "dvq" in the naming will no longer be accurate in the downstream patches. Acked-by: Jason Wang Acked-by: Eugenio Pérez Signed-off-by: Dragos Tatulea --- drivers/vdpa/mlx5/core/mr.c | 16 +--- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index 587300e7c18e..fde00497f4ad 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -489,7 +489,7 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr } } -static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid) +static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid) { struct mlx5_vdpa_mr *mr = &mvdev->mr; @@ -513,7 +513,7 @@ void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid) mutex_lock(&mr->mkey_mtx); - _mlx5_vdpa_destroy_dvq_mr(mvdev, asid); + _mlx5_vdpa_destroy_mr(mvdev, asid); mutex_unlock(&mr->mkey_mtx); } @@ -524,9 +524,9 @@ void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev) prune_iotlb(mvdev); } -static int _mlx5_vdpa_create_dvq_mr(struct mlx5_vdpa_dev *mvdev, - struct vhost_iotlb *iotlb, - unsigned int asid) +static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, + struct vhost_iotlb *iotlb, + unsigned int asid) { struct mlx5_vdpa_mr *mr = &mvdev->mr; int err; @@ -550,12 +550,6 @@ static int _mlx5_vdpa_create_dvq_mr(struct mlx5_vdpa_dev *mvdev, return 0; } -static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, - struct vhost_iotlb *iotlb, unsigned int asid) -{ - return _mlx5_vdpa_create_dvq_mr(mvdev, iotlb, asid); -} - int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, unsigned int asid) { -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 11/16] vdpa/mlx5: Move mr mutex out of mr struct
The mutex is named like it is supposed to protect only the mkey but in reality it is a global lock for all mr resources. Shift the mutex to it's rightful location (struct mlx5_vdpa_dev) and give it a more appropriate name. Signed-off-by: Dragos Tatulea Acked-by: Eugenio Pérez --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 4 ++-- drivers/vdpa/mlx5/core/mr.c| 13 +++-- drivers/vdpa/mlx5/core/resources.c | 6 +++--- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index 01d4ee58ccb1..9c6ac42c21e1 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -34,8 +34,6 @@ struct mlx5_vdpa_mr { /* state of dvq mr */ bool initialized; - /* serialize mkey creation and destruction */ - struct mutex mkey_mtx; bool user_mr; }; @@ -94,6 +92,8 @@ struct mlx5_vdpa_dev { u32 generation; struct mlx5_vdpa_mr mr; + /* serialize mr access */ + struct mutex mr_mtx; struct mlx5_control_vq cvq; struct workqueue_struct *wq; unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS]; diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index 6f29e8eaabb1..abd6a6fb122f 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -509,11 +509,11 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_ void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr) { - mutex_lock(&mr->mkey_mtx); + mutex_lock(&mvdev->mr_mtx); _mlx5_vdpa_destroy_mr(mvdev, mr); - mutex_unlock(&mr->mkey_mtx); + mutex_unlock(&mvdev->mr_mtx); } void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev) @@ -550,9 +550,10 @@ int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, { int err; - mutex_lock(&mvdev->mr.mkey_mtx); + mutex_lock(&mvdev->mr_mtx); err = _mlx5_vdpa_create_mr(mvdev, mr, iotlb); - mutex_unlock(&mvdev->mr.mkey_mtx); + mutex_unlock(&mvdev->mr_mtx); + return err; } @@ -563,14 +564,14 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *io int err = 0; *change_map = false; - mutex_lock(&mr->mkey_mtx); + mutex_lock(&mvdev->mr_mtx); if (mr->initialized) { mlx5_vdpa_info(mvdev, "memory map update\n"); *change_map = true; } if (!*change_map) err = _mlx5_vdpa_create_mr(mvdev, mr, iotlb); - mutex_unlock(&mr->mkey_mtx); + mutex_unlock(&mvdev->mr_mtx); return err; } diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c index d5a59c9035fb..5c5a41b64bfc 100644 --- a/drivers/vdpa/mlx5/core/resources.c +++ b/drivers/vdpa/mlx5/core/resources.c @@ -256,7 +256,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev) mlx5_vdpa_warn(mvdev, "resources already allocated\n"); return -EINVAL; } - mutex_init(&mvdev->mr.mkey_mtx); + mutex_init(&mvdev->mr_mtx); res->uar = mlx5_get_uars_page(mdev); if (IS_ERR(res->uar)) { err = PTR_ERR(res->uar); @@ -301,7 +301,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev) err_uctx: mlx5_put_uars_page(mdev, res->uar); err_uars: - mutex_destroy(&mvdev->mr.mkey_mtx); + mutex_destroy(&mvdev->mr_mtx); return err; } @@ -318,6 +318,6 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev) dealloc_pd(mvdev, res->pdn, res->uid); destroy_uctx(mvdev, res->uid); mlx5_put_uars_page(mvdev->mdev, res->uar); - mutex_destroy(&mvdev->mr.mkey_mtx); + mutex_destroy(&mvdev->mr_mtx); res->valid = false; } -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 10/16] vdpa/mlx5: Allow creation/deletion of any given mr struct
This patch adapts the mr creation/deletion code to be able to work with any given mr struct pointer. All the APIs are adapted to take an extra parameter for the mr. mlx5_vdpa_create/delete_mr doesn't need a ASID parameter anymore. The check is done in the caller instead (mlx5_set_map). This change is needed for a followup patch which will introduce an additional mr for the vq descriptor data. Signed-off-by: Dragos Tatulea Acked-by: Eugenio Pérez --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 8 +++-- drivers/vdpa/mlx5/core/mr.c| 53 ++ drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 -- 3 files changed, 36 insertions(+), 35 deletions(-) diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index e1e6e7aba50e..01d4ee58ccb1 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -116,10 +116,12 @@ int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, u32 *mkey, u32 *in, int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, u32 mkey); int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, bool *change_map, unsigned int asid); -int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, - unsigned int asid); +int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, + struct mlx5_vdpa_mr *mr, + struct vhost_iotlb *iotlb); void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev); -void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid); +void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, + struct mlx5_vdpa_mr *mr); int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, unsigned int asid); diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index 00dcce190a1f..6f29e8eaabb1 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -301,10 +301,13 @@ static void unmap_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct sg_free_table(&mr->sg_head); } -static int add_direct_chain(struct mlx5_vdpa_dev *mvdev, u64 start, u64 size, u8 perm, +static int add_direct_chain(struct mlx5_vdpa_dev *mvdev, + struct mlx5_vdpa_mr *mr, + u64 start, + u64 size, + u8 perm, struct vhost_iotlb *iotlb) { - struct mlx5_vdpa_mr *mr = &mvdev->mr; struct mlx5_vdpa_direct_mr *dmr; struct mlx5_vdpa_direct_mr *n; LIST_HEAD(tmp); @@ -354,9 +357,10 @@ static int add_direct_chain(struct mlx5_vdpa_dev *mvdev, u64 start, u64 size, u8 * indirect memory key that provides access to the enitre address space given * by iotlb. */ -static int create_user_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb) +static int create_user_mr(struct mlx5_vdpa_dev *mvdev, + struct mlx5_vdpa_mr *mr, + struct vhost_iotlb *iotlb) { - struct mlx5_vdpa_mr *mr = &mvdev->mr; struct mlx5_vdpa_direct_mr *dmr; struct mlx5_vdpa_direct_mr *n; struct vhost_iotlb_map *map; @@ -384,7 +388,7 @@ static int create_user_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb LOG_MAX_KLM_SIZE); mr->num_klms += nnuls; } - err = add_direct_chain(mvdev, ps, pe - ps, pperm, iotlb); + err = add_direct_chain(mvdev, mr, ps, pe - ps, pperm, iotlb); if (err) goto err_chain; } @@ -393,7 +397,7 @@ static int create_user_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb pperm = map->perm; } } - err = add_direct_chain(mvdev, ps, pe - ps, pperm, iotlb); + err = add_direct_chain(mvdev, mr, ps, pe - ps, pperm, iotlb); if (err) goto err_chain; @@ -489,13 +493,8 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr } } -static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid) +static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr) { - struct mlx5_vdpa_mr *mr = &mvdev->mr; - - if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid) - return; - if (!mr->initialized) return; @@ -507,38 +506,33 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid mr->initialized = false; } -void mlx5_vdpa_destroy_mr(struct mlx5_vdp
[PATCH vhost v3 13/16] vdpa/mlx5: Introduce mr for vq descriptor
Introduce the vq descriptor group and mr per ASID. Until now .set_map on ASID 1 was only updating the cvq iotlb. From now on it also creates a mkey for it. The current patch doesn't use it but follow-up patches will add hardware support for mapping the vq descriptors. Acked-by: Eugenio Pérez Signed-off-by: Dragos Tatulea --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 5 +++-- drivers/vdpa/mlx5/core/mr.c| 14 +- drivers/vdpa/mlx5/net/mlx5_vnet.c | 20 +--- 3 files changed, 25 insertions(+), 14 deletions(-) diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index bbe4335106bd..ae09296f4270 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -70,11 +70,12 @@ struct mlx5_vdpa_wq_ent { enum { MLX5_VDPA_DATAVQ_GROUP, MLX5_VDPA_CVQ_GROUP, + MLX5_VDPA_DATAVQ_DESC_GROUP, MLX5_VDPA_NUMVQ_GROUPS }; enum { - MLX5_VDPA_NUM_AS = MLX5_VDPA_NUMVQ_GROUPS + MLX5_VDPA_NUM_AS = 2 }; struct mlx5_vdpa_dev { @@ -89,7 +90,7 @@ struct mlx5_vdpa_dev { u16 max_idx; u32 generation; - struct mlx5_vdpa_mr *mr; + struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS]; /* serialize mr access */ struct mutex mr_mtx; struct mlx5_control_vq cvq; diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index 00eff5a07152..3dee6d9bed6b 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -511,8 +511,10 @@ void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, _mlx5_vdpa_destroy_mr(mvdev, mr); - if (mvdev->mr == mr) - mvdev->mr = NULL; + for (int i = 0; i < MLX5_VDPA_NUM_AS; i++) { + if (mvdev->mr[i] == mr) + mvdev->mr[i] = NULL; + } mutex_unlock(&mvdev->mr_mtx); @@ -523,11 +525,11 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *new_mr, unsigned int asid) { - struct mlx5_vdpa_mr *old_mr = mvdev->mr; + struct mlx5_vdpa_mr *old_mr = mvdev->mr[asid]; mutex_lock(&mvdev->mr_mtx); - mvdev->mr = new_mr; + mvdev->mr[asid] = new_mr; if (old_mr) { _mlx5_vdpa_destroy_mr(mvdev, old_mr); kfree(old_mr); @@ -539,7 +541,9 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev, void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev) { - mlx5_vdpa_destroy_mr(mvdev, mvdev->mr); + for (int i = 0; i < MLX5_VDPA_NUM_AS; i++) + mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[i]); + prune_iotlb(mvdev); } diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 4a87f9119fca..25bd2c324f5b 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -821,6 +821,8 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque { int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in); u32 out[MLX5_ST_SZ_DW(create_virtio_net_q_out)] = {}; + struct mlx5_vdpa_dev *mvdev = &ndev->mvdev; + struct mlx5_vdpa_mr *vq_mr; void *obj_context; u16 mlx_features; void *cmd_hdr; @@ -873,7 +875,9 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr); MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr); MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr); - MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, ndev->mvdev.mr->mkey); + vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]]; + if (vq_mr) + MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey); MLX5_SET(virtio_q, vq_ctx, umem_1_id, mvq->umem1.id); MLX5_SET(virtio_q, vq_ctx, umem_1_size, mvq->umem1.size); MLX5_SET(virtio_q, vq_ctx, umem_2_id, mvq->umem2.id); @@ -2633,7 +2637,8 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev) } static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev, - struct mlx5_vdpa_mr *new_mr, unsigned int asid) + struct mlx5_vdpa_mr *new_mr, + unsigned int asid) { struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); int err; @@ -2652,8 +2657,10 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev, restore_channels_info(ndev); err = setup_driver(mvdev); + if (err) + return err; - return err; + return 0; } /* reslock must be held for this function */ @@ -2869,8 +2876,8 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, struct mlx5_vdpa_mr *new_mr; int err; - if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid) -
[PATCH vhost v3 12/16] vdpa/mlx5: Improve mr update flow
The current flow for updating an mr works directly on mvdev->mr which makes it cumbersome to handle multiple new mr structs. This patch makes the flow more straightforward by having mlx5_vdpa_create_mr return a new mr which will update the old mr (if any). The old mr will be deleted and unlinked from mvdev. This change paves the way for adding mrs for different ASIDs. The initialized bool is no longer needed as mr is now a pointer in the mlx5_vdpa_dev struct which will be NULL when not initialized. Acked-by: Eugenio Pérez Signed-off-by: Dragos Tatulea --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 14 +++-- drivers/vdpa/mlx5/core/mr.c| 87 -- drivers/vdpa/mlx5/net/mlx5_vnet.c | 46 3 files changed, 76 insertions(+), 71 deletions(-) diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index 9c6ac42c21e1..bbe4335106bd 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -31,8 +31,6 @@ struct mlx5_vdpa_mr { struct list_head head; unsigned long num_directs; unsigned long num_klms; - /* state of dvq mr */ - bool initialized; bool user_mr; }; @@ -91,7 +89,7 @@ struct mlx5_vdpa_dev { u16 max_idx; u32 generation; - struct mlx5_vdpa_mr mr; + struct mlx5_vdpa_mr *mr; /* serialize mr access */ struct mutex mr_mtx; struct mlx5_control_vq cvq; @@ -114,14 +112,14 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev); int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, u32 *mkey, u32 *in, int inlen); int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, u32 mkey); -int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, -bool *change_map, unsigned int asid); -int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, - struct mlx5_vdpa_mr *mr, - struct vhost_iotlb *iotlb); +struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, +struct vhost_iotlb *iotlb); void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev); void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr); +void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev, +struct mlx5_vdpa_mr *mr, +unsigned int asid); int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb, unsigned int asid); diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index abd6a6fb122f..00eff5a07152 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -495,30 +495,51 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr) { - if (!mr->initialized) - return; - if (mr->user_mr) destroy_user_mr(mvdev, mr); else destroy_dma_mr(mvdev, mr); - - mr->initialized = false; } void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr) { + if (!mr) + return; + mutex_lock(&mvdev->mr_mtx); _mlx5_vdpa_destroy_mr(mvdev, mr); + if (mvdev->mr == mr) + mvdev->mr = NULL; + + mutex_unlock(&mvdev->mr_mtx); + + kfree(mr); +} + +void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev, +struct mlx5_vdpa_mr *new_mr, +unsigned int asid) +{ + struct mlx5_vdpa_mr *old_mr = mvdev->mr; + + mutex_lock(&mvdev->mr_mtx); + + mvdev->mr = new_mr; + if (old_mr) { + _mlx5_vdpa_destroy_mr(mvdev, old_mr); + kfree(old_mr); + } + mutex_unlock(&mvdev->mr_mtx); + } void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev) { - mlx5_vdpa_destroy_mr(mvdev, &mvdev->mr); + mlx5_vdpa_destroy_mr(mvdev, mvdev->mr); prune_iotlb(mvdev); } @@ -528,52 +549,36 @@ static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, { int err; - if (mr->initialized) - return 0; - if (iotlb) err = create_user_mr(mvdev, mr, iotlb); else err = create_dma_mr(mvdev, mr); - if (err) - return err; - - mr->initialized = true; - - return 0; + return err; } -int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, - struct mlx5_vdpa_mr *mr, - struct vhost_iotlb *iotlb) +struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, +struct vhost_iotlb
[PATCH vhost v3 14/16] vdpa/mlx5: Enable hw support for vq descriptor mapping
Vq descriptor mappings are supported in hardware by filling in an additional mkey which contains the descriptor mappings to the hw vq. A previous patch in this series added support for hw mkey (mr) creation for ASID 1. This patch fills in both the vq data and vq descriptor mkeys based on group ASID mapping. The feature is signaled to the vdpa core through the presence of the .get_vq_desc_group op. Acked-by: Eugenio Pérez Signed-off-by: Dragos Tatulea --- drivers/vdpa/mlx5/net/mlx5_vnet.c | 24 +++- include/linux/mlx5/mlx5_ifc_vdpa.h | 7 ++- 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 25bd2c324f5b..2e0a3ce1c0cf 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -823,6 +823,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque u32 out[MLX5_ST_SZ_DW(create_virtio_net_q_out)] = {}; struct mlx5_vdpa_dev *mvdev = &ndev->mvdev; struct mlx5_vdpa_mr *vq_mr; + struct mlx5_vdpa_mr *vq_desc_mr; void *obj_context; u16 mlx_features; void *cmd_hdr; @@ -878,6 +879,11 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]]; if (vq_mr) MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey); + + vq_desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]]; + if (vq_desc_mr) + MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, vq_desc_mr->mkey); + MLX5_SET(virtio_q, vq_ctx, umem_1_id, mvq->umem1.id); MLX5_SET(virtio_q, vq_ctx, umem_1_size, mvq->umem1.size); MLX5_SET(virtio_q, vq_ctx, umem_2_id, mvq->umem2.id); @@ -2265,6 +2271,16 @@ static u32 mlx5_vdpa_get_vq_group(struct vdpa_device *vdev, u16 idx) return MLX5_VDPA_DATAVQ_GROUP; } +static u32 mlx5_vdpa_get_vq_desc_group(struct vdpa_device *vdev, u16 idx) +{ + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); + + if (is_ctrl_vq_idx(mvdev, idx)) + return MLX5_VDPA_CVQ_GROUP; + + return MLX5_VDPA_DATAVQ_DESC_GROUP; +} + static u64 mlx_to_vritio_features(u16 dev_features) { u64 result = 0; @@ -3160,6 +3176,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = { .get_vq_irq = mlx5_get_vq_irq, .get_vq_align = mlx5_vdpa_get_vq_align, .get_vq_group = mlx5_vdpa_get_vq_group, + .get_vq_desc_group = mlx5_vdpa_get_vq_desc_group, /* Op disabled if not supported. */ .get_device_features = mlx5_vdpa_get_device_features, .set_driver_features = mlx5_vdpa_set_driver_features, .get_driver_features = mlx5_vdpa_get_driver_features, @@ -3258,6 +3275,7 @@ struct mlx5_vdpa_mgmtdev { struct vdpa_mgmt_dev mgtdev; struct mlx5_adev *madev; struct mlx5_vdpa_net *ndev; + struct vdpa_config_ops vdpa_ops; }; static int config_func_mtu(struct mlx5_core_dev *mdev, u16 mtu) @@ -3371,7 +3389,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name, max_vqs = 2; } - ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops, + ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mgtdev->vdpa_ops, MLX5_VDPA_NUMVQ_GROUPS, MLX5_VDPA_NUM_AS, name, false); if (IS_ERR(ndev)) return PTR_ERR(ndev); @@ -3546,6 +3564,10 @@ static int mlx5v_probe(struct auxiliary_device *adev, MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues) + 1; mgtdev->mgtdev.supported_features = get_supported_features(mdev); mgtdev->madev = madev; + mgtdev->vdpa_ops = mlx5_vdpa_ops; + + if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, desc_group_mkey_supported)) + mgtdev->vdpa_ops.get_vq_desc_group = NULL; err = vdpa_mgmtdev_register(&mgtdev->mgtdev); if (err) diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h index 9becdc3fa503..b86d51a855f6 100644 --- a/include/linux/mlx5/mlx5_ifc_vdpa.h +++ b/include/linux/mlx5/mlx5_ifc_vdpa.h @@ -74,7 +74,11 @@ struct mlx5_ifc_virtio_q_bits { u8reserved_at_320[0x8]; u8pd[0x18]; - u8reserved_at_340[0xc0]; + u8reserved_at_340[0x20]; + + u8desc_group_mkey[0x20]; + + u8reserved_at_380[0x80]; }; struct mlx5_ifc_virtio_net_q_object_bits { @@ -141,6 +145,7 @@ enum { MLX5_VIRTQ_MODIFY_MASK_STATE= (u64)1 << 0, MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS = (u64)1 << 3, MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4, + MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY = (u64)1 << 14, }; enum { -- 2.41.0 __
[PATCH vhost v3 15/16] vdpa/mlx5: Make iotlb helper functions more generic
They will be used in a follow-up patch. For dup_iotlb, avoid the src == dst case. This is an error. Acked-by: Eugenio Pérez Signed-off-by: Dragos Tatulea --- drivers/vdpa/mlx5/core/mr.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index 3dee6d9bed6b..4a3df865df40 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -454,20 +454,23 @@ static void destroy_dma_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr) mlx5_vdpa_destroy_mkey(mvdev, mr->mkey); } -static int dup_iotlb(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *src) +static int dup_iotlb(struct vhost_iotlb *dst, struct vhost_iotlb *src) { struct vhost_iotlb_map *map; u64 start = 0, last = ULLONG_MAX; int err; + if (dst == src) + return -EINVAL; + if (!src) { - err = vhost_iotlb_add_range(mvdev->cvq.iotlb, start, last, start, VHOST_ACCESS_RW); + err = vhost_iotlb_add_range(dst, start, last, start, VHOST_ACCESS_RW); return err; } for (map = vhost_iotlb_itree_first(src, start, last); map; map = vhost_iotlb_itree_next(map, start, last)) { - err = vhost_iotlb_add_range(mvdev->cvq.iotlb, map->start, map->last, + err = vhost_iotlb_add_range(dst, map->start, map->last, map->addr, map->perm); if (err) return err; @@ -475,9 +478,9 @@ static int dup_iotlb(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *src) return 0; } -static void prune_iotlb(struct mlx5_vdpa_dev *mvdev) +static void prune_iotlb(struct vhost_iotlb *iotlb) { - vhost_iotlb_del_range(mvdev->cvq.iotlb, 0, ULLONG_MAX); + vhost_iotlb_del_range(iotlb, 0, ULLONG_MAX); } static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr) @@ -544,7 +547,7 @@ void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev) for (int i = 0; i < MLX5_VDPA_NUM_AS; i++) mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[i]); - prune_iotlb(mvdev); + prune_iotlb(mvdev->cvq.iotlb); } static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, @@ -596,8 +599,8 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev, spin_lock(&mvdev->cvq.iommu_lock); - prune_iotlb(mvdev); - err = dup_iotlb(mvdev, iotlb); + prune_iotlb(mvdev->cvq.iotlb); + err = dup_iotlb(mvdev->cvq.iotlb, iotlb); spin_unlock(&mvdev->cvq.iommu_lock); -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 16/16] vdpa/mlx5: Update cvq iotlb mapping on ASID change
For the following sequence: - cvq group is in ASID 0 - .set_map(1, cvq_iotlb) - .set_group_asid(cvq_group, 1) ... the cvq mapping from ASID 0 will be used. This is not always correct behaviour. This patch adds support for the above mentioned flow by saving the iotlb on each .set_map and updating the cvq iotlb with it on a cvq group change. Acked-by: Eugenio Pérez Signed-off-by: Dragos Tatulea --- drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 ++ drivers/vdpa/mlx5/core/mr.c| 26 ++ drivers/vdpa/mlx5/net/mlx5_vnet.c | 9 - 3 files changed, 36 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h index ae09296f4270..db988ced5a5d 100644 --- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h +++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h @@ -32,6 +32,8 @@ struct mlx5_vdpa_mr { unsigned long num_directs; unsigned long num_klms; + struct vhost_iotlb *iotlb; + bool user_mr; }; diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index 4a3df865df40..66530e28f327 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -502,6 +502,8 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_ destroy_user_mr(mvdev, mr); else destroy_dma_mr(mvdev, mr); + + vhost_iotlb_free(mr->iotlb); } void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, @@ -561,6 +563,30 @@ static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, else err = create_dma_mr(mvdev, mr); + if (err) + return err; + + mr->iotlb = vhost_iotlb_alloc(0, 0); + if (!mr->iotlb) { + err = -ENOMEM; + goto err_mr; + } + + err = dup_iotlb(mr->iotlb, iotlb); + if (err) + goto err_iotlb; + + return 0; + +err_iotlb: + vhost_iotlb_free(mr->iotlb); + +err_mr: + if (iotlb) + destroy_user_mr(mvdev, mr); + else + destroy_dma_mr(mvdev, mr); + return err; } diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 2e0a3ce1c0cf..6abe02310f2b 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -3154,12 +3154,19 @@ static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group, unsigned int asid) { struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); + int err = 0; if (group >= MLX5_VDPA_NUMVQ_GROUPS) return -EINVAL; mvdev->group2asid[group] = asid; - return 0; + + mutex_lock(&mvdev->mr_mtx); + if (group == MLX5_VDPA_CVQ_GROUP && mvdev->mr[asid]) + err = mlx5_vdpa_update_cvq_iotlb(mvdev, mvdev->mr[asid]->iotlb, asid); + mutex_unlock(&mvdev->mr_mtx); + + return err; } static const struct vdpa_config_ops mlx5_vdpa_ops = { -- 2.41.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
Akihiko Odaki sorry about reposts. Having an email with two "@" in the CC list: xuanzhuo@linux.alibaba.comsh...@kernel.org tripped up mutt's reply-all for me and made it send to you only. I am guessing you meant two addresses: xuanz...@linux.alibaba.com and sh...@kernel.org. On Sun, Oct 08, 2023 at 02:20:49PM +0900, Akihiko Odaki wrote: > virtio-net have two usage of hashes: one is RSS and another is hash > reporting. Conventionally the hash calculation was done by the VMM. > However, computing the hash after the queue was chosen defeats the > purpose of RSS. > > Another approach is to use eBPF steering program. This approach has > another downside: it cannot report the calculated hash due to the > restrictive nature of eBPF. > > Introduce the code to compute hashes to the kernel in order to overcome > thse challenges. An alternative solution is to extend the eBPF steering > program so that it will be able to report to the userspace, but it makes > little sense to allow to implement different hashing algorithms with > eBPF since the hash value reported by virtio-net is strictly defined by > the specification. > > The hash value already stored in sk_buff is not used and computed > independently since it may have been computed in a way not conformant > with the specification. > > Signed-off-by: Akihiko Odaki > --- > drivers/net/tun.c | 187 > include/uapi/linux/if_tun.h | 16 +++ > 2 files changed, 182 insertions(+), 21 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 89ab9efe522c..561a573cd008 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -171,6 +171,9 @@ struct tun_prog { > struct bpf_prog *prog; > }; > > +#define TUN_VNET_HASH_MAX_KEY_SIZE 40 > +#define TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH 128 > + where do these come from? > /* Since the socket were moved to tun_file, to preserve the behavior of > persist > * device, socket filter, sndbuf and vnet header size were restore when the > * file were attached to a persist device. > @@ -209,6 +212,9 @@ struct tun_struct { > struct tun_prog __rcu *steering_prog; > struct tun_prog __rcu *filter_prog; > struct ethtool_link_ksettings link_ksettings; > + struct tun_vnet_hash vnet_hash; > + u16 > vnet_hash_indirection_table[TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH]; > + u32 vnet_hash_key[TUN_VNET_HASH_MAX_KEY_SIZE / 4]; That's quite a lot of data to add in this struct, and will be used by a small minority of users. Are you pushing any hot data out of cache with this? Why not allocate these as needed? > /* init args */ > struct file *file; > struct ifreq *ifr; > @@ -219,6 +225,13 @@ struct veth { > __be16 h_vlan_TCI; > }; > > +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = { > + .max_indirection_table_length = > + TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH, > + > + .types = VIRTIO_NET_SUPPORTED_HASH_TYPES > +}; > + > static void tun_flow_init(struct tun_struct *tun); > static void tun_flow_uninit(struct tun_struct *tun); > > @@ -320,10 +333,16 @@ static long tun_set_vnet_be(struct tun_struct *tun, int > __user *argp) > if (get_user(be, argp)) > return -EFAULT; > > - if (be) > + if (be) { > + if (!(tun->flags & TUN_VNET_LE) && > + (tun->vnet_hash.flags & TUN_VNET_HASH_REPORT)) { > + return -EINVAL; > + } > + > tun->flags |= TUN_VNET_BE; > - else > + } else { > tun->flags &= ~TUN_VNET_BE; > + } > > return 0; > } > @@ -558,15 +577,47 @@ static u16 tun_ebpf_select_queue(struct tun_struct > *tun, struct sk_buff *skb) > return ret % numqueues; > } > > +static u16 tun_vnet_select_queue(struct tun_struct *tun, struct sk_buff *skb) > +{ > + u32 value = qdisc_skb_cb(skb)->tun_vnet_hash_value; > + u32 numqueues; > + u32 index; > + u16 queue; > + > + numqueues = READ_ONCE(tun->numqueues); > + if (!numqueues) > + return 0; > + > + index = value & READ_ONCE(tun->vnet_hash.indirection_table_mask); > + queue = READ_ONCE(tun->vnet_hash_indirection_table[index]); > + if (!queue) > + queue = READ_ONCE(tun->vnet_hash.unclassified_queue); Apparently 0 is an illegal queue value? You are making this part of UAPI better document things like this. > + > + return queue % numqueues; > +} > + > static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > struct net_device *sb_dev) > { > struct tun_struct *tun = netdev_priv(dev); > + u8 vnet_hash_flags = READ_ONCE(tun->vnet_hash.flags); > + struct virtio_net_hash hash; > u16 ret; > > + if (vnet_hash_flags & (TUN_VNET_HASH_RSS | TUN_VNET_HASH_REPORT)) { > + virtio_net_hash(skb, READ_ONCE(tun->vnet_hash.types), > +
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
On Mon, Oct 09, 2023 at 05:44:20PM +0900, Akihiko Odaki wrote: > On 2023/10/09 17:13, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki > > wrote: > > > > > > virtio-net have two usage of hashes: one is RSS and another is hash > > > reporting. Conventionally the hash calculation was done by the VMM. > > > However, computing the hash after the queue was chosen defeats the > > > purpose of RSS. > > > > > > Another approach is to use eBPF steering program. This approach has > > > another downside: it cannot report the calculated hash due to the > > > restrictive nature of eBPF. > > > > > > Introduce the code to compute hashes to the kernel in order to overcome > > > thse challenges. An alternative solution is to extend the eBPF steering > > > program so that it will be able to report to the userspace, but it makes > > > little sense to allow to implement different hashing algorithms with > > > eBPF since the hash value reported by virtio-net is strictly defined by > > > the specification. > > > > > > The hash value already stored in sk_buff is not used and computed > > > independently since it may have been computed in a way not conformant > > > with the specification. > > > > > > Signed-off-by: Akihiko Odaki > > > > > @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct > > > *tun, > > > } > > > > > > if (vnet_hdr_sz) { > > > - struct virtio_net_hdr gso; > > > + union { > > > + struct virtio_net_hdr hdr; > > > + struct virtio_net_hdr_v1_hash v1_hash_hdr; > > > + } hdr; > > > + int ret; > > > > > > if (iov_iter_count(iter) < vnet_hdr_sz) > > > return -EINVAL; > > > > > > - if (virtio_net_hdr_from_skb(skb, &gso, > > > - tun_is_little_endian(tun), > > > true, > > > - vlan_hlen)) { > > > + if ((READ_ONCE(tun->vnet_hash.flags) & > > > TUN_VNET_HASH_REPORT) && > > > + vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) && > > > + skb->tun_vnet_hash) { > > > > Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of > > the set hash ioctl failing otherwise? > > > > Such checks should be limited to control path where possible > > There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are not > read at once. And then it's a complete mess and you get inconsistent behaviour with packets getting sent all over the place, right? So maybe keep a pointer to this struct so it can be changed atomically then. Maybe even something with rcu I donnu. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v3 12/12] test/vsock: io_uring rx/tx tests
On Sat, Oct 07, 2023 at 08:21:39PM +0300, Arseniy Krasnov wrote: This adds set of tests which use io_uring for rx/tx. This test suite is implemented as separated util like 'vsock_test' and has the same set of input arguments as 'vsock_test'. These tests only cover cases of data transmission (no connect/bind/accept etc). Signed-off-by: Arseniy Krasnov --- Changelog: v1 -> v2: * Add 'LDLIBS = -luring' to the target 'vsock_uring_test'. * Add 'vsock_uring_test' to the target 'test'. v2 -> v3: * Make 'struct vsock_test_data' private by placing it to the .c file. Rename it and add comments to this struct to clarify sense of its fields. * Add 'vsock_uring_test' to the '.gitignore'. * Add receive loop to the server side - this is needed to read entire data sent by client. tools/testing/vsock/.gitignore | 1 + tools/testing/vsock/Makefile | 7 +- tools/testing/vsock/vsock_uring_test.c | 350 + 3 files changed, 356 insertions(+), 2 deletions(-) create mode 100644 tools/testing/vsock/vsock_uring_test.c diff --git a/tools/testing/vsock/.gitignore b/tools/testing/vsock/.gitignore index a8adcfdc292b..d9f798713cd7 100644 --- a/tools/testing/vsock/.gitignore +++ b/tools/testing/vsock/.gitignore @@ -3,3 +3,4 @@ vsock_test vsock_diag_test vsock_perf +vsock_uring_test diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile index 1a26f60a596c..b80e7c7def1e 100644 --- a/tools/testing/vsock/Makefile +++ b/tools/testing/vsock/Makefile @@ -1,12 +1,15 @@ # SPDX-License-Identifier: GPL-2.0-only all: test vsock_perf -test: vsock_test vsock_diag_test +test: vsock_test vsock_diag_test vsock_uring_test vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o vsock_perf: vsock_perf.o +vsock_uring_test: LDLIBS = -luring +vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o + CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE .PHONY: all test clean clean: - ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf + ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test -include *.d diff --git a/tools/testing/vsock/vsock_uring_test.c b/tools/testing/vsock/vsock_uring_test.c new file mode 100644 index ..889887cf3989 --- /dev/null +++ b/tools/testing/vsock/vsock_uring_test.c @@ -0,0 +1,350 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* io_uring tests for vsock + * + * Copyright (C) 2023 SberDevices. + * + * Author: Arseniy Krasnov + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "util.h" +#include "control.h" +#include "msg_zerocopy_common.h" + +#define PAGE_SIZE 4096 Ditto. +#define RING_ENTRIES_NUM 4 + +#define VSOCK_TEST_DATA_MAX_IOV 3 + +struct vsock_io_uring_test { + /* Number of valid elements in 'vecs'. */ + int vecs_cnt; + /* Array how to allocate buffers for test. +* 'iov_base' == NULL -> valid buf: mmap('iov_len'). +* +* 'iov_base' == MAP_FAILED -> invalid buf: +* mmap('iov_len'), then munmap('iov_len'). +* 'iov_base' still contains result of +* mmap(). +* +* 'iov_base' == number -> unaligned valid buf: +* mmap('iov_len') + number. +*/ + struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV]; +}; + +static struct vsock_io_uring_test test_data_array[] = { + /* All elements have page aligned base and size. */ + { + .vecs_cnt = 3, + { + { NULL, PAGE_SIZE }, + { NULL, 2 * PAGE_SIZE }, + { NULL, 3 * PAGE_SIZE }, + } + }, + /* Middle element has both non-page aligned base and size. */ + { + .vecs_cnt = 3, + { + { NULL, PAGE_SIZE }, + { (void *)1, 200 }, + { NULL, 3 * PAGE_SIZE }, + } + } +}; + +static void vsock_io_uring_client(const struct test_opts *opts, + const struct vsock_io_uring_test *test_data, + bool msg_zerocopy) +{ + struct io_uring_sqe *sqe; + struct io_uring_cqe *cqe; + struct io_uring ring; + struct iovec *iovec; + struct msghdr msg; + int fd; + + fd = vsock_stream_connect(opts->peer_cid, 1234); + if (fd < 0) { + perror("connect"); + exit(EXIT_FAILURE); + } + + if (msg_zerocopy) + enable_so_zerocopy(fd); + + iovec = iovec_from_test_data(test_data->vecs, test_data->vecs_cnt); Ah, I see this is used also here, so now I get why in util.
Re: [PATCH net-next v3 02/12] vsock: read from socket's error queue
On Sat, Oct 07, 2023 at 08:21:29PM +0300, Arseniy Krasnov wrote: This adds handling of MSG_ERRQUEUE input flag in receive call. This flag is used to read socket's error queue instead of data queue. Possible scenario of error queue usage is receiving completions for transmission with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK' and 'VSOCK_RECVERR'. Signed-off-by: Arseniy Krasnov --- Changelog: v1 -> v2: * Place new defines for userspace to the existing file 'vm_sockets.h' instead of creating new one. v2 -> v3: * Add comments to describe 'SOL_VSOCK' and 'VSOCK_RECVERR' in the file 'vm_sockets.h'. * Reorder includes in 'af_vsock.c' in alphabetical order. include/linux/socket.h | 1 + include/uapi/linux/vm_sockets.h | 12 net/vmw_vsock/af_vsock.c| 6 ++ 3 files changed, 19 insertions(+) diff --git a/include/linux/socket.h b/include/linux/socket.h index 39b74d83c7c4..cfcb7e2c3813 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -383,6 +383,7 @@ struct ucred { #define SOL_MPTCP 284 #define SOL_MCTP285 #define SOL_SMC 286 +#define SOL_VSOCK 287 /* IPX options */ #define IPX_TYPE1 diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h index c60ca33eac59..d9d703b2d45a 100644 --- a/include/uapi/linux/vm_sockets.h +++ b/include/uapi/linux/vm_sockets.h @@ -191,4 +191,16 @@ struct sockaddr_vm { #define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9) +/* For reading completion in case of MSG_ZEROCOPY flag transmission. + * This is value of 'cmsg_level' field of the 'struct cmsghdr'. + */ + +#define SOL_VSOCK 287 + +/* For reading completion in case of MSG_ZEROCOPY flag transmission. + * This is value of 'cmsg_type' field of the 'struct cmsghdr'. + */ + +#define VSOCK_RECVERR 1 I would suggest a bit more context here, something like this: diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h index d9d703b2d45a..ed07181d4eff 100644 --- a/include/uapi/linux/vm_sockets.h +++ b/include/uapi/linux/vm_sockets.h @@ -191,14 +191,19 @@ struct sockaddr_vm { #define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9) -/* For reading completion in case of MSG_ZEROCOPY flag transmission. - * This is value of 'cmsg_level' field of the 'struct cmsghdr'. +/* MSG_ZEROCOPY notifications are encoded in the standard error format, + * sock_extended_err. See Documentation/networking/msg_zerocopy.rst in + * kernel source tree for more details. + */ + +/* 'cmsg_level' field value of 'struct cmsghdr' for notification parsing + * when MSG_ZEROCOPY flag is used on transmissions. */ #define SOL_VSOCK 287 -/* For reading completion in case of MSG_ZEROCOPY flag transmission. - * This is value of 'cmsg_type' field of the 'struct cmsghdr'. +/* 'cmsg_type' field value of 'struct cmsghdr' for notification parsing + * when MSG_ZEROCOPY flag is used on transmissions. */ #define VSOCK_RECVERR 1 The rest LGTM. Stefano + #endif /* _UAPI_VM_SOCKETS_H */ diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index d841f4de33b0..38486efd3d05 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -89,6 +89,7 @@ #include #include #include +#include #include #include #include @@ -110,6 +111,7 @@ #include #include #include +#include static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr); static void vsock_sk_destruct(struct sock *sk); @@ -2137,6 +2139,10 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int err; sk = sock->sk; + + if (unlikely(flags & MSG_ERRQUEUE)) + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, VSOCK_RECVERR); + vsk = vsock_sk(sk); err = 0; -- 2.25.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next v3 10/12] test/vsock: MSG_ZEROCOPY flag tests
On Sat, Oct 07, 2023 at 08:21:37PM +0300, Arseniy Krasnov wrote: This adds three tests for MSG_ZEROCOPY feature: 1) SOCK_STREAM tx with different buffers. 2) SOCK_SEQPACKET tx with different buffers. 3) SOCK_STREAM test to read empty error queue of the socket. Patch also works as preparation for the next patches for tools in this patchset: vsock_perf and vsock_uring_test: 1) Adds several new functions to util.c - they will be also used by vsock_uring_test. 2) Adds two new functions for MSG_ZEROCOPY handling to a new header file - such header will be shared between vsock_test, vsock_perf and vsock_uring_test, thus avoiding code copy-pasting. Signed-off-by: Arseniy Krasnov --- Changelog: v1 -> v2: * Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'. v2 -> v3: * Patch was reworked. Now it is also preparation patch (see commit message). Shared stuff for 'vsock_perf' and tests is placed to a new header file, while shared code between current test tool and future uring test is placed to the 'util.c'. I think, that making this patch as preparation allows to reduce number of changes in the next patches in this patchset. * Make 'struct vsock_test_data' private by placing it to the .c file. Also add comments to this struct to clarify sense of its fields. tools/testing/vsock/Makefile | 2 +- tools/testing/vsock/msg_zerocopy_common.h | 92 ++ tools/testing/vsock/util.c| 110 +++ tools/testing/vsock/util.h| 5 + tools/testing/vsock/vsock_test.c | 16 + tools/testing/vsock/vsock_test_zerocopy.c | 367 ++ tools/testing/vsock/vsock_test_zerocopy.h | 15 + 7 files changed, 606 insertions(+), 1 deletion(-) create mode 100644 tools/testing/vsock/msg_zerocopy_common.h create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile index 21a98ba565ab..1a26f60a596c 100644 --- a/tools/testing/vsock/Makefile +++ b/tools/testing/vsock/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only all: test vsock_perf test: vsock_test vsock_diag_test -vsock_test: vsock_test.o timeout.o control.o util.o +vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o vsock_perf: vsock_perf.o diff --git a/tools/testing/vsock/msg_zerocopy_common.h b/tools/testing/vsock/msg_zerocopy_common.h new file mode 100644 index ..ce89f1281584 --- /dev/null +++ b/tools/testing/vsock/msg_zerocopy_common.h @@ -0,0 +1,92 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef MSG_ZEROCOPY_COMMON_H +#define MSG_ZEROCOPY_COMMON_H + +#include +#include +#include +#include +#include + +#ifndef SOL_VSOCK +#define SOL_VSOCK 287 +#endif + +#ifndef VSOCK_RECVERR +#define VSOCK_RECVERR 1 +#endif + +static void enable_so_zerocopy(int fd) +{ + int val = 1; + + if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) { + perror("setsockopt"); + exit(EXIT_FAILURE); + } +} + +static void vsock_recv_completion(int fd, const bool *zerocopied) __maybe_unused; To avoid this, maybe we can implement those functions in .c file and link the object. WDYT? Ah, here (cc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1)) the build is failing: In file included from vsock_perf.c:23: msg_zerocopy_common.h: In function ‘vsock_recv_completion’: msg_zerocopy_common.h:29:67: error: expected declaration specifiers before ‘__maybe_unused’ 29 | static void vsock_recv_completion(int fd, const bool *zerocopied) __maybe_unused; | ^~ msg_zerocopy_common.h:31:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token 31 | { | ^ +static void vsock_recv_completion(int fd, const bool *zerocopied) +{ + struct sock_extended_err *serr; + struct msghdr msg = { 0 }; + char cmsg_data[128]; + struct cmsghdr *cm; + ssize_t res; + + msg.msg_control = cmsg_data; + msg.msg_controllen = sizeof(cmsg_data); + + res = recvmsg(fd, &msg, MSG_ERRQUEUE); + if (res) { + fprintf(stderr, "failed to read error queue: %zi\n", res); + exit(EXIT_FAILURE); + } + + cm = CMSG_FIRSTHDR(&msg); + if (!cm) { + fprintf(stderr, "cmsg: no cmsg\n"); + exit(EXIT_FAILURE); + } + + if (cm->cmsg_level != SOL_VSOCK) { + fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n"); + exit(EXIT_FAILURE); + } + + if (cm->cmsg_type != VSOCK_RECVERR) { + fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n"); + exit(EXIT_FAILURE); + } + + serr = (void *)CMSG_DATA(cm); + if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOP
Re: [PATCH net-next v3 11/12] test/vsock: MSG_ZEROCOPY support for vsock_perf
On Sat, Oct 07, 2023 at 08:21:38PM +0300, Arseniy Krasnov wrote: To use this option pass '--zerocopy' parameter: ./vsock_perf --zerocopy --sender ... With this option MSG_ZEROCOPY flag will be passed to the 'send()' call. Signed-off-by: Arseniy Krasnov --- Changelog: v1 -> v2: * Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'. v2 -> v3: * Use 'msg_zerocopy_common.h' for MSG_ZEROCOPY related things. * Rename '--zc' option to '--zerocopy'. * Add detail in help that zerocopy mode is for sender mode only. tools/testing/vsock/vsock_perf.c | 80 1 file changed, 71 insertions(+), 9 deletions(-) Reviewed-by: Stefano Garzarella ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 2/4] virtio_pci: fix the common cfg map size
The function vp_modern_map_capability() takes the size parameter, which corresponds to the size of virtio_pci_common_cfg. As a result, this indicates the size of memory area to map. Now the size is the size of virtio_pci_common_cfg, but some feature(such as the _F_RING_RESET) needs the virtio_pci_modern_common_cfg, so this commit changes the size to the size of virtio_pci_modern_common_cfg. Fixes: 0b50cece0b78 ("virtio_pci: introduce helper to get/set queue reset") Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_pci_modern_dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c index aad7d9296e77..9cb601e16688 100644 --- a/drivers/virtio/virtio_pci_modern_dev.c +++ b/drivers/virtio/virtio_pci_modern_dev.c @@ -291,7 +291,7 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev) err = -EINVAL; mdev->common = vp_modern_map_capability(mdev, common, sizeof(struct virtio_pci_common_cfg), 4, - 0, sizeof(struct virtio_pci_common_cfg), + 0, sizeof(struct virtio_pci_modern_common_cfg), NULL, NULL); if (!mdev->common) goto err_map_common; -- 2.32.0.3.g01195cf9f ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 1/4] virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit
This patch adds the definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit in the relevant header file. This feature indicates that the driver uses the data provided by the device as a virtqueue identifier in available buffer notifications. It comes from here: https://github.com/oasis-tcs/virtio-spec/issues/89 Signed-off-by: Xuan Zhuo --- include/uapi/linux/virtio_config.h | 5 + 1 file changed, 5 insertions(+) diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h index 2c712c654165..8881aea60f6f 100644 --- a/include/uapi/linux/virtio_config.h +++ b/include/uapi/linux/virtio_config.h @@ -105,6 +105,11 @@ */ #define VIRTIO_F_NOTIFICATION_DATA 38 +/* This feature indicates that the driver uses the data provided by the device + * as a virtqueue identifier in available buffer notifications. + */ +#define VIRTIO_F_NOTIF_CONFIG_DATA 39 + /* * This feature indicates that the driver can reset a queue individually. */ -- 2.32.0.3.g01195cf9f ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 4/4] virtio_pci: add build offset check for the new common cfg items
Add checks to the check_offsets(void) for queue_notify_data and queue_reset. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/virtio/virtio_pci_modern_dev.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c index 33f319da1558..e2a1fe7bb66c 100644 --- a/drivers/virtio/virtio_pci_modern_dev.c +++ b/drivers/virtio/virtio_pci_modern_dev.c @@ -203,6 +203,10 @@ static inline void check_offsets(void) offsetof(struct virtio_pci_common_cfg, queue_used_lo)); BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_USEDHI != offsetof(struct virtio_pci_common_cfg, queue_used_hi)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_NDATA != +offsetof(struct virtio_pci_modern_common_cfg, queue_notify_data)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_RESET != +offsetof(struct virtio_pci_modern_common_cfg, queue_reset)); } /* -- 2.32.0.3.g01195cf9f ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 0/4] strictly check the acccess to the common cfg
1. fix the length of the pci_iomap_range() to the modern common cfg 2. add common cfg size check for the VIRTIO_F_NOTIF_CONFIG_DATA, VIRTIO_F_RING_RESET 3. add build size check to the new common cfg items Xuan Zhuo (4): virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit virtio_pci: fix the common cfg map size virtio_pci: add check for common cfg size virtio_pci: add build offset check for the new common cfg items drivers/virtio/virtio_pci_modern.c | 36 ++ drivers/virtio/virtio_pci_modern_dev.c | 8 -- include/linux/virtio_pci_modern.h | 1 + include/uapi/linux/virtio_config.h | 5 4 files changed, 48 insertions(+), 2 deletions(-) -- 2.32.0.3.g01195cf9f ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH vhost v3 3/4] virtio_pci: add check for common cfg size
Some buggy devices, the common cfg size may not match the features. This patch checks the common cfg size for the features(VIRTIO_F_NOTIF_CONFIG_DATA, VIRTIO_F_RING_RESET). When the common cfg size does not match the corresponding feature, we fail the probe and print error message. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_pci_modern.c | 36 ++ drivers/virtio/virtio_pci_modern_dev.c | 2 +- include/linux/virtio_pci_modern.h | 1 + 3 files changed, 38 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index d6bb68ba84e5..6a8f5ff05636 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -39,6 +39,39 @@ static void vp_transport_features(struct virtio_device *vdev, u64 features) __virtio_set_bit(vdev, VIRTIO_F_RING_RESET); } +static int __vp_check_common_size_one_feature(struct virtio_device *vdev, u32 fbit, + u32 offset, const char *fname) +{ + struct virtio_pci_device *vp_dev = to_vp_device(vdev); + + if (!__virtio_test_bit(vdev, fbit)) + return 0; + + if (likely(vp_dev->mdev.common_len >= offset)) + return 0; + + dev_err(&vdev->dev, + "virtio: common cfg size(%ld) does not match the feature %s\n", + vp_dev->mdev.common_len, fname); + + return -EINVAL; +} + +#define vp_check_common_size_one_feature(vdev, fbit, field) \ + __vp_check_common_size_one_feature(vdev, fbit, \ + offsetofend(struct virtio_pci_modern_common_cfg, field), #fbit) + +static int vp_check_common_size(struct virtio_device *vdev) +{ + if (vp_check_common_size_one_feature(vdev, VIRTIO_F_NOTIF_CONFIG_DATA, queue_notify_data)) + return -EINVAL; + + if (vp_check_common_size_one_feature(vdev, VIRTIO_F_RING_RESET, queue_reset)) + return -EINVAL; + + return 0; +} + /* virtio config->finalize_features() implementation */ static int vp_finalize_features(struct virtio_device *vdev) { @@ -57,6 +90,9 @@ static int vp_finalize_features(struct virtio_device *vdev) return -EINVAL; } + if (vp_check_common_size(vdev)) + return -EINVAL; + vp_modern_set_features(&vp_dev->mdev, vdev->features); return 0; diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c index 9cb601e16688..33f319da1558 100644 --- a/drivers/virtio/virtio_pci_modern_dev.c +++ b/drivers/virtio/virtio_pci_modern_dev.c @@ -292,7 +292,7 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev) mdev->common = vp_modern_map_capability(mdev, common, sizeof(struct virtio_pci_common_cfg), 4, 0, sizeof(struct virtio_pci_modern_common_cfg), - NULL, NULL); + &mdev->common_len, NULL); if (!mdev->common) goto err_map_common; mdev->isr = vp_modern_map_capability(mdev, isr, sizeof(u8), 1, diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h index 067ac1d789bc..edf62bae0474 100644 --- a/include/linux/virtio_pci_modern.h +++ b/include/linux/virtio_pci_modern.h @@ -28,6 +28,7 @@ struct virtio_pci_modern_device { /* So we can sanity-check accesses. */ size_t notify_len; size_t device_len; + size_t common_len; /* Capability for when we need to map notifications per-vq. */ int notify_map_cap; -- 2.32.0.3.g01195cf9f ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
On Tue, Oct 10, 2023 at 9:52 AM Akihiko Odaki wrote: > > On 2023/10/09 19:44, Willem de Bruijn wrote: > > On Mon, Oct 9, 2023 at 3:12 AM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 19:06, Willem de Bruijn wrote: > >>> On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki > >>> wrote: > > On 2023/10/09 18:57, Willem de Bruijn wrote: > > On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 17:04, Willem de Bruijn wrote: > >>> On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki > >>> wrote: > > On 2023/10/09 5:08, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 4:07, Willem de Bruijn wrote: > >>> On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki > >>> wrote: > > virtio-net have two usage of hashes: one is RSS and another is > hash > reporting. Conventionally the hash calculation was done by the > VMM. > However, computing the hash after the queue was chosen defeats > the > purpose of RSS. > > Another approach is to use eBPF steering program. This approach > has > another downside: it cannot report the calculated hash due to the > restrictive nature of eBPF. > > Introduce the code to compute hashes to the kernel in order to > overcome > thse challenges. An alternative solution is to extend the eBPF > steering > program so that it will be able to report to the userspace, but > it makes > little sense to allow to implement different hashing algorithms > with > eBPF since the hash value reported by virtio-net is strictly > defined by > the specification. > > The hash value already stored in sk_buff is not used and computed > independently since it may have been computed in a way not > conformant > with the specification. > > Signed-off-by: Akihiko Odaki > --- > >>> > +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = { > + .max_indirection_table_length = > + TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH, > + > + .types = VIRTIO_NET_SUPPORTED_HASH_TYPES > +}; > >>> > >>> No need to have explicit capabilities exchange like this? Tun > >>> either > >>> supports all or none. > >> > >> tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX, > >> VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and > >> VIRTIO_NET_RSS_HASH_TYPE_UDP_EX. > >> > >> It is because the flow dissector does not support IPv6 extensions. > >> The > >> specification is also vague, and does not tell how many TLVs > >> should be > >> consumed at most when interpreting destination option header so I > >> chose > >> to avoid adding code for these hash types to the flow dissector. I > >> doubt > >> anyone will complain about it since nobody complains for Linux. > >> > >> I'm also adding this so that we can extend it later. > >> max_indirection_table_length may grow for systems with 128+ CPUs, > >> or > >> types may have other bits for new protocols in the future. > >> > >>> > case TUNSETSTEERINGEBPF: > - ret = tun_set_ebpf(tun, &tun->steering_prog, > argp); > + bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, > argp); > + if (IS_ERR(bpf_ret)) > + ret = PTR_ERR(bpf_ret); > + else if (bpf_ret) > + tun->vnet_hash.flags &= > ~TUN_VNET_HASH_RSS; > >>> > >>> Don't make one feature disable another. > >>> > >>> TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive > >>> functions. If one is enabled the other call should fail, with > >>> EBUSY > >>> for instance. > >>> > + case TUNSETVNETHASH: > + len = sizeof(vnet_hash); > + if (copy_from_user(&vnet_hash, argp, len)) { > + ret = -EFAULT; > + break; > + } > + > + if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) && > +(tun->vnet_hdr_sz < sizeof(struct
Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature
On Tue, Oct 10, 2023 at 1:51 PM Akihiko Odaki wrote: > > On 2023/10/10 14:45, Jason Wang wrote: > > On Tue, Oct 10, 2023 at 9:52 AM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 19:44, Willem de Bruijn wrote: > >>> On Mon, Oct 9, 2023 at 3:12 AM Akihiko Odaki > >>> wrote: > > On 2023/10/09 19:06, Willem de Bruijn wrote: > > On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 18:57, Willem de Bruijn wrote: > >>> On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki > >>> wrote: > > On 2023/10/09 17:04, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki > > wrote: > >> > >> On 2023/10/09 5:08, Willem de Bruijn wrote: > >>> On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki > >>> wrote: > > On 2023/10/09 4:07, Willem de Bruijn wrote: > > On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki > > wrote: > >> > >> virtio-net have two usage of hashes: one is RSS and another is > >> hash > >> reporting. Conventionally the hash calculation was done by the > >> VMM. > >> However, computing the hash after the queue was chosen defeats > >> the > >> purpose of RSS. > >> > >> Another approach is to use eBPF steering program. This > >> approach has > >> another downside: it cannot report the calculated hash due to > >> the > >> restrictive nature of eBPF. > >> > >> Introduce the code to compute hashes to the kernel in order to > >> overcome > >> thse challenges. An alternative solution is to extend the eBPF > >> steering > >> program so that it will be able to report to the userspace, > >> but it makes > >> little sense to allow to implement different hashing > >> algorithms with > >> eBPF since the hash value reported by virtio-net is strictly > >> defined by > >> the specification. > >> > >> The hash value already stored in sk_buff is not used and > >> computed > >> independently since it may have been computed in a way not > >> conformant > >> with the specification. > >> > >> Signed-off-by: Akihiko Odaki > >> --- > > > >> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = { > >> + .max_indirection_table_length = > >> + TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH, > >> + > >> + .types = VIRTIO_NET_SUPPORTED_HASH_TYPES > >> +}; > > > > No need to have explicit capabilities exchange like this? Tun > > either > > supports all or none. > > tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX, > VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and > VIRTIO_NET_RSS_HASH_TYPE_UDP_EX. > > It is because the flow dissector does not support IPv6 > extensions. The > specification is also vague, and does not tell how many TLVs > should be > consumed at most when interpreting destination option header so > I chose > to avoid adding code for these hash types to the flow dissector. > I doubt > anyone will complain about it since nobody complains for Linux. > > I'm also adding this so that we can extend it later. > max_indirection_table_length may grow for systems with 128+ > CPUs, or > types may have other bits for new protocols in the future. > > > > >>case TUNSETSTEERINGEBPF: > >> - ret = tun_set_ebpf(tun, &tun->steering_prog, > >> argp); > >> + bpf_ret = tun_set_ebpf(tun, > >> &tun->steering_prog, argp); > >> + if (IS_ERR(bpf_ret)) > >> + ret = PTR_ERR(bpf_ret); > >> + else if (bpf_ret) > >> + tun->vnet_hash.flags &= > >> ~TUN_VNET_HASH_RSS; > > > > Don't make one feature disable another. > > > > TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive > > functions. If one is enabled the other call should fail, with > > EBUSY > > for instance. > > > >> + case TUNSETVNETHASH: > >> + len = sizeof(vnet_hash); > >>
Re: [PATCH vhost v3 10/16] vdpa/mlx5: Allow creation/deletion of any given mr struct
On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea wrote: > > This patch adapts the mr creation/deletion code to be able to work with > any given mr struct pointer. All the APIs are adapted to take an extra > parameter for the mr. > > mlx5_vdpa_create/delete_mr doesn't need a ASID parameter anymore. The > check is done in the caller instead (mlx5_set_map). > > This change is needed for a followup patch which will introduce an > additional mr for the vq descriptor data. > > Signed-off-by: Dragos Tatulea > Acked-by: Eugenio Pérez Acked-by: Jason Wang Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v3 11/16] vdpa/mlx5: Move mr mutex out of mr struct
On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea wrote: > > The mutex is named like it is supposed to protect only the mkey but in > reality it is a global lock for all mr resources. > > Shift the mutex to it's rightful location (struct mlx5_vdpa_dev) and > give it a more appropriate name. > > Signed-off-by: Dragos Tatulea > Acked-by: Eugenio Pérez Acked-by: Jason Wang Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v3 12/16] vdpa/mlx5: Improve mr update flow
On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea wrote: > > The current flow for updating an mr works directly on mvdev->mr which > makes it cumbersome to handle multiple new mr structs. > > This patch makes the flow more straightforward by having > mlx5_vdpa_create_mr return a new mr which will update the old mr (if > any). The old mr will be deleted and unlinked from mvdev. > > This change paves the way for adding mrs for different ASIDs. > > The initialized bool is no longer needed as mr is now a pointer in the > mlx5_vdpa_dev struct which will be NULL when not initialized. > > Acked-by: Eugenio Pérez > Signed-off-by: Dragos Tatulea > --- Acked-by: Jason Wang Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v3 13/16] vdpa/mlx5: Introduce mr for vq descriptor
On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea wrote: > > Introduce the vq descriptor group and mr per ASID. Until now > .set_map on ASID 1 was only updating the cvq iotlb. From now on it also > creates a mkey for it. The current patch doesn't use it but follow-up > patches will add hardware support for mapping the vq descriptors. > > Acked-by: Eugenio Pérez > Signed-off-by: Dragos Tatulea > --- Acked-by: Jason Wang Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v3 14/16] vdpa/mlx5: Enable hw support for vq descriptor mapping
On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea wrote: > > Vq descriptor mappings are supported in hardware by filling in an > additional mkey which contains the descriptor mappings to the hw vq. > > A previous patch in this series added support for hw mkey (mr) creation > for ASID 1. > > This patch fills in both the vq data and vq descriptor mkeys based on > group ASID mapping. > > The feature is signaled to the vdpa core through the presence of the > .get_vq_desc_group op. > > Acked-by: Eugenio Pérez > Signed-off-by: Dragos Tatulea Acked-by: Jason Wang Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v3 15/16] vdpa/mlx5: Make iotlb helper functions more generic
On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea wrote: > > They will be used in a follow-up patch. > > For dup_iotlb, avoid the src == dst case. This is an error. > > Acked-by: Eugenio Pérez > Signed-off-by: Dragos Tatulea Acked-by: Jason Wang Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v3 16/16] vdpa/mlx5: Update cvq iotlb mapping on ASID change
On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea wrote: > > For the following sequence: > - cvq group is in ASID 0 > - .set_map(1, cvq_iotlb) > - .set_group_asid(cvq_group, 1) > > ... the cvq mapping from ASID 0 will be used. This is not always correct > behaviour. > > This patch adds support for the above mentioned flow by saving the iotlb > on each .set_map and updating the cvq iotlb with it on a cvq group change. > > Acked-by: Eugenio Pérez > Signed-off-by: Dragos Tatulea > --- Acked-by: Jason Wang Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v3 3/4] virtio_pci: add check for common cfg size
On Tue, Oct 10, 2023 at 11:11 AM Xuan Zhuo wrote: > > Some buggy devices, the common cfg size may not match the features. > > This patch checks the common cfg size for the > features(VIRTIO_F_NOTIF_CONFIG_DATA, VIRTIO_F_RING_RESET). When the > common cfg size does not match the corresponding feature, we fail the > probe and print error message. > > Signed-off-by: Xuan Zhuo > --- > drivers/virtio/virtio_pci_modern.c | 36 ++ > drivers/virtio/virtio_pci_modern_dev.c | 2 +- > include/linux/virtio_pci_modern.h | 1 + > 3 files changed, 38 insertions(+), 1 deletion(-) > > diff --git a/drivers/virtio/virtio_pci_modern.c > b/drivers/virtio/virtio_pci_modern.c > index d6bb68ba84e5..6a8f5ff05636 100644 > --- a/drivers/virtio/virtio_pci_modern.c > +++ b/drivers/virtio/virtio_pci_modern.c > @@ -39,6 +39,39 @@ static void vp_transport_features(struct virtio_device > *vdev, u64 features) > __virtio_set_bit(vdev, VIRTIO_F_RING_RESET); > } > > +static int __vp_check_common_size_one_feature(struct virtio_device *vdev, > u32 fbit, > + u32 offset, const char *fname) > +{ > + struct virtio_pci_device *vp_dev = to_vp_device(vdev); > + > + if (!__virtio_test_bit(vdev, fbit)) > + return 0; > + > + if (likely(vp_dev->mdev.common_len >= offset)) > + return 0; > + > + dev_err(&vdev->dev, > + "virtio: common cfg size(%ld) does not match the feature > %s\n", > + vp_dev->mdev.common_len, fname); > + > + return -EINVAL; > +} > + > +#define vp_check_common_size_one_feature(vdev, fbit, field) \ > + __vp_check_common_size_one_feature(vdev, fbit, \ > + offsetofend(struct virtio_pci_modern_common_cfg, field), > #fbit) > + > +static int vp_check_common_size(struct virtio_device *vdev) > +{ > + if (vp_check_common_size_one_feature(vdev, > VIRTIO_F_NOTIF_CONFIG_DATA, queue_notify_data)) > + return -EINVAL; > + > + if (vp_check_common_size_one_feature(vdev, VIRTIO_F_RING_RESET, > queue_reset)) > + return -EINVAL; Do we need to at least check the offset of the queue_device as well here? Thanks > + > + return 0; > +} > + > /* virtio config->finalize_features() implementation */ > static int vp_finalize_features(struct virtio_device *vdev) > { > @@ -57,6 +90,9 @@ static int vp_finalize_features(struct virtio_device *vdev) > return -EINVAL; > } > > + if (vp_check_common_size(vdev)) > + return -EINVAL; > + > vp_modern_set_features(&vp_dev->mdev, vdev->features); > > return 0; > diff --git a/drivers/virtio/virtio_pci_modern_dev.c > b/drivers/virtio/virtio_pci_modern_dev.c > index 9cb601e16688..33f319da1558 100644 > --- a/drivers/virtio/virtio_pci_modern_dev.c > +++ b/drivers/virtio/virtio_pci_modern_dev.c > @@ -292,7 +292,7 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev) > mdev->common = vp_modern_map_capability(mdev, common, > sizeof(struct virtio_pci_common_cfg), 4, > 0, sizeof(struct > virtio_pci_modern_common_cfg), > - NULL, NULL); > + &mdev->common_len, NULL); > if (!mdev->common) > goto err_map_common; > mdev->isr = vp_modern_map_capability(mdev, isr, sizeof(u8), 1, > diff --git a/include/linux/virtio_pci_modern.h > b/include/linux/virtio_pci_modern.h > index 067ac1d789bc..edf62bae0474 100644 > --- a/include/linux/virtio_pci_modern.h > +++ b/include/linux/virtio_pci_modern.h > @@ -28,6 +28,7 @@ struct virtio_pci_modern_device { > /* So we can sanity-check accesses. */ > size_t notify_len; > size_t device_len; > + size_t common_len; > > /* Capability for when we need to map notifications per-vq. */ > int notify_map_cap; > -- > 2.32.0.3.g01195cf9f > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v3 2/4] virtio_pci: fix the common cfg map size
On Tue, Oct 10, 2023 at 11:11 AM Xuan Zhuo wrote: > > The function vp_modern_map_capability() takes the size parameter, > which corresponds to the size of virtio_pci_common_cfg. As a result, > this indicates the size of memory area to map. > > Now the size is the size of virtio_pci_common_cfg, but some feature(such > as the _F_RING_RESET) needs the virtio_pci_modern_common_cfg, so this > commit changes the size to the size of virtio_pci_modern_common_cfg. > > Fixes: 0b50cece0b78 ("virtio_pci: introduce helper to get/set queue reset") > Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Thanks > --- > drivers/virtio/virtio_pci_modern_dev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c > b/drivers/virtio/virtio_pci_modern_dev.c > index aad7d9296e77..9cb601e16688 100644 > --- a/drivers/virtio/virtio_pci_modern_dev.c > +++ b/drivers/virtio/virtio_pci_modern_dev.c > @@ -291,7 +291,7 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev) > err = -EINVAL; > mdev->common = vp_modern_map_capability(mdev, common, > sizeof(struct virtio_pci_common_cfg), 4, > - 0, sizeof(struct virtio_pci_common_cfg), > + 0, sizeof(struct > virtio_pci_modern_common_cfg), > NULL, NULL); > if (!mdev->common) > goto err_map_common; > -- > 2.32.0.3.g01195cf9f > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH vhost v3 1/4] virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit
On Tue, Oct 10, 2023 at 11:11 AM Xuan Zhuo wrote: > > This patch adds the definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit > in the relevant header file. > > This feature indicates that the driver uses the data provided by the > device as a virtqueue identifier in available buffer notifications. > > It comes from here: > https://github.com/oasis-tcs/virtio-spec/issues/89 > > Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Thanks > --- > include/uapi/linux/virtio_config.h | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/include/uapi/linux/virtio_config.h > b/include/uapi/linux/virtio_config.h > index 2c712c654165..8881aea60f6f 100644 > --- a/include/uapi/linux/virtio_config.h > +++ b/include/uapi/linux/virtio_config.h > @@ -105,6 +105,11 @@ > */ > #define VIRTIO_F_NOTIFICATION_DATA 38 > > +/* This feature indicates that the driver uses the data provided by the > device > + * as a virtqueue identifier in available buffer notifications. > + */ > +#define VIRTIO_F_NOTIF_CONFIG_DATA 39 > + > /* > * This feature indicates that the driver can reset a queue individually. > */ > -- > 2.32.0.3.g01195cf9f > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio-mmio: Support multiple interrupts per device
On Sat, Sep 30, 2023 at 4:46 AM Jakub Sitnicki wrote: > > Some virtual devices, such as the virtio network device, can use multiple > virtqueues (or multiple pairs of virtqueues in the case of a vNIC). In such > case, when there are multiple vCPUs present, it is possible to process > virtqueue events in parallel. Each vCPU can service a subset of all > virtqueues when notified that there is work to carry out. > > However, the current virtio-mmio transport implementation poses a > limitation. Only one vCPU can service notifications from any of the > virtqueues of a single virtio device. This is because a virtio-mmio device > driver supports registering just one interrupt per device. With such setup > we are not able to scale virtqueue event processing among vCPUs. > > Now, with more than one IRQ resource registered for a virtio-mmio platform > device, we can address this limitation. > > First, we request multiple IRQs when creating virtqueues for a device. > > Then, map each virtqueue to one of the IRQs assigned to the device. The > mapping is done in a device type specific manner. For instance, a network > device will want each RX/TX virtqueue pair mapped to a different IRQ > line. Other device types might require a different mapping scheme. We > currently provide a mapping for virtio-net device type. > > Finally, when handling an interrupt, we service only the virtqueues > associated with the IRQ line that triggered the event. > > Signed-off-by: Jakub Sitnicki > --- > drivers/virtio/virtio_mmio.c | 102 > +++ > 1 file changed, 83 insertions(+), 19 deletions(-) > > diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c > index 06a587b23542..180c51c27704 100644 > --- a/drivers/virtio/virtio_mmio.c > +++ b/drivers/virtio/virtio_mmio.c > @@ -69,6 +69,7 @@ > #include > #include > #include > +#include > #include > #include > > @@ -93,6 +94,10 @@ struct virtio_mmio_device { > /* a list of queues so we can dispatch IRQs */ > spinlock_t lock; > struct list_head virtqueues; > + > + /* IRQ range allocated to the device */ > + unsigned int irq_base; > + unsigned int num_irqs; > }; > > struct virtio_mmio_vq_info { > @@ -101,6 +106,9 @@ struct virtio_mmio_vq_info { > > /* the list node for the virtqueues list */ > struct list_head node; > + > + /* IRQ mapped to virtqueue */ > + unsigned int irq; > }; > > > @@ -297,7 +305,7 @@ static bool vm_notify_with_data(struct virtqueue *vq) > return true; > } > > -/* Notify all virtqueues on an interrupt. */ > +/* Notify all or some virtqueues on an interrupt. */ > static irqreturn_t vm_interrupt(int irq, void *opaque) > { > struct virtio_mmio_device *vm_dev = opaque; > @@ -308,20 +316,31 @@ static irqreturn_t vm_interrupt(int irq, void *opaque) > > /* Read and acknowledge interrupts */ > status = readl(vm_dev->base + VIRTIO_MMIO_INTERRUPT_STATUS); > - writel(status, vm_dev->base + VIRTIO_MMIO_INTERRUPT_ACK); > > if (unlikely(status & VIRTIO_MMIO_INT_CONFIG)) { > + writel(status & VIRTIO_MMIO_INT_CONFIG, vm_dev->base + > VIRTIO_MMIO_INTERRUPT_ACK); > virtio_config_changed(&vm_dev->vdev); > ret = IRQ_HANDLED; > } > > if (likely(status & VIRTIO_MMIO_INT_VRING)) { > + writel(status & VIRTIO_MMIO_INT_VRING, vm_dev->base + > VIRTIO_MMIO_INTERRUPT_ACK); > spin_lock_irqsave(&vm_dev->lock, flags); > list_for_each_entry(info, &vm_dev->virtqueues, node) > ret |= vring_interrupt(irq, info->vq); > spin_unlock_irqrestore(&vm_dev->lock, flags); > } > > + /* Notify only affected vrings if device uses multiple interrupts */ > + if (vm_dev->num_irqs > 1) { > + spin_lock_irqsave(&vm_dev->lock, flags); > + list_for_each_entry(info, &vm_dev->virtqueues, node) { > + if (info->irq == irq) > + ret |= vring_interrupt(irq, info->vq); > + } > + spin_unlock_irqrestore(&vm_dev->lock, flags); > + } > + > return ret; > } > > @@ -356,11 +375,15 @@ static void vm_del_vqs(struct virtio_device *vdev) > { > struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); > struct virtqueue *vq, *n; > + int i, irq; > + > + for (i = 0; i < vm_dev->num_irqs; i++) { > + irq = vm_dev->irq_base + i; > + devm_free_irq(&vdev->dev, irq, vm_dev); > + } > > list_for_each_entry_safe(vq, n, &vdev->vqs, list) > vm_del_vq(vq); > - > - free_irq(platform_get_irq(vm_dev->pdev, 0), vm_dev); > } > > static void vm_synchronize_cbs(struct virtio_device *vdev) > @@ -488,6 +511,18 @@ static struct virtqueue *vm_setup_vq(struct > virtio_device *vdev, unsigned int in