date:20231009

On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki  wrote:
>
> On 2023/10/09 5:08, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 4:07, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki  
> >>> wrote:
> 
>  virtio-net have two usage of hashes: one is RSS and another is hash
>  reporting. Conventionally the hash calculation was done by the VMM.
>  However, computing the hash after the queue was chosen defeats the
>  purpose of RSS.
> 
>  Another approach is to use eBPF steering program. This approach has
>  another downside: it cannot report the calculated hash due to the
>  restrictive nature of eBPF.
> 
>  Introduce the code to compute hashes to the kernel in order to overcome
>  thse challenges. An alternative solution is to extend the eBPF steering
>  program so that it will be able to report to the userspace, but it makes
>  little sense to allow to implement different hashing algorithms with
>  eBPF since the hash value reported by virtio-net is strictly defined by
>  the specification.
> 
>  The hash value already stored in sk_buff is not used and computed
>  independently since it may have been computed in a way not conformant
>  with the specification.
> 
>  Signed-off-by: Akihiko Odaki 
>  ---
> >>>
>  +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
>  +   .max_indirection_table_length =
>  +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
>  +
>  +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
>  +};
> >>>
> >>> No need to have explicit capabilities exchange like this? Tun either
> >>> supports all or none.
> >>
> >> tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
> >> VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> >>
> >> It is because the flow dissector does not support IPv6 extensions. The
> >> specification is also vague, and does not tell how many TLVs should be
> >> consumed at most when interpreting destination option header so I chose
> >> to avoid adding code for these hash types to the flow dissector. I doubt
> >> anyone will complain about it since nobody complains for Linux.
> >>
> >> I'm also adding this so that we can extend it later.
> >> max_indirection_table_length may grow for systems with 128+ CPUs, or
> >> types may have other bits for new protocols in the future.
> >>
> >>>
>    case TUNSETSTEERINGEBPF:
>  -   ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
>  +   bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
>  +   if (IS_ERR(bpf_ret))
>  +   ret = PTR_ERR(bpf_ret);
>  +   else if (bpf_ret)
>  +   tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;
> >>>
> >>> Don't make one feature disable another.
> >>>
> >>> TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> >>> functions. If one is enabled the other call should fail, with EBUSY
> >>> for instance.
> >>>
>  +   case TUNSETVNETHASH:
>  +   len = sizeof(vnet_hash);
>  +   if (copy_from_user(&vnet_hash, argp, len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
>  +
>  +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
>  +(tun->vnet_hdr_sz < sizeof(struct 
>  virtio_net_hdr_v1_hash) ||
>  + !tun_is_little_endian(tun))) ||
>  +vnet_hash.indirection_table_mask >=
>  +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
>  +   ret = -EINVAL;
>  +   break;
>  +   }
>  +
>  +   argp = (u8 __user *)argp + len;
>  +   len = (vnet_hash.indirection_table_mask + 1) * 2;
>  +   if (copy_from_user(vnet_hash_indirection_table, argp, 
>  len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
>  +
>  +   argp = (u8 __user *)argp + len;
>  +   len = virtio_net_hash_key_length(vnet_hash.types);
>  +
>  +   if (copy_from_user(vnet_hash_key, argp, len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
> >>>
> >>> Probably easier and less error-prone to define a fixed size control
> >>> struct with the max indirection table size.
> >>
> >> I made its size variable because the indirection table and key may grow
> >> in the future as I wrote above.
> >>
> >>>
> >>> Btw: please trim the CC: list considerably on future patches.
> >>
> >> I'll do so in the next version with the TUNSETSTEERINGEBPF change you
> >> proposed.
> >
> > To be clear: pleas

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki  wrote:
>
> virtio-net have two usage of hashes: one is RSS and another is hash
> reporting. Conventionally the hash calculation was done by the VMM.
> However, computing the hash after the queue was chosen defeats the
> purpose of RSS.
>
> Another approach is to use eBPF steering program. This approach has
> another downside: it cannot report the calculated hash due to the
> restrictive nature of eBPF.
>
> Introduce the code to compute hashes to the kernel in order to overcome
> thse challenges. An alternative solution is to extend the eBPF steering
> program so that it will be able to report to the userspace, but it makes
> little sense to allow to implement different hashing algorithms with
> eBPF since the hash value reported by virtio-net is strictly defined by
> the specification.
>
> The hash value already stored in sk_buff is not used and computed
> independently since it may have been computed in a way not conformant
> with the specification.
>
> Signed-off-by: Akihiko Odaki 

> @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct *tun,
> }
>
> if (vnet_hdr_sz) {
> -   struct virtio_net_hdr gso;
> +   union {
> +   struct virtio_net_hdr hdr;
> +   struct virtio_net_hdr_v1_hash v1_hash_hdr;
> +   } hdr;
> +   int ret;
>
> if (iov_iter_count(iter) < vnet_hdr_sz)
> return -EINVAL;
>
> -   if (virtio_net_hdr_from_skb(skb, &gso,
> -   tun_is_little_endian(tun), true,
> -   vlan_hlen)) {
> +   if ((READ_ONCE(tun->vnet_hash.flags) & TUN_VNET_HASH_REPORT) 
> &&
> +   vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) &&
> +   skb->tun_vnet_hash) {

Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of
the set hash ioctl failing otherwise?

Such checks should be limited to control path where possible

> +   vnet_hdr_content_sz = sizeof(hdr.v1_hash_hdr);
> +   ret = virtio_net_hdr_v1_hash_from_skb(skb,
> + 
> &hdr.v1_hash_hdr,
> + true,
> + vlan_hlen,
> + &vnet_hash);
> +   } else {
> +   vnet_hdr_content_sz = sizeof(hdr.hdr);
> +   ret = virtio_net_hdr_from_skb(skb, &hdr.hdr,
> + 
> tun_is_little_endian(tun),
> + true, vlan_hlen);
> +   }
> +
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

On Mon, Oct 9, 2023 at 3:44 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 17:13, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki  
> > wrote:
> >>
> >> virtio-net have two usage of hashes: one is RSS and another is hash
> >> reporting. Conventionally the hash calculation was done by the VMM.
> >> However, computing the hash after the queue was chosen defeats the
> >> purpose of RSS.
> >>
> >> Another approach is to use eBPF steering program. This approach has
> >> another downside: it cannot report the calculated hash due to the
> >> restrictive nature of eBPF.
> >>
> >> Introduce the code to compute hashes to the kernel in order to overcome
> >> thse challenges. An alternative solution is to extend the eBPF steering
> >> program so that it will be able to report to the userspace, but it makes
> >> little sense to allow to implement different hashing algorithms with
> >> eBPF since the hash value reported by virtio-net is strictly defined by
> >> the specification.
> >>
> >> The hash value already stored in sk_buff is not used and computed
> >> independently since it may have been computed in a way not conformant
> >> with the specification.
> >>
> >> Signed-off-by: Akihiko Odaki 
> >
> >> @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct *tun,
> >>  }
> >>
> >>  if (vnet_hdr_sz) {
> >> -   struct virtio_net_hdr gso;
> >> +   union {
> >> +   struct virtio_net_hdr hdr;
> >> +   struct virtio_net_hdr_v1_hash v1_hash_hdr;
> >> +   } hdr;
> >> +   int ret;
> >>
> >>  if (iov_iter_count(iter) < vnet_hdr_sz)
> >>  return -EINVAL;
> >>
> >> -   if (virtio_net_hdr_from_skb(skb, &gso,
> >> -   tun_is_little_endian(tun), 
> >> true,
> >> -   vlan_hlen)) {
> >> +   if ((READ_ONCE(tun->vnet_hash.flags) & 
> >> TUN_VNET_HASH_REPORT) &&
> >> +   vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) &&
> >> +   skb->tun_vnet_hash) {
> >
> > Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of
> > the set hash ioctl failing otherwise?
> >
> > Such checks should be limited to control path where possible
>
> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are
> not read at once.

It should not be possible to downgrade the hdr_sz once v1 is selected.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 17:04, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 5:08, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 4:07, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki  
> > wrote:
> >>
> >> virtio-net have two usage of hashes: one is RSS and another is hash
> >> reporting. Conventionally the hash calculation was done by the VMM.
> >> However, computing the hash after the queue was chosen defeats the
> >> purpose of RSS.
> >>
> >> Another approach is to use eBPF steering program. This approach has
> >> another downside: it cannot report the calculated hash due to the
> >> restrictive nature of eBPF.
> >>
> >> Introduce the code to compute hashes to the kernel in order to overcome
> >> thse challenges. An alternative solution is to extend the eBPF steering
> >> program so that it will be able to report to the userspace, but it 
> >> makes
> >> little sense to allow to implement different hashing algorithms with
> >> eBPF since the hash value reported by virtio-net is strictly defined by
> >> the specification.
> >>
> >> The hash value already stored in sk_buff is not used and computed
> >> independently since it may have been computed in a way not conformant
> >> with the specification.
> >>
> >> Signed-off-by: Akihiko Odaki 
> >> ---
> >
> >> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
> >> +   .max_indirection_table_length =
> >> +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
> >> +
> >> +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
> >> +};
> >
> > No need to have explicit capabilities exchange like this? Tun either
> > supports all or none.
> 
>  tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
>  VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> 
>  It is because the flow dissector does not support IPv6 extensions. The
>  specification is also vague, and does not tell how many TLVs should be
>  consumed at most when interpreting destination option header so I chose
>  to avoid adding code for these hash types to the flow dissector. I doubt
>  anyone will complain about it since nobody complains for Linux.
> 
>  I'm also adding this so that we can extend it later.
>  max_indirection_table_length may grow for systems with 128+ CPUs, or
>  types may have other bits for new protocols in the future.
> 
> >
> >>case TUNSETSTEERINGEBPF:
> >> -   ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
> >> +   bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
> >> +   if (IS_ERR(bpf_ret))
> >> +   ret = PTR_ERR(bpf_ret);
> >> +   else if (bpf_ret)
> >> +   tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;
> >
> > Don't make one feature disable another.
> >
> > TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> > functions. If one is enabled the other call should fail, with EBUSY
> > for instance.
> >
> >> +   case TUNSETVNETHASH:
> >> +   len = sizeof(vnet_hash);
> >> +   if (copy_from_user(&vnet_hash, argp, len)) {
> >> +   ret = -EFAULT;
> >> +   break;
> >> +   }
> >> +
> >> +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
> >> +(tun->vnet_hdr_sz < sizeof(struct 
> >> virtio_net_hdr_v1_hash) ||
> >> + !tun_is_little_endian(tun))) ||
> >> +vnet_hash.indirection_table_mask >=
> >> +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
> >> +   ret = -EINVAL;
> >> +   break;
> >> +   }
> >> +
> >> +   argp = (u8 __user *)argp + len;
> >> +   len = (vnet_hash.indirection_table_mask + 1) * 2;
> >> +   if (copy_from_user(vnet_hash_indirection_table, argp, 
> >> len)) {
> >> +   ret = -EFAULT;
> >> +   break;
> >> +   }
> >> +
> >> +   argp = (u8 __user *)argp + len;
> >> +   len = virtio_net_hash_key_length(vnet_hash.types);
> >> +
> >> +   if (copy_from_user(vnet_hash_key, argp, len)) {
> >> +   ret = -EFAULT;
> >> +   break;
> >> +   }
> >
> > Probably easier and less error-prone to define a fixed size control
> > struct with the max indirecti

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 18:57, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 17:04, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 5:08, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki 
> >  wrote:
> >>
> >> On 2023/10/09 4:07, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki 
> >>>  wrote:
> 
>  virtio-net have two usage of hashes: one is RSS and another is hash
>  reporting. Conventionally the hash calculation was done by the VMM.
>  However, computing the hash after the queue was chosen defeats the
>  purpose of RSS.
> 
>  Another approach is to use eBPF steering program. This approach has
>  another downside: it cannot report the calculated hash due to the
>  restrictive nature of eBPF.
> 
>  Introduce the code to compute hashes to the kernel in order to 
>  overcome
>  thse challenges. An alternative solution is to extend the eBPF 
>  steering
>  program so that it will be able to report to the userspace, but it 
>  makes
>  little sense to allow to implement different hashing algorithms with
>  eBPF since the hash value reported by virtio-net is strictly defined 
>  by
>  the specification.
> 
>  The hash value already stored in sk_buff is not used and computed
>  independently since it may have been computed in a way not conformant
>  with the specification.
> 
>  Signed-off-by: Akihiko Odaki 
>  ---
> >>>
>  +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
>  +   .max_indirection_table_length =
>  +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
>  +
>  +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
>  +};
> >>>
> >>> No need to have explicit capabilities exchange like this? Tun either
> >>> supports all or none.
> >>
> >> tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
> >> VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> >>
> >> It is because the flow dissector does not support IPv6 extensions. The
> >> specification is also vague, and does not tell how many TLVs should be
> >> consumed at most when interpreting destination option header so I chose
> >> to avoid adding code for these hash types to the flow dissector. I 
> >> doubt
> >> anyone will complain about it since nobody complains for Linux.
> >>
> >> I'm also adding this so that we can extend it later.
> >> max_indirection_table_length may grow for systems with 128+ CPUs, or
> >> types may have other bits for new protocols in the future.
> >>
> >>>
>  case TUNSETSTEERINGEBPF:
>  -   ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
>  +   bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, 
>  argp);
>  +   if (IS_ERR(bpf_ret))
>  +   ret = PTR_ERR(bpf_ret);
>  +   else if (bpf_ret)
>  +   tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;
> >>>
> >>> Don't make one feature disable another.
> >>>
> >>> TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> >>> functions. If one is enabled the other call should fail, with EBUSY
> >>> for instance.
> >>>
>  +   case TUNSETVNETHASH:
>  +   len = sizeof(vnet_hash);
>  +   if (copy_from_user(&vnet_hash, argp, len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
>  +
>  +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
>  +(tun->vnet_hdr_sz < sizeof(struct 
>  virtio_net_hdr_v1_hash) ||
>  + !tun_is_little_endian(tun))) ||
>  +vnet_hash.indirection_table_mask >=
>  +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
>  +   ret = -EINVAL;
>  +   break;
>  +   }
>  +
>  +   argp = (u8 __user *)argp + len;
>  +   len = (vnet_hash.indirection_table_mask + 1) * 2;
>  +   if (copy_from_user(vnet_hash_indirection_table, 
>  argp, len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
>  +
>  +   argp = (u8 __user *)argp + len;
>  +

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

On Mon, Oct 9, 2023 at 3:05 AM Akihiko Odaki  wrote:
>
>
>
> On 2023/10/09 18:54, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:44 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 17:13, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki  
> >>> wrote:
> 
>  virtio-net have two usage of hashes: one is RSS and another is hash
>  reporting. Conventionally the hash calculation was done by the VMM.
>  However, computing the hash after the queue was chosen defeats the
>  purpose of RSS.
> 
>  Another approach is to use eBPF steering program. This approach has
>  another downside: it cannot report the calculated hash due to the
>  restrictive nature of eBPF.
> 
>  Introduce the code to compute hashes to the kernel in order to overcome
>  thse challenges. An alternative solution is to extend the eBPF steering
>  program so that it will be able to report to the userspace, but it makes
>  little sense to allow to implement different hashing algorithms with
>  eBPF since the hash value reported by virtio-net is strictly defined by
>  the specification.
> 
>  The hash value already stored in sk_buff is not used and computed
>  independently since it may have been computed in a way not conformant
>  with the specification.
> 
>  Signed-off-by: Akihiko Odaki 
> >>>
>  @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct 
>  *tun,
>    }
> 
>    if (vnet_hdr_sz) {
>  -   struct virtio_net_hdr gso;
>  +   union {
>  +   struct virtio_net_hdr hdr;
>  +   struct virtio_net_hdr_v1_hash v1_hash_hdr;
>  +   } hdr;
>  +   int ret;
> 
>    if (iov_iter_count(iter) < vnet_hdr_sz)
>    return -EINVAL;
> 
>  -   if (virtio_net_hdr_from_skb(skb, &gso,
>  -   tun_is_little_endian(tun), 
>  true,
>  -   vlan_hlen)) {
>  +   if ((READ_ONCE(tun->vnet_hash.flags) & 
>  TUN_VNET_HASH_REPORT) &&
>  +   vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) &&
>  +   skb->tun_vnet_hash) {
> >>>
> >>> Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of
> >>> the set hash ioctl failing otherwise?
> >>>
> >>> Such checks should be limited to control path where possible
> >>
> >> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are
> >> not read at once.
> >
> > It should not be possible to downgrade the hdr_sz once v1 is selected.
>
> I see nothing that prevents shrinking the header size.
>
> tun->vnet_hash.flags is read after vnet_hdr_sz so the race can happen
> even for the case the header size grows though this can be fixed by
> reordering the two reads.

One option is to fail any control path that tries to re-negotiate
header size once this hash option is enabled?

There is no practical reason to allow feature re-negotiation at any
arbitrary time.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

On Mon, Oct 9, 2023 at 3:11 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 19:07, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:05 AM Akihiko Odaki  
> > wrote:
> >>
> >>
> >>
> >> On 2023/10/09 18:54, Willem de Bruijn wrote:
> >>> On Mon, Oct 9, 2023 at 3:44 AM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 17:13, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki 
> >  wrote:
> >>
> >> virtio-net have two usage of hashes: one is RSS and another is hash
> >> reporting. Conventionally the hash calculation was done by the VMM.
> >> However, computing the hash after the queue was chosen defeats the
> >> purpose of RSS.
> >>
> >> Another approach is to use eBPF steering program. This approach has
> >> another downside: it cannot report the calculated hash due to the
> >> restrictive nature of eBPF.
> >>
> >> Introduce the code to compute hashes to the kernel in order to overcome
> >> thse challenges. An alternative solution is to extend the eBPF steering
> >> program so that it will be able to report to the userspace, but it 
> >> makes
> >> little sense to allow to implement different hashing algorithms with
> >> eBPF since the hash value reported by virtio-net is strictly defined by
> >> the specification.
> >>
> >> The hash value already stored in sk_buff is not used and computed
> >> independently since it may have been computed in a way not conformant
> >> with the specification.
> >>
> >> Signed-off-by: Akihiko Odaki 
> >
> >> @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct 
> >> *tun,
> >>}
> >>
> >>if (vnet_hdr_sz) {
> >> -   struct virtio_net_hdr gso;
> >> +   union {
> >> +   struct virtio_net_hdr hdr;
> >> +   struct virtio_net_hdr_v1_hash v1_hash_hdr;
> >> +   } hdr;
> >> +   int ret;
> >>
> >>if (iov_iter_count(iter) < vnet_hdr_sz)
> >>return -EINVAL;
> >>
> >> -   if (virtio_net_hdr_from_skb(skb, &gso,
> >> -   tun_is_little_endian(tun), 
> >> true,
> >> -   vlan_hlen)) {
> >> +   if ((READ_ONCE(tun->vnet_hash.flags) & 
> >> TUN_VNET_HASH_REPORT) &&
> >> +   vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) &&
> >> +   skb->tun_vnet_hash) {
> >
> > Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of
> > the set hash ioctl failing otherwise?
> >
> > Such checks should be limited to control path where possible
> 
>  There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are
>  not read at once.
> >>>
> >>> It should not be possible to downgrade the hdr_sz once v1 is selected.
> >>
> >> I see nothing that prevents shrinking the header size.
> >>
> >> tun->vnet_hash.flags is read after vnet_hdr_sz so the race can happen
> >> even for the case the header size grows though this can be fixed by
> >> reordering the two reads.
> >
> > One option is to fail any control path that tries to re-negotiate
> > header size once this hash option is enabled?
> >
> > There is no practical reason to allow feature re-negotiation at any
> > arbitrary time.
>
> I think it's a bit awkward interface design since tun allows to
> reconfigure any of its parameters, but it's certainly possible.

If this would be the only exception to that rule, and this is the only
place that needs a datapath check, then it's fine to leave as is.

In general, this runtime configurability serves little purpose but to
help syzbot exercise code paths no real application would attempt. But
I won't ask to diverge from whatever tun already does. We just have to
be more careful about the possible races it brings.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH vhost v2 1/2] virtio_pci: fix the common map size and add check for common size

On Mon, Oct 09, 2023 at 09:15:31AM +0800, Xuan Zhuo wrote:
> On Sun, 8 Oct 2023 06:42:37 -0400, "Michael S. Tsirkin"  
> wrote:
> > On Sun, Oct 08, 2023 at 05:38:41PM +0800, Xuan Zhuo wrote:
> > > Now, the function vp_modern_map_capability() takes the size parameter,
> > > which corresponds to the size of virtio_pci_common_cfg. As a result,
> > > this indicates the size of memory area to map.
> > >
> > > However, if the _F_RING_RESET is negotiated, the extra items will be
> > > used. Therefore, we need to use the size of virtio_pci_modern_common_cfg
> > > to map more space.
> > >
> > > Meanwhile, this patch removes the feature(_F_RING_ERSET and
> >
> > typo
> >
> > > _F_NOTIFICATION_DATA) when the common cfg size does not match
> > > the corresponding feature.
> > >
> > > Fixes: 0b50cece0b78 ("virtio_pci: introduce helper to get/set queue 
> > > reset")
> > > Signed-off-by: Xuan Zhuo 
> > > ---
> > >  drivers/virtio/virtio_pci_modern.c | 20 +++-
> > >  drivers/virtio/virtio_pci_modern_dev.c |  4 ++--
> > >  include/linux/virtio_pci_modern.h  |  1 +
> > >  3 files changed, 22 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_pci_modern.c 
> > > b/drivers/virtio/virtio_pci_modern.c
> > > index d6bb68ba84e5..c0b9d2363ddb 100644
> > > --- a/drivers/virtio/virtio_pci_modern.c
> > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > @@ -22,8 +22,26 @@
> > >  static u64 vp_get_features(struct virtio_device *vdev)
> > >  {
> > >   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > + u64 features = vp_modern_get_features(&vp_dev->mdev);
> > > +
> > > +#define check_feature(feature, field)
> > > \
> > > + do {
> > > \
> > > + if (features & BIT_ULL(feature)) {  
> > > \
> > > + u32 offset = offsetofend(struct 
> > > virtio_pci_modern_common_cfg, field);   \
> > > + if (unlikely(vp_dev->mdev.common_len < offset)) 
> > > \
> > > + features &= ~BIT_ULL(feature);  
> > > \
> > > + }   
> > > \
> > > + } while (0)
> > > +
> > > + /* For buggy devices, if the common len does not match the feature, we
> > > +  * remove the feature.
> >
> > I don't like doing this in vp_get_features. userspace won't be able
> > to see the actual device features at all.
> > Also, we should print an info message at least.
> >
> > I am still debating with myself whether clearing feature bits
> > or just failing finalize_features (and thus, probe) is best.
> 
> For me, I think failing probe is best.
> 
> Then the developer of the device can find that firstly.
> And I think we should print an info message when we detect
> this error.

If you fail probe - maybe even a warning.

> If we clear the feature bits, the developer of the device may
> ignore this error.
> 
> >
> >
> > > +  */
> > > + check_feature(VIRTIO_F_NOTIFICATION_DATA, queue_notify_data);
> > > + check_feature(VIRTIO_F_RING_RESET, queue_reset);
> > > +
> > > +#undef check_feature
> >
> > this macro's too scary. just pass offset and feature bit as
> > parameters to an inline function.
> 
> I should pass the features as a parameter.
> 
> Thanks.
> 
> 
> 
> >
> > >
> > > - return vp_modern_get_features(&vp_dev->mdev);
> > > + return features;
> > >  }
> > >
> > >  static void vp_transport_features(struct virtio_device *vdev, u64 
> > > features)
> > > diff --git a/drivers/virtio/virtio_pci_modern_dev.c 
> > > b/drivers/virtio/virtio_pci_modern_dev.c
> > > index aad7d9296e77..33f319da1558 100644
> > > --- a/drivers/virtio/virtio_pci_modern_dev.c
> > > +++ b/drivers/virtio/virtio_pci_modern_dev.c
> > > @@ -291,8 +291,8 @@ int vp_modern_probe(struct virtio_pci_modern_device 
> > > *mdev)
> > >   err = -EINVAL;
> > >   mdev->common = vp_modern_map_capability(mdev, common,
> > > sizeof(struct virtio_pci_common_cfg), 4,
> > > -   0, sizeof(struct virtio_pci_common_cfg),
> > > -   NULL, NULL);
> > > +   0, sizeof(struct 
> > > virtio_pci_modern_common_cfg),
> > > +   &mdev->common_len, NULL);
> > >   if (!mdev->common)
> > >   goto err_map_common;
> > >   mdev->isr = vp_modern_map_capability(mdev, isr, sizeof(u8), 1,
> > > diff --git a/include/linux/virtio_pci_modern.h 
> > > b/include/linux/virtio_pci_modern.h
> > > index 067ac1d789bc..edf62bae0474 100644
> > > --- a/include/linux/virtio_pci_modern.h
> > > +++ b/include/linux/virtio_pci_modern.h
> > > @@ -28,6 +28,7 @@ struct virtio_pci_modern_device {
> > >   /* So we can sanity-check accesses. */
> > >   size_t notify_len;
> > >   size_t device_len;

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

On Mon, Oct 9, 2023 at 3:12 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 19:06, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 18:57, Willem de Bruijn wrote:
> >>> On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 17:04, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 5:08, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki 
> >>>  wrote:
> 
>  On 2023/10/09 4:07, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki 
> >  wrote:
> >>
> >> virtio-net have two usage of hashes: one is RSS and another is hash
> >> reporting. Conventionally the hash calculation was done by the VMM.
> >> However, computing the hash after the queue was chosen defeats the
> >> purpose of RSS.
> >>
> >> Another approach is to use eBPF steering program. This approach has
> >> another downside: it cannot report the calculated hash due to the
> >> restrictive nature of eBPF.
> >>
> >> Introduce the code to compute hashes to the kernel in order to 
> >> overcome
> >> thse challenges. An alternative solution is to extend the eBPF 
> >> steering
> >> program so that it will be able to report to the userspace, but it 
> >> makes
> >> little sense to allow to implement different hashing algorithms 
> >> with
> >> eBPF since the hash value reported by virtio-net is strictly 
> >> defined by
> >> the specification.
> >>
> >> The hash value already stored in sk_buff is not used and computed
> >> independently since it may have been computed in a way not 
> >> conformant
> >> with the specification.
> >>
> >> Signed-off-by: Akihiko Odaki 
> >> ---
> >
> >> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
> >> +   .max_indirection_table_length =
> >> +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
> >> +
> >> +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
> >> +};
> >
> > No need to have explicit capabilities exchange like this? Tun either
> > supports all or none.
> 
>  tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
>  VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> 
>  It is because the flow dissector does not support IPv6 extensions. 
>  The
>  specification is also vague, and does not tell how many TLVs should 
>  be
>  consumed at most when interpreting destination option header so I 
>  chose
>  to avoid adding code for these hash types to the flow dissector. I 
>  doubt
>  anyone will complain about it since nobody complains for Linux.
> 
>  I'm also adding this so that we can extend it later.
>  max_indirection_table_length may grow for systems with 128+ CPUs, or
>  types may have other bits for new protocols in the future.
> 
> >
> >>  case TUNSETSTEERINGEBPF:
> >> -   ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
> >> +   bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, 
> >> argp);
> >> +   if (IS_ERR(bpf_ret))
> >> +   ret = PTR_ERR(bpf_ret);
> >> +   else if (bpf_ret)
> >> +   tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;
> >
> > Don't make one feature disable another.
> >
> > TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> > functions. If one is enabled the other call should fail, with EBUSY
> > for instance.
> >
> >> +   case TUNSETVNETHASH:
> >> +   len = sizeof(vnet_hash);
> >> +   if (copy_from_user(&vnet_hash, argp, len)) {
> >> +   ret = -EFAULT;
> >> +   break;
> >> +   }
> >> +
> >> +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
> >> +(tun->vnet_hdr_sz < sizeof(struct 
> >> virtio_net_hdr_v1_hash) ||
> >> + !tun_is_little_endian(tun))) ||
> >> +vnet_hash.indirection_table_mask >=
> >> +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
> >> +   ret = -EINVAL;
> >> +   break;
> >> +   }
> >> +
> >> +   argp = (u8 __user *)argp + len;
> >

Re: [PATCH 2/2] tools/virtio: Add hints when module is not installed

On Mon, Oct 09, 2023 at 02:44:55AM +, Liming Wu wrote:
> 
> 
> > -Original Message-
> > From: Jason Wang 
> > Sent: Sunday, October 8, 2023 12:36 PM
> > To: Liming Wu 
> > Cc: Michael S . Tsirkin ; k...@vger.kernel.org;
> > virtualization@lists.linux-foundation.org; net...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; 398776...@qq.com
> > Subject: Re: [PATCH 2/2] tools/virtio: Add hints when module is not 
> > installed
> > 
> > On Tue, Sep 26, 2023 at 1:00 PM  wrote:
> > >
> > > From: Liming Wu 
> > >
> > > Need to insmod vhost_test.ko before run virtio_test.
> > > Give some hints to users.
> > >
> > > Signed-off-by: Liming Wu 
> > > ---
> > >  tools/virtio/virtio_test.c | 4 
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
> > > index 028f54e6854a..ce2c4d93d735 100644
> > > --- a/tools/virtio/virtio_test.c
> > > +++ b/tools/virtio/virtio_test.c
> > > @@ -135,6 +135,10 @@ static void vdev_info_init(struct vdev_info* dev,
> > unsigned long long features)
> > > dev->buf = malloc(dev->buf_size);
> > > assert(dev->buf);
> > > dev->control = open("/dev/vhost-test", O_RDWR);
> > > +
> > > +   if (dev->control < 0)
> > > +   fprintf(stderr, "Install vhost_test module" \
> > > +   "(./vhost_test/vhost_test.ko) firstly\n");
> > 
> > There should be many other reasons to fail for open().
> > 
> > Let's use strerror()?
> Yes,  Thanks for the review. 
> Please rechecked the code as follow:
> --- a/tools/virtio/virtio_test.c
> +++ b/tools/virtio/virtio_test.c
> @@ -135,6 +135,11 @@ static void vdev_info_init(struct vdev_info* dev, 
> unsigned long long features)
> dev->buf = malloc(dev->buf_size);
> assert(dev->buf);
> dev->control = open("/dev/vhost-test", O_RDWR);
> +
> +   if (dev->control == NULL)


???
Why are you comparing a file descriptor to NULL?

> +   fprintf(stderr,
> +   "%s: Check whether vhost_test.ko is installed.\n",
> +   strerror(errno));


No, do not suggest checking unconditionally this is just wasting user's
time.  You would have to check the exact errno value. You will either
get ENOENT or ENODEV if module is not loaded. Other errors indicate
other problems.  And what matters is whether it's loaded, not installed
- vhost_test.ko will not get auto-loaded even if installed.


> assert(dev->control >= 0);
> r = ioctl(dev->control, VHOST_SET_OWNER, NULL);
> assert(r >= 0);
>  
> Thanks
> 

In short, I am not applying this patch. If you really want to make
things a bit easier in case of errors, replace all assert r >= 0 with
a macro that prints out strerror(errno), that should be enough.
Maybe print file/line number too while we are at it.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 00/16] vdpa: Add support for vq descriptor mappings

2023-10-09 Thread Dragos Tatulea via Virtualization

This patch series adds support for vq descriptor table mappings which
are used to improve vdpa live migration downtime. The improvement comes
from using smaller mappings which take less time to create and destroy
in hw.

The first part adds the vdpa core changes from Si-Wei [0].

The second part adds support in mlx5_vdpa:
- Refactor the mr code to be able to cleanly add descriptor mappings.
- Add hardware descriptor mr support.
- Properly update iotlb for cvq during ASID switch.

Changes in v3:

- dup_iotlb now checks for src == dst case and returns an error.
- Renamed iotlb parameter in dup_iotlb to dst.
- Removed a redundant check of the asid value.
- Fixed a commit message.
- mx5_ifc.h patch has been applied to mlx5-vhost tree. When applying
this series please pull from that tree first.

Changes in v2:

- The "vdpa/mlx5: Enable hw support for vq descriptor mapping" change
was split off into two patches to avoid merge conflicts into the tree
of Linus.

The first patch contains only changes for mlx5_ifc.h. This must be
applied into the mlx5-vdpa tree [1] first. Once this patch is applied
on mlx5-vdpa, the change has to be pulled fom mlx5-vdpa into the vhost
tree and only then the remaining patches can be applied.

[0]
https://lore.kernel.org/virtualization/1694248959-13369-1-git-send-email-si-wei@oracle.com
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost

Dragos Tatulea (13):
vdpa/mlx5: Expose descriptor group mkey hw capability
vdpa/mlx5: Create helper function for dma mappings
vdpa/mlx5: Decouple cvq iotlb handling from hw mapping code
vdpa/mlx5: Take cvq iotlb lock during refresh
vdpa/mlx5: Collapse "dvq" mr add/delete functions
vdpa/mlx5: Rename mr destroy functions
vdpa/mlx5: Allow creation/deletion of any given mr struct
vdpa/mlx5: Move mr mutex out of mr struct
vdpa/mlx5: Improve mr update flow
vdpa/mlx5: Introduce mr for vq descriptor
vdpa/mlx5: Enable hw support for vq descriptor mapping
vdpa/mlx5: Make iotlb helper functions more generic
vdpa/mlx5: Update cvq iotlb mapping on ASID change

Si-Wei Liu (3):
vdpa: introduce dedicated descriptor group for virtqueue
vhost-vdpa: introduce descriptor group backend feature
vhost-vdpa: uAPI to get dedicated descriptor group id

--
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH mlx5-vhost v3 01/16] vdpa/mlx5: Expose descriptor group mkey hw capability

2023-10-09 Thread Dragos Tatulea via Virtualization

Necessary for improved live migration flow. Actual support will be added
in a downstream patch.

Reviewed-by: Gal Pressman 
Signed-off-by: Dragos Tatulea 
---
 include/linux/mlx5/mlx5_ifc.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index dd8421d021cf..ec15330b970d 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1231,7 +1231,13 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
u8 max_emulated_devices[0x8];
u8 max_num_virtio_queues[0x18];
 
-   u8 reserved_at_a0[0x60];
+   u8 reserved_at_a0[0x20];
+
+   u8 reserved_at_c0[0x13];
+   u8 desc_group_mkey_supported[0x1];
+   u8 reserved_at_d4[0xc];
+
+   u8 reserved_at_e0[0x20];
 
u8 umem_1_buffer_param_a[0x20];
 
-- 
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 03/16] vhost-vdpa: introduce descriptor group backend feature

2023-10-09 Thread Dragos Tatulea via Virtualization

From: Si-Wei Liu 

Userspace knows if the device has dedicated descriptor group or not
by checking this feature bit.

It's only exposed if the vdpa driver backend implements the
.get_vq_desc_group() operation callback. Userspace trying to negotiate
this feature when it or the dependent _F_IOTLB_ASID feature hasn't
been exposed will result in an error.

Signed-off-by: Si-Wei Liu 
Acked-by: Eugenio Pérez 
Acked-by: Jason Wang 
---
 drivers/vhost/vdpa.c | 17 +
 include/uapi/linux/vhost_types.h |  5 +
 2 files changed, 22 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 78379ffd2336..2f21798a37ee 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -389,6 +389,14 @@ static bool vhost_vdpa_can_resume(const struct vhost_vdpa 
*v)
return ops->resume;
 }
 
+static bool vhost_vdpa_has_desc_group(const struct vhost_vdpa *v)
+{
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   return ops->get_vq_desc_group;
+}
+
 static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
 {
struct vdpa_device *vdpa = v->vdpa;
@@ -690,6 +698,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
if (copy_from_user(&features, featurep, sizeof(features)))
return -EFAULT;
if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
+BIT_ULL(VHOST_BACKEND_F_DESC_ASID) |
 BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
 BIT_ULL(VHOST_BACKEND_F_RESUME) |
 
BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
@@ -700,6 +709,12 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
if ((features & BIT_ULL(VHOST_BACKEND_F_RESUME)) &&
 !vhost_vdpa_can_resume(v))
return -EOPNOTSUPP;
+   if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) &&
+   !(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)))
+   return -EINVAL;
+   if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) &&
+!vhost_vdpa_has_desc_group(v))
+   return -EOPNOTSUPP;
vhost_set_backend_features(&v->vdev, features);
return 0;
}
@@ -753,6 +768,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND);
if (vhost_vdpa_can_resume(v))
features |= BIT_ULL(VHOST_BACKEND_F_RESUME);
+   if (vhost_vdpa_has_desc_group(v))
+   features |= BIT_ULL(VHOST_BACKEND_F_DESC_ASID);
features |= vhost_vdpa_get_backend_features(v);
if (copy_to_user(featurep, &features, sizeof(features)))
r = -EFAULT;
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 2d827d22cd99..18ad6ae7ab5c 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -185,5 +185,10 @@ struct vhost_vdpa_iova_range {
  * DRIVER_OK
  */
 #define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK  0x6
+/* Device may expose the virtqueue's descriptor area, driver area and
+ * device area to a different group for ASID binding than where its
+ * buffers may reside. Requires VHOST_BACKEND_F_IOTLB_ASID.
+ */
+#define VHOST_BACKEND_F_DESC_ASID0x7
 
 #endif
-- 
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 02/16] vdpa: introduce dedicated descriptor group for virtqueue

2023-10-09 Thread Dragos Tatulea via Virtualization

From: Si-Wei Liu 

In some cases, the access to the virtqueue's descriptor area, device
and driver areas (precluding indirect descriptor table in guest memory)
may have to be confined to a different address space than where its
buffers reside. Without loss of simplicity and generality with already
established terminology, let's fold up these 3 areas and call them
as a whole as descriptor table group, or descriptor group for short.
Specifically, in case of split virtqueues, descriptor group consists of
regions for Descriptor Table, Available Ring and Used Ring; for packed
virtqueues layout, descriptor group contains Descriptor Ring, Driver
and Device Event Suppression structures.

The group ID for a dedicated descriptor group can be obtained through a
new .get_vq_desc_group() op. If driver implements this op, it means that
the descriptor, device and driver areas of the virtqueue may reside
in a dedicated group than where its buffers reside, a.k.a the default
virtqueue group through the .get_vq_group() op.

In principle, the descriptor group may or may not have same group ID
as the default group. Even if the descriptor group has a different ID,
meaning the vq's descriptor group areas can optionally move to a
separate address space than where guest memory resides, the descriptor
group may still start from a default address space, same as where its
buffers reside. To move the descriptor group to a different address
space, .set_group_asid() has to be called to change the ASID binding
for the group, which is no different than what needs to be done on any
other virtqueue group. On the other hand, the .reset() semantics also
applies on descriptor table group, meaning the device reset will clear
all ASID bindings and move all virtqueue groups including descriptor
group back to the default address space, i.e. in ASID 0.

QEMU's shadow virtqueue is going to utilize dedicated descriptor group
to speed up map and unmap operations, yielding tremendous downtime
reduction by avoiding the full and slow remap cycle in SVQ switching.

Signed-off-by: Si-Wei Liu 
Acked-by: Eugenio Pérez 
Acked-by: Jason Wang 
---
 include/linux/vdpa.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 0e652026b776..d376309b99cf 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -204,6 +204,16 @@ struct vdpa_map_file {
  * @vdev: vdpa device
  * @idx: virtqueue index
  * Returns u32: group id for this virtqueue
+ * @get_vq_desc_group: Get the group id for the descriptor table of
+ * a specific virtqueue (optional)
+ * @vdev: vdpa device
+ * @idx: virtqueue index
+ * Returns u32: group id for the descriptor table
+ * portion of this virtqueue. Could be different
+ * than the one from @get_vq_group, in which case
+ * the access to the descriptor table can be
+ * confined to a separate asid, isolating from
+ * the virtqueue's buffer address access.
  * @get_device_features:   Get virtio features supported by the device
  * @vdev: vdpa device
  * Returns the virtio features support by the
@@ -360,6 +370,7 @@ struct vdpa_config_ops {
/* Device ops */
u32 (*get_vq_align)(struct vdpa_device *vdev);
u32 (*get_vq_group)(struct vdpa_device *vdev, u16 idx);
+   u32 (*get_vq_desc_group)(struct vdpa_device *vdev, u16 idx);
u64 (*get_device_features)(struct vdpa_device *vdev);
u64 (*get_backend_features)(const struct vdpa_device *vdev);
int (*set_driver_features)(struct vdpa_device *vdev, u64 features);
-- 
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 04/16] vhost-vdpa: uAPI to get dedicated descriptor group id

2023-10-09 Thread Dragos Tatulea via Virtualization

From: Si-Wei Liu 

With _F_DESC_ASID backend feature, the device can now support the
VHOST_VDPA_GET_VRING_DESC_GROUP ioctl, and it may expose the descriptor
table (including avail and used ring) in a different group than the
buffers it contains. This new uAPI will fetch the group ID of the
descriptor table.

Signed-off-by: Si-Wei Liu 
Acked-by: Eugenio Pérez 
Acked-by: Jason Wang 
---
 drivers/vhost/vdpa.c   | 10 ++
 include/uapi/linux/vhost.h |  8 
 2 files changed, 18 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 2f21798a37ee..851535f57b95 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -613,6 +613,16 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
else if (copy_to_user(argp, &s, sizeof(s)))
return -EFAULT;
return 0;
+   case VHOST_VDPA_GET_VRING_DESC_GROUP:
+   if (!vhost_vdpa_has_desc_group(v))
+   return -EOPNOTSUPP;
+   s.index = idx;
+   s.num = ops->get_vq_desc_group(vdpa, idx);
+   if (s.num >= vdpa->ngroups)
+   return -EIO;
+   else if (copy_to_user(argp, &s, sizeof(s)))
+   return -EFAULT;
+   return 0;
case VHOST_VDPA_SET_GROUP_ASID:
if (copy_from_user(&s, argp, sizeof(s)))
return -EFAULT;
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index f5c48b61ab62..649560c685f1 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -219,4 +219,12 @@
  */
 #define VHOST_VDPA_RESUME  _IO(VHOST_VIRTIO, 0x7E)
 
+/* Get the group for the descriptor table including driver & device areas
+ * of a virtqueue: read index, write group in num.
+ * The virtqueue index is stored in the index field of vhost_vring_state.
+ * The group ID of the descriptor table for this specific virtqueue
+ * is returned via num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_GET_VRING_DESC_GROUP_IOWR(VHOST_VIRTIO, 0x7F,   
\
+ struct vhost_vring_state)
 #endif
-- 
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 06/16] vdpa/mlx5: Decouple cvq iotlb handling from hw mapping code

2023-10-09 Thread Dragos Tatulea via Virtualization

The handling of the cvq iotlb is currently coupled with the creation
and destruction of the hardware mkeys (mr).

This patch moves cvq iotlb handling into its own function and shifts it
to a scope that is not related to mr handling. As cvq handling is just a
prune_iotlb + dup_iotlb cycle, put it all in the same "update" function.
Finally, the destruction path is handled by directly pruning the iotlb.

After this move is done the ASID mr code can be collapsed into a single
function.

Acked-by: Jason Wang 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  3 ++
 drivers/vdpa/mlx5/core/mr.c| 57 +++---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  |  7 ++--
 3 files changed, 28 insertions(+), 39 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 3748f027cfe9..554899a80241 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -120,6 +120,9 @@ int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct 
vhost_iotlb *iotlb,
unsigned int asid);
 void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev);
 void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
+int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
+   struct vhost_iotlb *iotlb,
+   unsigned int asid);
 int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev);
 
 #define mlx5_vdpa_warn(__dev, format, ...) 
\
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 7bd0883b8b25..fcb6ae32e9ed 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -489,14 +489,6 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, 
struct mlx5_vdpa_mr *mr
}
 }
 
-static void _mlx5_vdpa_destroy_cvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned 
int asid)
-{
-   if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
-   return;
-
-   prune_iotlb(mvdev);
-}
-
 static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned 
int asid)
 {
struct mlx5_vdpa_mr *mr = &mvdev->mr;
@@ -522,25 +514,14 @@ void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev 
*mvdev, unsigned int asid)
mutex_lock(&mr->mkey_mtx);
 
_mlx5_vdpa_destroy_dvq_mr(mvdev, asid);
-   _mlx5_vdpa_destroy_cvq_mr(mvdev, asid);
 
mutex_unlock(&mr->mkey_mtx);
 }
 
 void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
 {
-   mlx5_vdpa_destroy_mr_asid(mvdev, 
mvdev->group2asid[MLX5_VDPA_CVQ_GROUP]);
mlx5_vdpa_destroy_mr_asid(mvdev, 
mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]);
-}
-
-static int _mlx5_vdpa_create_cvq_mr(struct mlx5_vdpa_dev *mvdev,
-   struct vhost_iotlb *iotlb,
-   unsigned int asid)
-{
-   if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
-   return 0;
-
-   return dup_iotlb(mvdev, iotlb);
+   prune_iotlb(mvdev);
 }
 
 static int _mlx5_vdpa_create_dvq_mr(struct mlx5_vdpa_dev *mvdev,
@@ -572,22 +553,7 @@ static int _mlx5_vdpa_create_dvq_mr(struct mlx5_vdpa_dev 
*mvdev,
 static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
struct vhost_iotlb *iotlb, unsigned int asid)
 {
-   int err;
-
-   err = _mlx5_vdpa_create_dvq_mr(mvdev, iotlb, asid);
-   if (err)
-   return err;
-
-   err = _mlx5_vdpa_create_cvq_mr(mvdev, iotlb, asid);
-   if (err)
-   goto out_err;
-
-   return 0;
-
-out_err:
-   _mlx5_vdpa_destroy_dvq_mr(mvdev, asid);
-
-   return err;
+   return _mlx5_vdpa_create_dvq_mr(mvdev, iotlb, asid);
 }
 
 int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
@@ -620,7 +586,24 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, 
struct vhost_iotlb *io
return err;
 }
 
+int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
+   struct vhost_iotlb *iotlb,
+   unsigned int asid)
+{
+   if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
+   return 0;
+
+   prune_iotlb(mvdev);
+   return dup_iotlb(mvdev, iotlb);
+}
+
 int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev)
 {
-   return mlx5_vdpa_create_mr(mvdev, NULL, 0);
+   int err;
+
+   err = mlx5_vdpa_create_mr(mvdev, NULL, 0);
+   if (err)
+   return err;
+
+   return mlx5_vdpa_update_cvq_iotlb(mvdev, NULL, 0);
 }
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 65b6a54ad344..aa4896662699 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2884,10 +2884,13 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, 
struct vhost_iotlb *iotlb,
return err;
}
 
-

[PATCH vhost v3 07/16] vdpa/mlx5: Take cvq iotlb lock during refresh

2023-10-09 Thread Dragos Tatulea via Virtualization

The reslock is taken while refresh is called but iommu_lock is more
specific to this resource. So take the iommu_lock during cvq iotlb
refresh.

Based on Eugenio's patch [0].

[0] https://lore.kernel.org/lkml/20230112142218.725622-4-epere...@redhat.com/

Acked-by: Jason Wang 
Suggested-by: Eugenio Pérez 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/core/mr.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index fcb6ae32e9ed..587300e7c18e 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -590,11 +590,19 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev 
*mvdev,
struct vhost_iotlb *iotlb,
unsigned int asid)
 {
+   int err;
+
if (mvdev->group2asid[MLX5_VDPA_CVQ_GROUP] != asid)
return 0;
 
+   spin_lock(&mvdev->cvq.iommu_lock);
+
prune_iotlb(mvdev);
-   return dup_iotlb(mvdev, iotlb);
+   err = dup_iotlb(mvdev, iotlb);
+
+   spin_unlock(&mvdev->cvq.iommu_lock);
+
+   return err;
 }
 
 int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev)
-- 
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 05/16] vdpa/mlx5: Create helper function for dma mappings

2023-10-09 Thread Dragos Tatulea via Virtualization

Necessary for upcoming cvq separation from mr allocation.

Acked-by: Jason Wang 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
 drivers/vdpa/mlx5/core/mr.c| 5 +
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index ca56242972b3..3748f027cfe9 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -120,6 +120,7 @@ int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct 
vhost_iotlb *iotlb,
unsigned int asid);
 void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev);
 void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
+int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev);
 
 #define mlx5_vdpa_warn(__dev, format, ...) 
\
dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, 
__func__, __LINE__, \
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 5a1971fcd87b..7bd0883b8b25 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -619,3 +619,8 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, 
struct vhost_iotlb *io
 
return err;
 }
+
+int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev)
+{
+   return mlx5_vdpa_create_mr(mvdev, NULL, 0);
+}
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 40a03b08d7cf..65b6a54ad344 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2836,7 +2836,7 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
++mvdev->generation;
 
if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
-   if (mlx5_vdpa_create_mr(mvdev, NULL, 0))
+   if (mlx5_vdpa_create_dma_mr(mvdev))
mlx5_vdpa_warn(mvdev, "create MR failed\n");
}
up_write(&ndev->reslock);
@@ -3441,7 +3441,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
goto err_mpfs;
 
if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
-   err = mlx5_vdpa_create_mr(mvdev, NULL, 0);
+   err = mlx5_vdpa_create_dma_mr(mvdev);
if (err)
goto err_res;
}
-- 
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 09/16] vdpa/mlx5: Rename mr destroy functions

2023-10-09 Thread Dragos Tatulea via Virtualization

Make mlx5_destroy_mr symmetric to mlx5_create_mr.

Acked-by: Jason Wang 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  4 ++--
 drivers/vdpa/mlx5/core/mr.c|  6 +++---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 12 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 554899a80241..e1e6e7aba50e 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -118,8 +118,8 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, 
struct vhost_iotlb *io
 bool *change_map, unsigned int asid);
 int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
unsigned int asid);
-void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev);
-void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
+void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev);
+void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
 int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
struct vhost_iotlb *iotlb,
unsigned int asid);
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index fde00497f4ad..00dcce190a1f 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -507,7 +507,7 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev 
*mvdev, unsigned int asid
mr->initialized = false;
 }
 
-void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
+void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
 {
struct mlx5_vdpa_mr *mr = &mvdev->mr;
 
@@ -518,9 +518,9 @@ void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, 
unsigned int asid)
mutex_unlock(&mr->mkey_mtx);
 }
 
-void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
+void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
 {
-   mlx5_vdpa_destroy_mr_asid(mvdev, 
mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]);
+   mlx5_vdpa_destroy_mr(mvdev, mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]);
prune_iotlb(mvdev);
 }
 
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index aa4896662699..ab196c43694c 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2644,7 +2644,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
goto err_mr;
 
teardown_driver(ndev);
-   mlx5_vdpa_destroy_mr_asid(mvdev, asid);
+   mlx5_vdpa_destroy_mr(mvdev, asid);
err = mlx5_vdpa_create_mr(mvdev, iotlb, asid);
if (err)
goto err_mr;
@@ -2660,7 +2660,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
return 0;
 
 err_setup:
-   mlx5_vdpa_destroy_mr_asid(mvdev, asid);
+   mlx5_vdpa_destroy_mr(mvdev, asid);
 err_mr:
return err;
 }
@@ -2797,7 +2797,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device 
*vdev, u8 status)
 err_driver:
unregister_link_notifier(ndev);
 err_setup:
-   mlx5_vdpa_destroy_mr(&ndev->mvdev);
+   mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED;
 err_clear:
up_write(&ndev->reslock);
@@ -2824,7 +2824,7 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
unregister_link_notifier(ndev);
teardown_driver(ndev);
clear_vqs_ready(ndev);
-   mlx5_vdpa_destroy_mr(&ndev->mvdev);
+   mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
ndev->mvdev.status = 0;
ndev->mvdev.suspended = false;
ndev->cur_num_vqs = 0;
@@ -2944,7 +2944,7 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
ndev = to_mlx5_vdpa_ndev(mvdev);
 
free_resources(ndev);
-   mlx5_vdpa_destroy_mr(mvdev);
+   mlx5_vdpa_destroy_mr_resources(mvdev);
if (!is_zero_ether_addr(ndev->config.mac)) {
pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
mlx5_mpfs_del_mac(pfmdev, ndev->config.mac);
@@ -3474,7 +3474,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
 err_res2:
free_resources(ndev);
 err_mr:
-   mlx5_vdpa_destroy_mr(mvdev);
+   mlx5_vdpa_destroy_mr_resources(mvdev);
 err_res:
mlx5_vdpa_free_resources(&ndev->mvdev);
 err_mpfs:
-- 
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 08/16] vdpa/mlx5: Collapse "dvq" mr add/delete functions

2023-10-09 Thread Dragos Tatulea via Virtualization

Now that the cvq code is out of mlx5_vdpa_create/destroy_mr, the "dvq"
functions can be folded into their callers.

Having "dvq" in the naming will no longer be accurate in the downstream
patches.

Acked-by: Jason Wang 
Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/core/mr.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 587300e7c18e..fde00497f4ad 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -489,7 +489,7 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, 
struct mlx5_vdpa_mr *mr
}
 }
 
-static void _mlx5_vdpa_destroy_dvq_mr(struct mlx5_vdpa_dev *mvdev, unsigned 
int asid)
+static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int 
asid)
 {
struct mlx5_vdpa_mr *mr = &mvdev->mr;
 
@@ -513,7 +513,7 @@ void mlx5_vdpa_destroy_mr_asid(struct mlx5_vdpa_dev *mvdev, 
unsigned int asid)
 
mutex_lock(&mr->mkey_mtx);
 
-   _mlx5_vdpa_destroy_dvq_mr(mvdev, asid);
+   _mlx5_vdpa_destroy_mr(mvdev, asid);
 
mutex_unlock(&mr->mkey_mtx);
 }
@@ -524,9 +524,9 @@ void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
prune_iotlb(mvdev);
 }
 
-static int _mlx5_vdpa_create_dvq_mr(struct mlx5_vdpa_dev *mvdev,
-   struct vhost_iotlb *iotlb,
-   unsigned int asid)
+static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
+   struct vhost_iotlb *iotlb,
+   unsigned int asid)
 {
struct mlx5_vdpa_mr *mr = &mvdev->mr;
int err;
@@ -550,12 +550,6 @@ static int _mlx5_vdpa_create_dvq_mr(struct mlx5_vdpa_dev 
*mvdev,
return 0;
 }
 
-static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
-   struct vhost_iotlb *iotlb, unsigned int asid)
-{
-   return _mlx5_vdpa_create_dvq_mr(mvdev, iotlb, asid);
-}
-
 int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
unsigned int asid)
 {
-- 
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 11/16] vdpa/mlx5: Move mr mutex out of mr struct

2023-10-09 Thread Dragos Tatulea via Virtualization

The mutex is named like it is supposed to protect only the mkey but in
reality it is a global lock for all mr resources.

Shift the mutex to it's rightful location (struct mlx5_vdpa_dev) and
give it a more appropriate name.

Signed-off-by: Dragos Tatulea 
Acked-by: Eugenio Pérez 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  4 ++--
 drivers/vdpa/mlx5/core/mr.c| 13 +++--
 drivers/vdpa/mlx5/core/resources.c |  6 +++---
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 01d4ee58ccb1..9c6ac42c21e1 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -34,8 +34,6 @@ struct mlx5_vdpa_mr {
/* state of dvq mr */
bool initialized;
 
-   /* serialize mkey creation and destruction */
-   struct mutex mkey_mtx;
bool user_mr;
 };
 
@@ -94,6 +92,8 @@ struct mlx5_vdpa_dev {
u32 generation;
 
struct mlx5_vdpa_mr mr;
+   /* serialize mr access */
+   struct mutex mr_mtx;
struct mlx5_control_vq cvq;
struct workqueue_struct *wq;
unsigned int group2asid[MLX5_VDPA_NUMVQ_GROUPS];
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 6f29e8eaabb1..abd6a6fb122f 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -509,11 +509,11 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev 
*mvdev, struct mlx5_vdpa_
 void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
  struct mlx5_vdpa_mr *mr)
 {
-   mutex_lock(&mr->mkey_mtx);
+   mutex_lock(&mvdev->mr_mtx);
 
_mlx5_vdpa_destroy_mr(mvdev, mr);
 
-   mutex_unlock(&mr->mkey_mtx);
+   mutex_unlock(&mvdev->mr_mtx);
 }
 
 void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
@@ -550,9 +550,10 @@ int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
 {
int err;
 
-   mutex_lock(&mvdev->mr.mkey_mtx);
+   mutex_lock(&mvdev->mr_mtx);
err = _mlx5_vdpa_create_mr(mvdev, mr, iotlb);
-   mutex_unlock(&mvdev->mr.mkey_mtx);
+   mutex_unlock(&mvdev->mr_mtx);
+
return err;
 }
 
@@ -563,14 +564,14 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, 
struct vhost_iotlb *io
int err = 0;
 
*change_map = false;
-   mutex_lock(&mr->mkey_mtx);
+   mutex_lock(&mvdev->mr_mtx);
if (mr->initialized) {
mlx5_vdpa_info(mvdev, "memory map update\n");
*change_map = true;
}
if (!*change_map)
err = _mlx5_vdpa_create_mr(mvdev, mr, iotlb);
-   mutex_unlock(&mr->mkey_mtx);
+   mutex_unlock(&mvdev->mr_mtx);
 
return err;
 }
diff --git a/drivers/vdpa/mlx5/core/resources.c 
b/drivers/vdpa/mlx5/core/resources.c
index d5a59c9035fb..5c5a41b64bfc 100644
--- a/drivers/vdpa/mlx5/core/resources.c
+++ b/drivers/vdpa/mlx5/core/resources.c
@@ -256,7 +256,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
mlx5_vdpa_warn(mvdev, "resources already allocated\n");
return -EINVAL;
}
-   mutex_init(&mvdev->mr.mkey_mtx);
+   mutex_init(&mvdev->mr_mtx);
res->uar = mlx5_get_uars_page(mdev);
if (IS_ERR(res->uar)) {
err = PTR_ERR(res->uar);
@@ -301,7 +301,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
 err_uctx:
mlx5_put_uars_page(mdev, res->uar);
 err_uars:
-   mutex_destroy(&mvdev->mr.mkey_mtx);
+   mutex_destroy(&mvdev->mr_mtx);
return err;
 }
 
@@ -318,6 +318,6 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
dealloc_pd(mvdev, res->pdn, res->uid);
destroy_uctx(mvdev, res->uid);
mlx5_put_uars_page(mvdev->mdev, res->uar);
-   mutex_destroy(&mvdev->mr.mkey_mtx);
+   mutex_destroy(&mvdev->mr_mtx);
res->valid = false;
 }
-- 
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 10/16] vdpa/mlx5: Allow creation/deletion of any given mr struct

2023-10-09 Thread Dragos Tatulea via Virtualization

This patch adapts the mr creation/deletion code to be able to work with
any given mr struct pointer. All the APIs are adapted to take an extra
parameter for the mr.

mlx5_vdpa_create/delete_mr doesn't need a ASID parameter anymore. The
check is done in the caller instead (mlx5_set_map).

This change is needed for a followup patch which will introduce an
additional mr for the vq descriptor data.

Signed-off-by: Dragos Tatulea 
Acked-by: Eugenio Pérez 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  8 +++--
 drivers/vdpa/mlx5/core/mr.c| 53 ++
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 10 --
 3 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index e1e6e7aba50e..01d4ee58ccb1 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -116,10 +116,12 @@ int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, 
u32 *mkey, u32 *in,
 int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, u32 mkey);
 int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb 
*iotlb,
 bool *change_map, unsigned int asid);
-int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
-   unsigned int asid);
+int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
+   struct mlx5_vdpa_mr *mr,
+   struct vhost_iotlb *iotlb);
 void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev);
-void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid);
+void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr);
 int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
struct vhost_iotlb *iotlb,
unsigned int asid);
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 00dcce190a1f..6f29e8eaabb1 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -301,10 +301,13 @@ static void unmap_direct_mr(struct mlx5_vdpa_dev *mvdev, 
struct mlx5_vdpa_direct
sg_free_table(&mr->sg_head);
 }
 
-static int add_direct_chain(struct mlx5_vdpa_dev *mvdev, u64 start, u64 size, 
u8 perm,
+static int add_direct_chain(struct mlx5_vdpa_dev *mvdev,
+   struct mlx5_vdpa_mr *mr,
+   u64 start,
+   u64 size,
+   u8 perm,
struct vhost_iotlb *iotlb)
 {
-   struct mlx5_vdpa_mr *mr = &mvdev->mr;
struct mlx5_vdpa_direct_mr *dmr;
struct mlx5_vdpa_direct_mr *n;
LIST_HEAD(tmp);
@@ -354,9 +357,10 @@ static int add_direct_chain(struct mlx5_vdpa_dev *mvdev, 
u64 start, u64 size, u8
  * indirect memory key that provides access to the enitre address space given
  * by iotlb.
  */
-static int create_user_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb 
*iotlb)
+static int create_user_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr,
+ struct vhost_iotlb *iotlb)
 {
-   struct mlx5_vdpa_mr *mr = &mvdev->mr;
struct mlx5_vdpa_direct_mr *dmr;
struct mlx5_vdpa_direct_mr *n;
struct vhost_iotlb_map *map;
@@ -384,7 +388,7 @@ static int create_user_mr(struct mlx5_vdpa_dev *mvdev, 
struct vhost_iotlb *iotlb
   
LOG_MAX_KLM_SIZE);
mr->num_klms += nnuls;
}
-   err = add_direct_chain(mvdev, ps, pe - ps, 
pperm, iotlb);
+   err = add_direct_chain(mvdev, mr, ps, pe - ps, 
pperm, iotlb);
if (err)
goto err_chain;
}
@@ -393,7 +397,7 @@ static int create_user_mr(struct mlx5_vdpa_dev *mvdev, 
struct vhost_iotlb *iotlb
pperm = map->perm;
}
}
-   err = add_direct_chain(mvdev, ps, pe - ps, pperm, iotlb);
+   err = add_direct_chain(mvdev, mr, ps, pe - ps, pperm, iotlb);
if (err)
goto err_chain;
 
@@ -489,13 +493,8 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, 
struct mlx5_vdpa_mr *mr
}
 }
 
-static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, unsigned int 
asid)
+static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct 
mlx5_vdpa_mr *mr)
 {
-   struct mlx5_vdpa_mr *mr = &mvdev->mr;
-
-   if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid)
-   return;
-
if (!mr->initialized)
return;
 
@@ -507,38 +506,33 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev 
*mvdev, unsigned int asid
mr->initialized = false;
 }
 
-void mlx5_vdpa_destroy_mr(struct mlx5_vdp

[PATCH vhost v3 13/16] vdpa/mlx5: Introduce mr for vq descriptor

2023-10-09 Thread Dragos Tatulea via Virtualization

Introduce the vq descriptor group and mr per ASID. Until now
.set_map on ASID 1 was only updating the cvq iotlb. From now on it also
creates a mkey for it. The current patch doesn't use it but follow-up
patches will add hardware support for mapping the vq descriptors.

Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  5 +++--
 drivers/vdpa/mlx5/core/mr.c| 14 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 20 +---
 3 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index bbe4335106bd..ae09296f4270 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -70,11 +70,12 @@ struct mlx5_vdpa_wq_ent {
 enum {
MLX5_VDPA_DATAVQ_GROUP,
MLX5_VDPA_CVQ_GROUP,
+   MLX5_VDPA_DATAVQ_DESC_GROUP,
MLX5_VDPA_NUMVQ_GROUPS
 };
 
 enum {
-   MLX5_VDPA_NUM_AS = MLX5_VDPA_NUMVQ_GROUPS
+   MLX5_VDPA_NUM_AS = 2
 };
 
 struct mlx5_vdpa_dev {
@@ -89,7 +90,7 @@ struct mlx5_vdpa_dev {
u16 max_idx;
u32 generation;
 
-   struct mlx5_vdpa_mr *mr;
+   struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
/* serialize mr access */
struct mutex mr_mtx;
struct mlx5_control_vq cvq;
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 00eff5a07152..3dee6d9bed6b 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -511,8 +511,10 @@ void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
 
_mlx5_vdpa_destroy_mr(mvdev, mr);
 
-   if (mvdev->mr == mr)
-   mvdev->mr = NULL;
+   for (int i = 0; i < MLX5_VDPA_NUM_AS; i++) {
+   if (mvdev->mr[i] == mr)
+   mvdev->mr[i] = NULL;
+   }
 
mutex_unlock(&mvdev->mr_mtx);
 
@@ -523,11 +525,11 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
 struct mlx5_vdpa_mr *new_mr,
 unsigned int asid)
 {
-   struct mlx5_vdpa_mr *old_mr = mvdev->mr;
+   struct mlx5_vdpa_mr *old_mr = mvdev->mr[asid];
 
mutex_lock(&mvdev->mr_mtx);
 
-   mvdev->mr = new_mr;
+   mvdev->mr[asid] = new_mr;
if (old_mr) {
_mlx5_vdpa_destroy_mr(mvdev, old_mr);
kfree(old_mr);
@@ -539,7 +541,9 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
 
 void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
 {
-   mlx5_vdpa_destroy_mr(mvdev, mvdev->mr);
+   for (int i = 0; i < MLX5_VDPA_NUM_AS; i++)
+   mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[i]);
+
prune_iotlb(mvdev);
 }
 
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 4a87f9119fca..25bd2c324f5b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -821,6 +821,8 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
 {
int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(create_virtio_net_q_out)] = {};
+   struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
+   struct mlx5_vdpa_mr *vq_mr;
void *obj_context;
u16 mlx_features;
void *cmd_hdr;
@@ -873,7 +875,9 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
-   MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, ndev->mvdev.mr->mkey);
+   vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+   if (vq_mr)
+   MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
MLX5_SET(virtio_q, vq_ctx, umem_1_id, mvq->umem1.id);
MLX5_SET(virtio_q, vq_ctx, umem_1_size, mvq->umem1.size);
MLX5_SET(virtio_q, vq_ctx, umem_2_id, mvq->umem2.id);
@@ -2633,7 +2637,8 @@ static void restore_channels_info(struct mlx5_vdpa_net 
*ndev)
 }
 
 static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
-   struct mlx5_vdpa_mr *new_mr, unsigned int asid)
+   struct mlx5_vdpa_mr *new_mr,
+   unsigned int asid)
 {
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
int err;
@@ -2652,8 +2657,10 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev 
*mvdev,
 
restore_channels_info(ndev);
err = setup_driver(mvdev);
+   if (err)
+   return err;
 
-   return err;
+   return 0;
 }
 
 /* reslock must be held for this function */
@@ -2869,8 +2876,8 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, 
struct vhost_iotlb *iotlb,
struct mlx5_vdpa_mr *new_mr;
int err;
 
-   if (mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP] != asid)
-

[PATCH vhost v3 12/16] vdpa/mlx5: Improve mr update flow

2023-10-09 Thread Dragos Tatulea via Virtualization

The current flow for updating an mr works directly on mvdev->mr which
makes it cumbersome to handle multiple new mr structs.

This patch makes the flow more straightforward by having
mlx5_vdpa_create_mr return a new mr which will update the old mr (if
any). The old mr will be deleted and unlinked from mvdev.

This change paves the way for adding mrs for different ASIDs.

The initialized bool is no longer needed as mr is now a pointer in the
mlx5_vdpa_dev struct which will be NULL when not initialized.

Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h | 14 +++--
 drivers/vdpa/mlx5/core/mr.c| 87 --
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 46 
 3 files changed, 76 insertions(+), 71 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 9c6ac42c21e1..bbe4335106bd 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -31,8 +31,6 @@ struct mlx5_vdpa_mr {
struct list_head head;
unsigned long num_directs;
unsigned long num_klms;
-   /* state of dvq mr */
-   bool initialized;
 
bool user_mr;
 };
@@ -91,7 +89,7 @@ struct mlx5_vdpa_dev {
u16 max_idx;
u32 generation;
 
-   struct mlx5_vdpa_mr mr;
+   struct mlx5_vdpa_mr *mr;
/* serialize mr access */
struct mutex mr_mtx;
struct mlx5_control_vq cvq;
@@ -114,14 +112,14 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev 
*mvdev);
 int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, u32 *mkey, u32 *in,
  int inlen);
 int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, u32 mkey);
-int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb 
*iotlb,
-bool *change_map, unsigned int asid);
-int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
-   struct mlx5_vdpa_mr *mr,
-   struct vhost_iotlb *iotlb);
+struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
+struct vhost_iotlb *iotlb);
 void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev);
 void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
  struct mlx5_vdpa_mr *mr);
+void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
+struct mlx5_vdpa_mr *mr,
+unsigned int asid);
 int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
struct vhost_iotlb *iotlb,
unsigned int asid);
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index abd6a6fb122f..00eff5a07152 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -495,30 +495,51 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, 
struct mlx5_vdpa_mr *mr
 
 static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct 
mlx5_vdpa_mr *mr)
 {
-   if (!mr->initialized)
-   return;
-
if (mr->user_mr)
destroy_user_mr(mvdev, mr);
else
destroy_dma_mr(mvdev, mr);
-
-   mr->initialized = false;
 }
 
 void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
  struct mlx5_vdpa_mr *mr)
 {
+   if (!mr)
+   return;
+
mutex_lock(&mvdev->mr_mtx);
 
_mlx5_vdpa_destroy_mr(mvdev, mr);
 
+   if (mvdev->mr == mr)
+   mvdev->mr = NULL;
+
+   mutex_unlock(&mvdev->mr_mtx);
+
+   kfree(mr);
+}
+
+void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
+struct mlx5_vdpa_mr *new_mr,
+unsigned int asid)
+{
+   struct mlx5_vdpa_mr *old_mr = mvdev->mr;
+
+   mutex_lock(&mvdev->mr_mtx);
+
+   mvdev->mr = new_mr;
+   if (old_mr) {
+   _mlx5_vdpa_destroy_mr(mvdev, old_mr);
+   kfree(old_mr);
+   }
+
mutex_unlock(&mvdev->mr_mtx);
+
 }
 
 void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
 {
-   mlx5_vdpa_destroy_mr(mvdev, &mvdev->mr);
+   mlx5_vdpa_destroy_mr(mvdev, mvdev->mr);
prune_iotlb(mvdev);
 }
 
@@ -528,52 +549,36 @@ static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev 
*mvdev,
 {
int err;
 
-   if (mr->initialized)
-   return 0;
-
if (iotlb)
err = create_user_mr(mvdev, mr, iotlb);
else
err = create_dma_mr(mvdev, mr);
 
-   if (err)
-   return err;
-
-   mr->initialized = true;
-
-   return 0;
+   return err;
 }
 
-int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
-   struct mlx5_vdpa_mr *mr,
-   struct vhost_iotlb *iotlb)
+struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
+struct vhost_iotlb

[PATCH vhost v3 14/16] vdpa/mlx5: Enable hw support for vq descriptor mapping

2023-10-09 Thread Dragos Tatulea via Virtualization

Vq descriptor mappings are supported in hardware by filling in an
additional mkey which contains the descriptor mappings to the hw vq.

A previous patch in this series added support for hw mkey (mr) creation
for ASID 1.

This patch fills in both the vq data and vq descriptor mkeys based on
group ASID mapping.

The feature is signaled to the vdpa core through the presence of the
.get_vq_desc_group op.

Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/net/mlx5_vnet.c  | 24 +++-
 include/linux/mlx5/mlx5_ifc_vdpa.h |  7 ++-
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 25bd2c324f5b..2e0a3ce1c0cf 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -823,6 +823,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
u32 out[MLX5_ST_SZ_DW(create_virtio_net_q_out)] = {};
struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
struct mlx5_vdpa_mr *vq_mr;
+   struct mlx5_vdpa_mr *vq_desc_mr;
void *obj_context;
u16 mlx_features;
void *cmd_hdr;
@@ -878,6 +879,11 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
if (vq_mr)
MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
+
+   vq_desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+   if (vq_desc_mr)
+   MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, vq_desc_mr->mkey);
+
MLX5_SET(virtio_q, vq_ctx, umem_1_id, mvq->umem1.id);
MLX5_SET(virtio_q, vq_ctx, umem_1_size, mvq->umem1.size);
MLX5_SET(virtio_q, vq_ctx, umem_2_id, mvq->umem2.id);
@@ -2265,6 +2271,16 @@ static u32 mlx5_vdpa_get_vq_group(struct vdpa_device 
*vdev, u16 idx)
return MLX5_VDPA_DATAVQ_GROUP;
 }
 
+static u32 mlx5_vdpa_get_vq_desc_group(struct vdpa_device *vdev, u16 idx)
+{
+   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+
+   if (is_ctrl_vq_idx(mvdev, idx))
+   return MLX5_VDPA_CVQ_GROUP;
+
+   return MLX5_VDPA_DATAVQ_DESC_GROUP;
+}
+
 static u64 mlx_to_vritio_features(u16 dev_features)
 {
u64 result = 0;
@@ -3160,6 +3176,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = {
.get_vq_irq = mlx5_get_vq_irq,
.get_vq_align = mlx5_vdpa_get_vq_align,
.get_vq_group = mlx5_vdpa_get_vq_group,
+   .get_vq_desc_group = mlx5_vdpa_get_vq_desc_group, /* Op disabled if not 
supported. */
.get_device_features = mlx5_vdpa_get_device_features,
.set_driver_features = mlx5_vdpa_set_driver_features,
.get_driver_features = mlx5_vdpa_get_driver_features,
@@ -3258,6 +3275,7 @@ struct mlx5_vdpa_mgmtdev {
struct vdpa_mgmt_dev mgtdev;
struct mlx5_adev *madev;
struct mlx5_vdpa_net *ndev;
+   struct vdpa_config_ops vdpa_ops;
 };
 
 static int config_func_mtu(struct mlx5_core_dev *mdev, u16 mtu)
@@ -3371,7 +3389,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name,
max_vqs = 2;
}
 
-   ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
mdev->device, &mlx5_vdpa_ops,
+   ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
mdev->device, &mgtdev->vdpa_ops,
 MLX5_VDPA_NUMVQ_GROUPS, MLX5_VDPA_NUM_AS, 
name, false);
if (IS_ERR(ndev))
return PTR_ERR(ndev);
@@ -3546,6 +3564,10 @@ static int mlx5v_probe(struct auxiliary_device *adev,
MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues) + 1;
mgtdev->mgtdev.supported_features = get_supported_features(mdev);
mgtdev->madev = madev;
+   mgtdev->vdpa_ops = mlx5_vdpa_ops;
+
+   if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, desc_group_mkey_supported))
+   mgtdev->vdpa_ops.get_vq_desc_group = NULL;
 
err = vdpa_mgmtdev_register(&mgtdev->mgtdev);
if (err)
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h 
b/include/linux/mlx5/mlx5_ifc_vdpa.h
index 9becdc3fa503..b86d51a855f6 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -74,7 +74,11 @@ struct mlx5_ifc_virtio_q_bits {
u8reserved_at_320[0x8];
u8pd[0x18];
 
-   u8reserved_at_340[0xc0];
+   u8reserved_at_340[0x20];
+
+   u8desc_group_mkey[0x20];
+
+   u8reserved_at_380[0x80];
 };
 
 struct mlx5_ifc_virtio_net_q_object_bits {
@@ -141,6 +145,7 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_STATE= (u64)1 << 0,
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS  = (u64)1 << 3,
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4,
+   MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY  = (u64)1 << 14,
 };
 
 enum {
-- 
2.41.0

__

[PATCH vhost v3 15/16] vdpa/mlx5: Make iotlb helper functions more generic

2023-10-09 Thread Dragos Tatulea via Virtualization

They will be used in a follow-up patch.

For dup_iotlb, avoid the src == dst case. This is an error.

Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/core/mr.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 3dee6d9bed6b..4a3df865df40 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -454,20 +454,23 @@ static void destroy_dma_mr(struct mlx5_vdpa_dev *mvdev, 
struct mlx5_vdpa_mr *mr)
mlx5_vdpa_destroy_mkey(mvdev, mr->mkey);
 }
 
-static int dup_iotlb(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *src)
+static int dup_iotlb(struct vhost_iotlb *dst, struct vhost_iotlb *src)
 {
struct vhost_iotlb_map *map;
u64 start = 0, last = ULLONG_MAX;
int err;
 
+   if (dst == src)
+   return -EINVAL;
+
if (!src) {
-   err = vhost_iotlb_add_range(mvdev->cvq.iotlb, start, last, 
start, VHOST_ACCESS_RW);
+   err = vhost_iotlb_add_range(dst, start, last, start, 
VHOST_ACCESS_RW);
return err;
}
 
for (map = vhost_iotlb_itree_first(src, start, last); map;
map = vhost_iotlb_itree_next(map, start, last)) {
-   err = vhost_iotlb_add_range(mvdev->cvq.iotlb, map->start, 
map->last,
+   err = vhost_iotlb_add_range(dst, map->start, map->last,
map->addr, map->perm);
if (err)
return err;
@@ -475,9 +478,9 @@ static int dup_iotlb(struct mlx5_vdpa_dev *mvdev, struct 
vhost_iotlb *src)
return 0;
 }
 
-static void prune_iotlb(struct mlx5_vdpa_dev *mvdev)
+static void prune_iotlb(struct vhost_iotlb *iotlb)
 {
-   vhost_iotlb_del_range(mvdev->cvq.iotlb, 0, ULLONG_MAX);
+   vhost_iotlb_del_range(iotlb, 0, ULLONG_MAX);
 }
 
 static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr 
*mr)
@@ -544,7 +547,7 @@ void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev 
*mvdev)
for (int i = 0; i < MLX5_VDPA_NUM_AS; i++)
mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[i]);
 
-   prune_iotlb(mvdev);
+   prune_iotlb(mvdev->cvq.iotlb);
 }
 
 static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
@@ -596,8 +599,8 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
 
spin_lock(&mvdev->cvq.iommu_lock);
 
-   prune_iotlb(mvdev);
-   err = dup_iotlb(mvdev, iotlb);
+   prune_iotlb(mvdev->cvq.iotlb);
+   err = dup_iotlb(mvdev->cvq.iotlb, iotlb);
 
spin_unlock(&mvdev->cvq.iommu_lock);
 
-- 
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 16/16] vdpa/mlx5: Update cvq iotlb mapping on ASID change

2023-10-09 Thread Dragos Tatulea via Virtualization

For the following sequence:
- cvq group is in ASID 0
- .set_map(1, cvq_iotlb)
- .set_group_asid(cvq_group, 1)

... the cvq mapping from ASID 0 will be used. This is not always correct
behaviour.

This patch adds support for the above mentioned flow by saving the iotlb
on each .set_map and updating the cvq iotlb with it on a cvq group change.

Acked-by: Eugenio Pérez 
Signed-off-by: Dragos Tatulea 
---
 drivers/vdpa/mlx5/core/mlx5_vdpa.h |  2 ++
 drivers/vdpa/mlx5/core/mr.c| 26 ++
 drivers/vdpa/mlx5/net/mlx5_vnet.c  |  9 -
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h 
b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index ae09296f4270..db988ced5a5d 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -32,6 +32,8 @@ struct mlx5_vdpa_mr {
unsigned long num_directs;
unsigned long num_klms;
 
+   struct vhost_iotlb *iotlb;
+
bool user_mr;
 };
 
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 4a3df865df40..66530e28f327 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -502,6 +502,8 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev 
*mvdev, struct mlx5_vdpa_
destroy_user_mr(mvdev, mr);
else
destroy_dma_mr(mvdev, mr);
+
+   vhost_iotlb_free(mr->iotlb);
 }
 
 void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
@@ -561,6 +563,30 @@ static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev 
*mvdev,
else
err = create_dma_mr(mvdev, mr);
 
+   if (err)
+   return err;
+
+   mr->iotlb = vhost_iotlb_alloc(0, 0);
+   if (!mr->iotlb) {
+   err = -ENOMEM;
+   goto err_mr;
+   }
+
+   err = dup_iotlb(mr->iotlb, iotlb);
+   if (err)
+   goto err_iotlb;
+
+   return 0;
+
+err_iotlb:
+   vhost_iotlb_free(mr->iotlb);
+
+err_mr:
+   if (iotlb)
+   destroy_user_mr(mvdev, mr);
+   else
+   destroy_dma_mr(mvdev, mr);
+
return err;
 }
 
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 2e0a3ce1c0cf..6abe02310f2b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3154,12 +3154,19 @@ static int mlx5_set_group_asid(struct vdpa_device 
*vdev, u32 group,
   unsigned int asid)
 {
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+   int err = 0;
 
if (group >= MLX5_VDPA_NUMVQ_GROUPS)
return -EINVAL;
 
mvdev->group2asid[group] = asid;
-   return 0;
+
+   mutex_lock(&mvdev->mr_mtx);
+   if (group == MLX5_VDPA_CVQ_GROUP && mvdev->mr[asid])
+   err = mlx5_vdpa_update_cvq_iotlb(mvdev, mvdev->mr[asid]->iotlb, 
asid);
+   mutex_unlock(&mvdev->mr_mtx);
+
+   return err;
 }
 
 static const struct vdpa_config_ops mlx5_vdpa_ops = {
-- 
2.41.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

Akihiko Odaki sorry about reposts.
Having an email with two "@" in the CC list:
 xuanzhuo@linux.alibaba.comsh...@kernel.org
tripped up mutt's reply-all for me and made it send to you only.

I am guessing you meant two addresses: xuanz...@linux.alibaba.com
and sh...@kernel.org.


On Sun, Oct 08, 2023 at 02:20:49PM +0900, Akihiko Odaki wrote:
> virtio-net have two usage of hashes: one is RSS and another is hash
> reporting. Conventionally the hash calculation was done by the VMM.
> However, computing the hash after the queue was chosen defeats the
> purpose of RSS.
> 
> Another approach is to use eBPF steering program. This approach has
> another downside: it cannot report the calculated hash due to the
> restrictive nature of eBPF.
> 
> Introduce the code to compute hashes to the kernel in order to overcome
> thse challenges. An alternative solution is to extend the eBPF steering
> program so that it will be able to report to the userspace, but it makes
> little sense to allow to implement different hashing algorithms with
> eBPF since the hash value reported by virtio-net is strictly defined by
> the specification.
> 
> The hash value already stored in sk_buff is not used and computed
> independently since it may have been computed in a way not conformant
> with the specification.
> 
> Signed-off-by: Akihiko Odaki 
> ---
>  drivers/net/tun.c   | 187 
>  include/uapi/linux/if_tun.h |  16 +++
>  2 files changed, 182 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 89ab9efe522c..561a573cd008 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -171,6 +171,9 @@ struct tun_prog {
>   struct bpf_prog *prog;
>  };
>  
> +#define TUN_VNET_HASH_MAX_KEY_SIZE 40
> +#define TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH 128
> +

where do these come from?

>  /* Since the socket were moved to tun_file, to preserve the behavior of 
> persist
>   * device, socket filter, sndbuf and vnet header size were restore when the
>   * file were attached to a persist device.
> @@ -209,6 +212,9 @@ struct tun_struct {
>   struct tun_prog __rcu *steering_prog;
>   struct tun_prog __rcu *filter_prog;
>   struct ethtool_link_ksettings link_ksettings;
> + struct tun_vnet_hash vnet_hash;
> + u16 
> vnet_hash_indirection_table[TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH];
> + u32 vnet_hash_key[TUN_VNET_HASH_MAX_KEY_SIZE / 4];

That's quite a lot of data to add in this struct, and will be used
by a small minority of users. Are you pushing any hot data out of cache
with this? Why not allocate these as needed?

>   /* init args */
>   struct file *file;
>   struct ifreq *ifr;
> @@ -219,6 +225,13 @@ struct veth {
>   __be16 h_vlan_TCI;
>  };
>  
> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
> + .max_indirection_table_length =
> + TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
> +
> + .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
> +};
> +
>  static void tun_flow_init(struct tun_struct *tun);
>  static void tun_flow_uninit(struct tun_struct *tun);
>  
> @@ -320,10 +333,16 @@ static long tun_set_vnet_be(struct tun_struct *tun, int 
> __user *argp)
>   if (get_user(be, argp))
>   return -EFAULT;
>  
> - if (be)
> + if (be) {
> + if (!(tun->flags & TUN_VNET_LE) &&
> + (tun->vnet_hash.flags & TUN_VNET_HASH_REPORT)) {
> + return -EINVAL;
> + }
> +
>   tun->flags |= TUN_VNET_BE;
> - else
> + } else {
>   tun->flags &= ~TUN_VNET_BE;
> + }
>  
>   return 0;
>  }
> @@ -558,15 +577,47 @@ static u16 tun_ebpf_select_queue(struct tun_struct 
> *tun, struct sk_buff *skb)
>   return ret % numqueues;
>  }
>  
> +static u16 tun_vnet_select_queue(struct tun_struct *tun, struct sk_buff *skb)
> +{
> + u32 value = qdisc_skb_cb(skb)->tun_vnet_hash_value;
> + u32 numqueues;
> + u32 index;
> + u16 queue;
> +
> + numqueues = READ_ONCE(tun->numqueues);
> + if (!numqueues)
> + return 0;
> +
> + index = value & READ_ONCE(tun->vnet_hash.indirection_table_mask);
> + queue = READ_ONCE(tun->vnet_hash_indirection_table[index]);
> + if (!queue)
> + queue = READ_ONCE(tun->vnet_hash.unclassified_queue);

Apparently 0 is an illegal queue value? You are making this part
of UAPI better document things like this.

> +
> + return queue % numqueues;
> +}
> +
>  static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
>   struct net_device *sb_dev)
>  {
>   struct tun_struct *tun = netdev_priv(dev);
> + u8 vnet_hash_flags = READ_ONCE(tun->vnet_hash.flags);
> + struct virtio_net_hash hash;
>   u16 ret;
>  
> + if (vnet_hash_flags & (TUN_VNET_HASH_RSS | TUN_VNET_HASH_REPORT)) {
> + virtio_net_hash(skb, READ_ONCE(tun->vnet_hash.types),
> +

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

On Mon, Oct 09, 2023 at 05:44:20PM +0900, Akihiko Odaki wrote:
> On 2023/10/09 17:13, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki  
> > wrote:
> > > 
> > > virtio-net have two usage of hashes: one is RSS and another is hash
> > > reporting. Conventionally the hash calculation was done by the VMM.
> > > However, computing the hash after the queue was chosen defeats the
> > > purpose of RSS.
> > > 
> > > Another approach is to use eBPF steering program. This approach has
> > > another downside: it cannot report the calculated hash due to the
> > > restrictive nature of eBPF.
> > > 
> > > Introduce the code to compute hashes to the kernel in order to overcome
> > > thse challenges. An alternative solution is to extend the eBPF steering
> > > program so that it will be able to report to the userspace, but it makes
> > > little sense to allow to implement different hashing algorithms with
> > > eBPF since the hash value reported by virtio-net is strictly defined by
> > > the specification.
> > > 
> > > The hash value already stored in sk_buff is not used and computed
> > > independently since it may have been computed in a way not conformant
> > > with the specification.
> > > 
> > > Signed-off-by: Akihiko Odaki 
> > 
> > > @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct 
> > > *tun,
> > >  }
> > > 
> > >  if (vnet_hdr_sz) {
> > > -   struct virtio_net_hdr gso;
> > > +   union {
> > > +   struct virtio_net_hdr hdr;
> > > +   struct virtio_net_hdr_v1_hash v1_hash_hdr;
> > > +   } hdr;
> > > +   int ret;
> > > 
> > >  if (iov_iter_count(iter) < vnet_hdr_sz)
> > >  return -EINVAL;
> > > 
> > > -   if (virtio_net_hdr_from_skb(skb, &gso,
> > > -   tun_is_little_endian(tun), 
> > > true,
> > > -   vlan_hlen)) {
> > > +   if ((READ_ONCE(tun->vnet_hash.flags) & 
> > > TUN_VNET_HASH_REPORT) &&
> > > +   vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) &&
> > > +   skb->tun_vnet_hash) {
> > 
> > Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of
> > the set hash ioctl failing otherwise?
> > 
> > Such checks should be limited to control path where possible
> 
> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are not
> read at once.

And then it's a complete mess and you get inconsistent
behaviour with packets getting sent all over the place, right?
So maybe keep a pointer to this struct so it can be
changed atomically then. Maybe even something with rcu I donnu.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next v3 12/12] test/vsock: io_uring rx/tx tests


On Sat, Oct 07, 2023 at 08:21:39PM +0300, Arseniy Krasnov wrote:

This adds set of tests which use io_uring for rx/tx. This test suite is
implemented as separated util like 'vsock_test' and has the same set of
input arguments as 'vsock_test'. These tests only cover cases of data
transmission (no connect/bind/accept etc).

Signed-off-by: Arseniy Krasnov 
---
Changelog:
v1 -> v2:
 * Add 'LDLIBS = -luring' to the target 'vsock_uring_test'.
 * Add 'vsock_uring_test' to the target 'test'.
v2 -> v3:
 * Make 'struct vsock_test_data' private by placing it to the .c file.
   Rename it and add comments to this struct to clarify sense of its
   fields.
 * Add 'vsock_uring_test' to the '.gitignore'.
 * Add receive loop to the server side - this is needed to read entire
   data sent by client.

tools/testing/vsock/.gitignore |   1 +
tools/testing/vsock/Makefile   |   7 +-
tools/testing/vsock/vsock_uring_test.c | 350 +
3 files changed, 356 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/vsock/vsock_uring_test.c

diff --git a/tools/testing/vsock/.gitignore b/tools/testing/vsock/.gitignore
index a8adcfdc292b..d9f798713cd7 100644
--- a/tools/testing/vsock/.gitignore
+++ b/tools/testing/vsock/.gitignore
@@ -3,3 +3,4 @@
vsock_test
vsock_diag_test
vsock_perf
+vsock_uring_test
diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index 1a26f60a596c..b80e7c7def1e 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,12 +1,15 @@
# SPDX-License-Identifier: GPL-2.0-only
all: test vsock_perf
-test: vsock_test vsock_diag_test
+test: vsock_test vsock_diag_test vsock_uring_test
vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
vsock_perf: vsock_perf.o

+vsock_uring_test: LDLIBS = -luring
+vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o
+
CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include 
-Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD 
-U_FORTIFY_SOURCE -D_GNU_SOURCE
.PHONY: all test clean
clean:
-   ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf
+   ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test
-include *.d
diff --git a/tools/testing/vsock/vsock_uring_test.c 
b/tools/testing/vsock/vsock_uring_test.c
new file mode 100644
index ..889887cf3989
--- /dev/null
+++ b/tools/testing/vsock/vsock_uring_test.c
@@ -0,0 +1,350 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* io_uring tests for vsock
+ *
+ * Copyright (C) 2023 SberDevices.
+ *
+ * Author: Arseniy Krasnov 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "util.h"
+#include "control.h"
+#include "msg_zerocopy_common.h"
+
+#define PAGE_SIZE  4096


Ditto.


+#define RING_ENTRIES_NUM   4
+
+#define VSOCK_TEST_DATA_MAX_IOV 3
+
+struct vsock_io_uring_test {
+   /* Number of valid elements in 'vecs'. */
+   int vecs_cnt;
+   /* Array how to allocate buffers for test.
+* 'iov_base' == NULL -> valid buf: mmap('iov_len').
+*
+* 'iov_base' == MAP_FAILED -> invalid buf:
+*   mmap('iov_len'), then munmap('iov_len').
+*   'iov_base' still contains result of
+*   mmap().
+*
+* 'iov_base' == number -> unaligned valid buf:
+*   mmap('iov_len') + number.
+*/
+   struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV];
+};
+
+static struct vsock_io_uring_test test_data_array[] = {
+   /* All elements have page aligned base and size. */
+   {
+   .vecs_cnt = 3,
+   {
+   { NULL, PAGE_SIZE },
+   { NULL, 2 * PAGE_SIZE },
+   { NULL, 3 * PAGE_SIZE },
+   }
+   },
+   /* Middle element has both non-page aligned base and size. */
+   {
+   .vecs_cnt = 3,
+   {
+   { NULL, PAGE_SIZE },
+   { (void *)1, 200  },
+   { NULL, 3 * PAGE_SIZE },
+   }
+   }
+};
+
+static void vsock_io_uring_client(const struct test_opts *opts,
+ const struct vsock_io_uring_test *test_data,
+ bool msg_zerocopy)
+{
+   struct io_uring_sqe *sqe;
+   struct io_uring_cqe *cqe;
+   struct io_uring ring;
+   struct iovec *iovec;
+   struct msghdr msg;
+   int fd;
+
+   fd = vsock_stream_connect(opts->peer_cid, 1234);
+   if (fd < 0) {
+   perror("connect");
+   exit(EXIT_FAILURE);
+   }
+
+   if (msg_zerocopy)
+   enable_so_zerocopy(fd);
+
+   iovec = iovec_from_test_data(test_data->vecs, test_data->vecs_cnt);


Ah, I see this is used also here, so now I get why in util.

Re: [PATCH net-next v3 02/12] vsock: read from socket's error queue


On Sat, Oct 07, 2023 at 08:21:29PM +0300, Arseniy Krasnov wrote:

This adds handling of MSG_ERRQUEUE input flag in receive call. This flag
is used to read socket's error queue instead of data queue. Possible
scenario of error queue usage is receiving completions for transmission
with MSG_ZEROCOPY flag. This patch also adds new defines: 'SOL_VSOCK'
and 'VSOCK_RECVERR'.

Signed-off-by: Arseniy Krasnov 
---
Changelog:
v1 -> v2:
 * Place new defines for userspace to the existing file 'vm_sockets.h'
   instead of creating new one.
v2 -> v3:
 * Add comments to describe 'SOL_VSOCK' and 'VSOCK_RECVERR' in the file
   'vm_sockets.h'.
 * Reorder includes in 'af_vsock.c' in alphabetical order.

include/linux/socket.h  |  1 +
include/uapi/linux/vm_sockets.h | 12 
net/vmw_vsock/af_vsock.c|  6 ++
3 files changed, 19 insertions(+)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 39b74d83c7c4..cfcb7e2c3813 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -383,6 +383,7 @@ struct ucred {
#define SOL_MPTCP   284
#define SOL_MCTP285
#define SOL_SMC 286
+#define SOL_VSOCK  287

/* IPX options */
#define IPX_TYPE1
diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index c60ca33eac59..d9d703b2d45a 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -191,4 +191,16 @@ struct sockaddr_vm {

#define IOCTL_VM_SOCKETS_GET_LOCAL_CID  _IO(7, 0xb9)

+/* For reading completion in case of MSG_ZEROCOPY flag transmission.
+ * This is value of 'cmsg_level' field of the 'struct cmsghdr'.
+ */
+
+#define SOL_VSOCK  287
+
+/* For reading completion in case of MSG_ZEROCOPY flag transmission.
+ * This is value of 'cmsg_type' field of the 'struct cmsghdr'.
+ */
+
+#define VSOCK_RECVERR  1


I would suggest a bit more context here, something like this:

diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
index d9d703b2d45a..ed07181d4eff 100644
--- a/include/uapi/linux/vm_sockets.h
+++ b/include/uapi/linux/vm_sockets.h
@@ -191,14 +191,19 @@ struct sockaddr_vm {

 #define IOCTL_VM_SOCKETS_GET_LOCAL_CID _IO(7, 0xb9)

-/* For reading completion in case of MSG_ZEROCOPY flag transmission.
- * This is value of 'cmsg_level' field of the 'struct cmsghdr'.
+/* MSG_ZEROCOPY notifications are encoded in the standard error format,
+ * sock_extended_err. See Documentation/networking/msg_zerocopy.rst in
+ * kernel source tree for more details.
+ */
+
+/* 'cmsg_level' field value of 'struct cmsghdr' for notification parsing
+ * when MSG_ZEROCOPY flag is used on transmissions.
  */

 #define SOL_VSOCK  287

-/* For reading completion in case of MSG_ZEROCOPY flag transmission.
- * This is value of 'cmsg_type' field of the 'struct cmsghdr'.
+/* 'cmsg_type' field value of 'struct cmsghdr' for notification parsing
+ * when MSG_ZEROCOPY flag is used on transmissions.
  */

 #define VSOCK_RECVERR  1

The rest LGTM.

Stefano


+
#endif /* _UAPI_VM_SOCKETS_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d841f4de33b0..38486efd3d05 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -89,6 +89,7 @@
#include 
#include 
#include 
+#include 
#include 
#include 
#include 
@@ -110,6 +111,7 @@
#include 
#include 
#include 
+#include 

static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
static void vsock_sk_destruct(struct sock *sk);
@@ -2137,6 +2139,10 @@ vsock_connectible_recvmsg(struct socket *sock, struct 
msghdr *msg, size_t len,
int err;

sk = sock->sk;
+
+   if (unlikely(flags & MSG_ERRQUEUE))
+   return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 
VSOCK_RECVERR);
+
vsk = vsock_sk(sk);
err = 0;

--
2.25.1



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next v3 10/12] test/vsock: MSG_ZEROCOPY flag tests


On Sat, Oct 07, 2023 at 08:21:37PM +0300, Arseniy Krasnov wrote:

This adds three tests for MSG_ZEROCOPY feature:
1) SOCK_STREAM tx with different buffers.
2) SOCK_SEQPACKET tx with different buffers.
3) SOCK_STREAM test to read empty error queue of the socket.

Patch also works as preparation for the next patches for tools in this
patchset: vsock_perf and vsock_uring_test:
1) Adds several new functions to util.c - they will be also used by
  vsock_uring_test.
2) Adds two new functions for MSG_ZEROCOPY handling to a new header
  file - such header will be shared between vsock_test, vsock_perf and
  vsock_uring_test, thus avoiding code copy-pasting.

Signed-off-by: Arseniy Krasnov 
---
Changelog:
v1 -> v2:
 * Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'.
v2 -> v3:
 * Patch was reworked. Now it is also preparation patch (see commit
   message). Shared stuff for 'vsock_perf' and tests is placed to a
   new header file, while shared code between current test tool and
   future uring test is placed to the 'util.c'. I think, that making
   this patch as preparation allows to reduce number of changes in the
   next patches in this patchset.
 * Make 'struct vsock_test_data' private by placing it to the .c file.
   Also add comments to this struct to clarify sense of its fields.

tools/testing/vsock/Makefile  |   2 +-
tools/testing/vsock/msg_zerocopy_common.h |  92 ++
tools/testing/vsock/util.c| 110 +++
tools/testing/vsock/util.h|   5 +
tools/testing/vsock/vsock_test.c  |  16 +
tools/testing/vsock/vsock_test_zerocopy.c | 367 ++
tools/testing/vsock/vsock_test_zerocopy.h |  15 +
7 files changed, 606 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/vsock/msg_zerocopy_common.h
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index 21a98ba565ab..1a26f60a596c 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
all: test vsock_perf
test: vsock_test vsock_diag_test
-vsock_test: vsock_test.o timeout.o control.o util.o
+vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
vsock_perf: vsock_perf.o

diff --git a/tools/testing/vsock/msg_zerocopy_common.h 
b/tools/testing/vsock/msg_zerocopy_common.h
new file mode 100644
index ..ce89f1281584
--- /dev/null
+++ b/tools/testing/vsock/msg_zerocopy_common.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef MSG_ZEROCOPY_COMMON_H
+#define MSG_ZEROCOPY_COMMON_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifndef SOL_VSOCK
+#define SOL_VSOCK  287
+#endif
+
+#ifndef VSOCK_RECVERR
+#define VSOCK_RECVERR  1
+#endif
+
+static void enable_so_zerocopy(int fd)
+{
+   int val = 1;
+
+   if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) {
+   perror("setsockopt");
+   exit(EXIT_FAILURE);
+   }
+}
+
+static void vsock_recv_completion(int fd, const bool *zerocopied) 
__maybe_unused;


To avoid this, maybe we can implement those functions in .c file and
link the object.

WDYT?

Ah, here (cc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1)) the build is
failing:

In file included from vsock_perf.c:23:
msg_zerocopy_common.h: In function ‘vsock_recv_completion’:
msg_zerocopy_common.h:29:67: error: expected declaration specifiers before 
‘__maybe_unused’
   29 | static void vsock_recv_completion(int fd, const bool *zerocopied) 
__maybe_unused;
  |   
^~
msg_zerocopy_common.h:31:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or 
‘__attribute__’ before ‘{’ token
   31 | {
  | ^


+static void vsock_recv_completion(int fd, const bool *zerocopied)
+{
+   struct sock_extended_err *serr;
+   struct msghdr msg = { 0 };
+   char cmsg_data[128];
+   struct cmsghdr *cm;
+   ssize_t res;
+
+   msg.msg_control = cmsg_data;
+   msg.msg_controllen = sizeof(cmsg_data);
+
+   res = recvmsg(fd, &msg, MSG_ERRQUEUE);
+   if (res) {
+   fprintf(stderr, "failed to read error queue: %zi\n", res);
+   exit(EXIT_FAILURE);
+   }
+
+   cm = CMSG_FIRSTHDR(&msg);
+   if (!cm) {
+   fprintf(stderr, "cmsg: no cmsg\n");
+   exit(EXIT_FAILURE);
+   }
+
+   if (cm->cmsg_level != SOL_VSOCK) {
+   fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
+   exit(EXIT_FAILURE);
+   }
+
+   if (cm->cmsg_type != VSOCK_RECVERR) {
+   fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
+   exit(EXIT_FAILURE);
+   }
+
+   serr = (void *)CMSG_DATA(cm);
+   if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOP

Re: [PATCH net-next v3 11/12] test/vsock: MSG_ZEROCOPY support for vsock_perf


On Sat, Oct 07, 2023 at 08:21:38PM +0300, Arseniy Krasnov wrote:

To use this option pass '--zerocopy' parameter:

./vsock_perf --zerocopy --sender  ...

With this option MSG_ZEROCOPY flag will be passed to the 'send()' call.

Signed-off-by: Arseniy Krasnov 
---
Changelog:
v1 -> v2:
 * Move 'SOL_VSOCK' and 'VSOCK_RECVERR' from 'util.c' to 'util.h'.
v2 -> v3:
 * Use 'msg_zerocopy_common.h' for MSG_ZEROCOPY related things.
 * Rename '--zc' option to '--zerocopy'.
 * Add detail in help that zerocopy mode is for sender mode only.

tools/testing/vsock/vsock_perf.c | 80 
1 file changed, 71 insertions(+), 9 deletions(-)


Reviewed-by: Stefano Garzarella 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 2/4] virtio_pci: fix the common cfg map size

The function vp_modern_map_capability() takes the size parameter,
which corresponds to the size of virtio_pci_common_cfg. As a result,
this indicates the size of memory area to map.

Now the size is the size of virtio_pci_common_cfg, but some feature(such
as the _F_RING_RESET) needs the virtio_pci_modern_common_cfg, so this
commit changes the size to the size of virtio_pci_modern_common_cfg.

Fixes: 0b50cece0b78 ("virtio_pci: introduce helper to get/set queue reset")
Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_pci_modern_dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_pci_modern_dev.c 
b/drivers/virtio/virtio_pci_modern_dev.c
index aad7d9296e77..9cb601e16688 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -291,7 +291,7 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
err = -EINVAL;
mdev->common = vp_modern_map_capability(mdev, common,
  sizeof(struct virtio_pci_common_cfg), 4,
- 0, sizeof(struct virtio_pci_common_cfg),
+ 0, sizeof(struct 
virtio_pci_modern_common_cfg),
  NULL, NULL);
if (!mdev->common)
goto err_map_common;
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 1/4] virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit

This patch adds the definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit
in the relevant header file.

This feature indicates that the driver uses the data provided by the
device as a virtqueue identifier in available buffer notifications.

It comes from here:
https://github.com/oasis-tcs/virtio-spec/issues/89

Signed-off-by: Xuan Zhuo 
---
 include/uapi/linux/virtio_config.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/virtio_config.h 
b/include/uapi/linux/virtio_config.h
index 2c712c654165..8881aea60f6f 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -105,6 +105,11 @@
  */
 #define VIRTIO_F_NOTIFICATION_DATA 38
 
+/* This feature indicates that the driver uses the data provided by the device
+ * as a virtqueue identifier in available buffer notifications.
+ */
+#define VIRTIO_F_NOTIF_CONFIG_DATA 39
+
 /*
  * This feature indicates that the driver can reset a queue individually.
  */
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 4/4] virtio_pci: add build offset check for the new common cfg items

Add checks to the check_offsets(void) for queue_notify_data and
queue_reset.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_pci_modern_dev.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/virtio/virtio_pci_modern_dev.c 
b/drivers/virtio/virtio_pci_modern_dev.c
index 33f319da1558..e2a1fe7bb66c 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -203,6 +203,10 @@ static inline void check_offsets(void)
 offsetof(struct virtio_pci_common_cfg, queue_used_lo));
BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_USEDHI !=
 offsetof(struct virtio_pci_common_cfg, queue_used_hi));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_NDATA !=
+offsetof(struct virtio_pci_modern_common_cfg, 
queue_notify_data));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_RESET !=
+offsetof(struct virtio_pci_modern_common_cfg, 
queue_reset));
 }
 
 /*
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 0/4] strictly check the acccess to the common cfg

1. fix the length of the pci_iomap_range() to the modern common cfg
2. add common cfg size check for the VIRTIO_F_NOTIF_CONFIG_DATA, 
VIRTIO_F_RING_RESET
3. add build size check to the new common cfg items

Xuan Zhuo (4):
  virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit
  virtio_pci: fix the common cfg map size
  virtio_pci: add check for common cfg size
  virtio_pci: add build offset check for the new common cfg items

 drivers/virtio/virtio_pci_modern.c | 36 ++
 drivers/virtio/virtio_pci_modern_dev.c |  8 --
 include/linux/virtio_pci_modern.h  |  1 +
 include/uapi/linux/virtio_config.h |  5 
 4 files changed, 48 insertions(+), 2 deletions(-)

--
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH vhost v3 3/4] virtio_pci: add check for common cfg size

Some buggy devices, the common cfg size may not match the features.

This patch checks the common cfg size for the
features(VIRTIO_F_NOTIF_CONFIG_DATA, VIRTIO_F_RING_RESET). When the
common cfg size does not match the corresponding feature, we fail the
probe and print error message.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_pci_modern.c | 36 ++
 drivers/virtio/virtio_pci_modern_dev.c |  2 +-
 include/linux/virtio_pci_modern.h  |  1 +
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index d6bb68ba84e5..6a8f5ff05636 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -39,6 +39,39 @@ static void vp_transport_features(struct virtio_device 
*vdev, u64 features)
__virtio_set_bit(vdev, VIRTIO_F_RING_RESET);
 }
 
+static int __vp_check_common_size_one_feature(struct virtio_device *vdev, u32 
fbit,
+   u32 offset, const char *fname)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+   if (!__virtio_test_bit(vdev, fbit))
+   return 0;
+
+   if (likely(vp_dev->mdev.common_len >= offset))
+   return 0;
+
+   dev_err(&vdev->dev,
+   "virtio: common cfg size(%ld) does not match the feature %s\n",
+   vp_dev->mdev.common_len, fname);
+
+   return -EINVAL;
+}
+
+#define vp_check_common_size_one_feature(vdev, fbit, field) \
+   __vp_check_common_size_one_feature(vdev, fbit, \
+   offsetofend(struct virtio_pci_modern_common_cfg, field), #fbit)
+
+static int vp_check_common_size(struct virtio_device *vdev)
+{
+   if (vp_check_common_size_one_feature(vdev, VIRTIO_F_NOTIF_CONFIG_DATA, 
queue_notify_data))
+   return -EINVAL;
+
+   if (vp_check_common_size_one_feature(vdev, VIRTIO_F_RING_RESET, 
queue_reset))
+   return -EINVAL;
+
+   return 0;
+}
+
 /* virtio config->finalize_features() implementation */
 static int vp_finalize_features(struct virtio_device *vdev)
 {
@@ -57,6 +90,9 @@ static int vp_finalize_features(struct virtio_device *vdev)
return -EINVAL;
}
 
+   if (vp_check_common_size(vdev))
+   return -EINVAL;
+
vp_modern_set_features(&vp_dev->mdev, vdev->features);
 
return 0;
diff --git a/drivers/virtio/virtio_pci_modern_dev.c 
b/drivers/virtio/virtio_pci_modern_dev.c
index 9cb601e16688..33f319da1558 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -292,7 +292,7 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
mdev->common = vp_modern_map_capability(mdev, common,
  sizeof(struct virtio_pci_common_cfg), 4,
  0, sizeof(struct 
virtio_pci_modern_common_cfg),
- NULL, NULL);
+ &mdev->common_len, NULL);
if (!mdev->common)
goto err_map_common;
mdev->isr = vp_modern_map_capability(mdev, isr, sizeof(u8), 1,
diff --git a/include/linux/virtio_pci_modern.h 
b/include/linux/virtio_pci_modern.h
index 067ac1d789bc..edf62bae0474 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -28,6 +28,7 @@ struct virtio_pci_modern_device {
/* So we can sanity-check accesses. */
size_t notify_len;
size_t device_len;
+   size_t common_len;
 
/* Capability for when we need to map notifications per-vq. */
int notify_map_cap;
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

On Tue, Oct 10, 2023 at 9:52 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 19:44, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:12 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 19:06, Willem de Bruijn wrote:
> >>> On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 18:57, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 17:04, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki 
> >>>  wrote:
> 
>  On 2023/10/09 5:08, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki 
> >  wrote:
> >>
> >> On 2023/10/09 4:07, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki 
> >>>  wrote:
> 
>  virtio-net have two usage of hashes: one is RSS and another is 
>  hash
>  reporting. Conventionally the hash calculation was done by the 
>  VMM.
>  However, computing the hash after the queue was chosen defeats 
>  the
>  purpose of RSS.
> 
>  Another approach is to use eBPF steering program. This approach 
>  has
>  another downside: it cannot report the calculated hash due to the
>  restrictive nature of eBPF.
> 
>  Introduce the code to compute hashes to the kernel in order to 
>  overcome
>  thse challenges. An alternative solution is to extend the eBPF 
>  steering
>  program so that it will be able to report to the userspace, but 
>  it makes
>  little sense to allow to implement different hashing algorithms 
>  with
>  eBPF since the hash value reported by virtio-net is strictly 
>  defined by
>  the specification.
> 
>  The hash value already stored in sk_buff is not used and computed
>  independently since it may have been computed in a way not 
>  conformant
>  with the specification.
> 
>  Signed-off-by: Akihiko Odaki 
>  ---
> >>>
>  +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
>  +   .max_indirection_table_length =
>  +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
>  +
>  +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
>  +};
> >>>
> >>> No need to have explicit capabilities exchange like this? Tun 
> >>> either
> >>> supports all or none.
> >>
> >> tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
> >> VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and 
> >> VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> >>
> >> It is because the flow dissector does not support IPv6 extensions. 
> >> The
> >> specification is also vague, and does not tell how many TLVs 
> >> should be
> >> consumed at most when interpreting destination option header so I 
> >> chose
> >> to avoid adding code for these hash types to the flow dissector. I 
> >> doubt
> >> anyone will complain about it since nobody complains for Linux.
> >>
> >> I'm also adding this so that we can extend it later.
> >> max_indirection_table_length may grow for systems with 128+ CPUs, 
> >> or
> >> types may have other bits for new protocols in the future.
> >>
> >>>
>    case TUNSETSTEERINGEBPF:
>  -   ret = tun_set_ebpf(tun, &tun->steering_prog, 
>  argp);
>  +   bpf_ret = tun_set_ebpf(tun, &tun->steering_prog, 
>  argp);
>  +   if (IS_ERR(bpf_ret))
>  +   ret = PTR_ERR(bpf_ret);
>  +   else if (bpf_ret)
>  +   tun->vnet_hash.flags &= 
>  ~TUN_VNET_HASH_RSS;
> >>>
> >>> Don't make one feature disable another.
> >>>
> >>> TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> >>> functions. If one is enabled the other call should fail, with 
> >>> EBUSY
> >>> for instance.
> >>>
>  +   case TUNSETVNETHASH:
>  +   len = sizeof(vnet_hash);
>  +   if (copy_from_user(&vnet_hash, argp, len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
>  +
>  +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
>  +(tun->vnet_hdr_sz < sizeof(struct

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

On Tue, Oct 10, 2023 at 1:51 PM Akihiko Odaki  wrote:
>
> On 2023/10/10 14:45, Jason Wang wrote:
> > On Tue, Oct 10, 2023 at 9:52 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 19:44, Willem de Bruijn wrote:
> >>> On Mon, Oct 9, 2023 at 3:12 AM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 19:06, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 18:57, Willem de Bruijn wrote:
> >>> On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki 
> >>>  wrote:
> 
>  On 2023/10/09 17:04, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki 
> >  wrote:
> >>
> >> On 2023/10/09 5:08, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki 
> >>>  wrote:
> 
>  On 2023/10/09 4:07, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki 
> >  wrote:
> >>
> >> virtio-net have two usage of hashes: one is RSS and another is 
> >> hash
> >> reporting. Conventionally the hash calculation was done by the 
> >> VMM.
> >> However, computing the hash after the queue was chosen defeats 
> >> the
> >> purpose of RSS.
> >>
> >> Another approach is to use eBPF steering program. This 
> >> approach has
> >> another downside: it cannot report the calculated hash due to 
> >> the
> >> restrictive nature of eBPF.
> >>
> >> Introduce the code to compute hashes to the kernel in order to 
> >> overcome
> >> thse challenges. An alternative solution is to extend the eBPF 
> >> steering
> >> program so that it will be able to report to the userspace, 
> >> but it makes
> >> little sense to allow to implement different hashing 
> >> algorithms with
> >> eBPF since the hash value reported by virtio-net is strictly 
> >> defined by
> >> the specification.
> >>
> >> The hash value already stored in sk_buff is not used and 
> >> computed
> >> independently since it may have been computed in a way not 
> >> conformant
> >> with the specification.
> >>
> >> Signed-off-by: Akihiko Odaki 
> >> ---
> >
> >> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
> >> +   .max_indirection_table_length =
> >> +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
> >> +
> >> +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
> >> +};
> >
> > No need to have explicit capabilities exchange like this? Tun 
> > either
> > supports all or none.
> 
>  tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
>  VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and 
>  VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> 
>  It is because the flow dissector does not support IPv6 
>  extensions. The
>  specification is also vague, and does not tell how many TLVs 
>  should be
>  consumed at most when interpreting destination option header so 
>  I chose
>  to avoid adding code for these hash types to the flow dissector. 
>  I doubt
>  anyone will complain about it since nobody complains for Linux.
> 
>  I'm also adding this so that we can extend it later.
>  max_indirection_table_length may grow for systems with 128+ 
>  CPUs, or
>  types may have other bits for new protocols in the future.
> 
> >
> >>case TUNSETSTEERINGEBPF:
> >> -   ret = tun_set_ebpf(tun, &tun->steering_prog, 
> >> argp);
> >> +   bpf_ret = tun_set_ebpf(tun, 
> >> &tun->steering_prog, argp);
> >> +   if (IS_ERR(bpf_ret))
> >> +   ret = PTR_ERR(bpf_ret);
> >> +   else if (bpf_ret)
> >> +   tun->vnet_hash.flags &= 
> >> ~TUN_VNET_HASH_RSS;
> >
> > Don't make one feature disable another.
> >
> > TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> > functions. If one is enabled the other call should fail, with 
> > EBUSY
> > for instance.
> >
> >> +   case TUNSETVNETHASH:
> >> +   len = sizeof(vnet_hash);
> >>

Re: [PATCH vhost v3 10/16] vdpa/mlx5: Allow creation/deletion of any given mr struct

On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea  wrote:
>
> This patch adapts the mr creation/deletion code to be able to work with
> any given mr struct pointer. All the APIs are adapted to take an extra
> parameter for the mr.
>
> mlx5_vdpa_create/delete_mr doesn't need a ASID parameter anymore. The
> check is done in the caller instead (mlx5_set_map).
>
> This change is needed for a followup patch which will introduce an
> additional mr for the vq descriptor data.
>
> Signed-off-by: Dragos Tatulea 
> Acked-by: Eugenio Pérez 

Acked-by: Jason Wang 

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH vhost v3 11/16] vdpa/mlx5: Move mr mutex out of mr struct

On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea  wrote:
>
> The mutex is named like it is supposed to protect only the mkey but in
> reality it is a global lock for all mr resources.
>
> Shift the mutex to it's rightful location (struct mlx5_vdpa_dev) and
> give it a more appropriate name.
>
> Signed-off-by: Dragos Tatulea 
> Acked-by: Eugenio Pérez 

Acked-by: Jason Wang 

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH vhost v3 12/16] vdpa/mlx5: Improve mr update flow

On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea  wrote:
>
> The current flow for updating an mr works directly on mvdev->mr which
> makes it cumbersome to handle multiple new mr structs.
>
> This patch makes the flow more straightforward by having
> mlx5_vdpa_create_mr return a new mr which will update the old mr (if
> any). The old mr will be deleted and unlinked from mvdev.
>
> This change paves the way for adding mrs for different ASIDs.
>
> The initialized bool is no longer needed as mr is now a pointer in the
> mlx5_vdpa_dev struct which will be NULL when not initialized.
>
> Acked-by: Eugenio Pérez 
> Signed-off-by: Dragos Tatulea 
> ---

Acked-by: Jason Wang 

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH vhost v3 13/16] vdpa/mlx5: Introduce mr for vq descriptor

On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea  wrote:
>
> Introduce the vq descriptor group and mr per ASID. Until now
> .set_map on ASID 1 was only updating the cvq iotlb. From now on it also
> creates a mkey for it. The current patch doesn't use it but follow-up
> patches will add hardware support for mapping the vq descriptors.
>
> Acked-by: Eugenio Pérez 
> Signed-off-by: Dragos Tatulea 
> ---

Acked-by: Jason Wang 

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH vhost v3 14/16] vdpa/mlx5: Enable hw support for vq descriptor mapping

On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea  wrote:
>
> Vq descriptor mappings are supported in hardware by filling in an
> additional mkey which contains the descriptor mappings to the hw vq.
>
> A previous patch in this series added support for hw mkey (mr) creation
> for ASID 1.
>
> This patch fills in both the vq data and vq descriptor mkeys based on
> group ASID mapping.
>
> The feature is signaled to the vdpa core through the presence of the
> .get_vq_desc_group op.
>
> Acked-by: Eugenio Pérez 
> Signed-off-by: Dragos Tatulea 

Acked-by: Jason Wang 

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH vhost v3 15/16] vdpa/mlx5: Make iotlb helper functions more generic

On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea  wrote:
>
> They will be used in a follow-up patch.
>
> For dup_iotlb, avoid the src == dst case. This is an error.
>
> Acked-by: Eugenio Pérez 
> Signed-off-by: Dragos Tatulea 

Acked-by: Jason Wang 

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH vhost v3 16/16] vdpa/mlx5: Update cvq iotlb mapping on ASID change

On Mon, Oct 9, 2023 at 7:25 PM Dragos Tatulea  wrote:
>
> For the following sequence:
> - cvq group is in ASID 0
> - .set_map(1, cvq_iotlb)
> - .set_group_asid(cvq_group, 1)
>
> ... the cvq mapping from ASID 0 will be used. This is not always correct
> behaviour.
>
> This patch adds support for the above mentioned flow by saving the iotlb
> on each .set_map and updating the cvq iotlb with it on a cvq group change.
>
> Acked-by: Eugenio Pérez 
> Signed-off-by: Dragos Tatulea 
> ---

Acked-by: Jason Wang 

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH vhost v3 3/4] virtio_pci: add check for common cfg size

On Tue, Oct 10, 2023 at 11:11 AM Xuan Zhuo  wrote:
>
> Some buggy devices, the common cfg size may not match the features.
>
> This patch checks the common cfg size for the
> features(VIRTIO_F_NOTIF_CONFIG_DATA, VIRTIO_F_RING_RESET). When the
> common cfg size does not match the corresponding feature, we fail the
> probe and print error message.
>
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/virtio/virtio_pci_modern.c | 36 ++
>  drivers/virtio/virtio_pci_modern_dev.c |  2 +-
>  include/linux/virtio_pci_modern.h  |  1 +
>  3 files changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_pci_modern.c 
> b/drivers/virtio/virtio_pci_modern.c
> index d6bb68ba84e5..6a8f5ff05636 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -39,6 +39,39 @@ static void vp_transport_features(struct virtio_device 
> *vdev, u64 features)
> __virtio_set_bit(vdev, VIRTIO_F_RING_RESET);
>  }
>
> +static int __vp_check_common_size_one_feature(struct virtio_device *vdev, 
> u32 fbit,
> +   u32 offset, const char *fname)
> +{
> +   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> +
> +   if (!__virtio_test_bit(vdev, fbit))
> +   return 0;
> +
> +   if (likely(vp_dev->mdev.common_len >= offset))
> +   return 0;
> +
> +   dev_err(&vdev->dev,
> +   "virtio: common cfg size(%ld) does not match the feature 
> %s\n",
> +   vp_dev->mdev.common_len, fname);
> +
> +   return -EINVAL;
> +}
> +
> +#define vp_check_common_size_one_feature(vdev, fbit, field) \
> +   __vp_check_common_size_one_feature(vdev, fbit, \
> +   offsetofend(struct virtio_pci_modern_common_cfg, field), 
> #fbit)
> +
> +static int vp_check_common_size(struct virtio_device *vdev)
> +{
> +   if (vp_check_common_size_one_feature(vdev, 
> VIRTIO_F_NOTIF_CONFIG_DATA, queue_notify_data))
> +   return -EINVAL;
> +
> +   if (vp_check_common_size_one_feature(vdev, VIRTIO_F_RING_RESET, 
> queue_reset))
> +   return -EINVAL;

Do we need to at least check the offset of the queue_device as well here?

Thanks

> +
> +   return 0;
> +}
> +
>  /* virtio config->finalize_features() implementation */
>  static int vp_finalize_features(struct virtio_device *vdev)
>  {
> @@ -57,6 +90,9 @@ static int vp_finalize_features(struct virtio_device *vdev)
> return -EINVAL;
> }
>
> +   if (vp_check_common_size(vdev))
> +   return -EINVAL;
> +
> vp_modern_set_features(&vp_dev->mdev, vdev->features);
>
> return 0;
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c 
> b/drivers/virtio/virtio_pci_modern_dev.c
> index 9cb601e16688..33f319da1558 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -292,7 +292,7 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
> mdev->common = vp_modern_map_capability(mdev, common,
>   sizeof(struct virtio_pci_common_cfg), 4,
>   0, sizeof(struct 
> virtio_pci_modern_common_cfg),
> - NULL, NULL);
> + &mdev->common_len, NULL);
> if (!mdev->common)
> goto err_map_common;
> mdev->isr = vp_modern_map_capability(mdev, isr, sizeof(u8), 1,
> diff --git a/include/linux/virtio_pci_modern.h 
> b/include/linux/virtio_pci_modern.h
> index 067ac1d789bc..edf62bae0474 100644
> --- a/include/linux/virtio_pci_modern.h
> +++ b/include/linux/virtio_pci_modern.h
> @@ -28,6 +28,7 @@ struct virtio_pci_modern_device {
> /* So we can sanity-check accesses. */
> size_t notify_len;
> size_t device_len;
> +   size_t common_len;
>
> /* Capability for when we need to map notifications per-vq. */
> int notify_map_cap;
> --
> 2.32.0.3.g01195cf9f
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH vhost v3 2/4] virtio_pci: fix the common cfg map size

On Tue, Oct 10, 2023 at 11:11 AM Xuan Zhuo  wrote:
>
> The function vp_modern_map_capability() takes the size parameter,
> which corresponds to the size of virtio_pci_common_cfg. As a result,
> this indicates the size of memory area to map.
>
> Now the size is the size of virtio_pci_common_cfg, but some feature(such
> as the _F_RING_RESET) needs the virtio_pci_modern_common_cfg, so this
> commit changes the size to the size of virtio_pci_modern_common_cfg.
>
> Fixes: 0b50cece0b78 ("virtio_pci: introduce helper to get/set queue reset")
> Signed-off-by: Xuan Zhuo 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/virtio/virtio_pci_modern_dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c 
> b/drivers/virtio/virtio_pci_modern_dev.c
> index aad7d9296e77..9cb601e16688 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -291,7 +291,7 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
> err = -EINVAL;
> mdev->common = vp_modern_map_capability(mdev, common,
>   sizeof(struct virtio_pci_common_cfg), 4,
> - 0, sizeof(struct virtio_pci_common_cfg),
> + 0, sizeof(struct 
> virtio_pci_modern_common_cfg),
>   NULL, NULL);
> if (!mdev->common)
> goto err_map_common;
> --
> 2.32.0.3.g01195cf9f
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH vhost v3 1/4] virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit

On Tue, Oct 10, 2023 at 11:11 AM Xuan Zhuo  wrote:
>
> This patch adds the definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit
> in the relevant header file.
>
> This feature indicates that the driver uses the data provided by the
> device as a virtqueue identifier in available buffer notifications.
>
> It comes from here:
> https://github.com/oasis-tcs/virtio-spec/issues/89
>
> Signed-off-by: Xuan Zhuo 

Acked-by: Jason Wang 

Thanks

> ---
>  include/uapi/linux/virtio_config.h | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/include/uapi/linux/virtio_config.h 
> b/include/uapi/linux/virtio_config.h
> index 2c712c654165..8881aea60f6f 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -105,6 +105,11 @@
>   */
>  #define VIRTIO_F_NOTIFICATION_DATA 38
>
> +/* This feature indicates that the driver uses the data provided by the 
> device
> + * as a virtqueue identifier in available buffer notifications.
> + */
> +#define VIRTIO_F_NOTIF_CONFIG_DATA 39
> +
>  /*
>   * This feature indicates that the driver can reset a queue individually.
>   */
> --
> 2.32.0.3.g01195cf9f
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 2/2] virtio-mmio: Support multiple interrupts per device