Re: [PATCH net-next v4 2/5] virtio_net: enable irq for the control vq

2024-06-26 Thread Jiri Pirko
Thu, Jun 20, 2024 at 12:31:34PM CEST, hen...@linux.alibaba.com wrote:
>On Thu, 20 Jun 2024 06:11:40 -0400, "Michael S. Tsirkin"  
>wrote:
>> On Thu, Jun 20, 2024 at 06:10:51AM -0400, Michael S. Tsirkin wrote:
>> > On Thu, Jun 20, 2024 at 05:53:15PM +0800, Heng Qi wrote:
>> > > On Thu, 20 Jun 2024 16:26:05 +0800, Jason Wang  
>> > > wrote:
>> > > > On Thu, Jun 20, 2024 at 4:21 PM Jason Wang  wrote:
>> > > > >
>> > > > > On Thu, Jun 20, 2024 at 3:35 PM Heng Qi  
>> > > > > wrote:
>> > > > > >
>> > > > > > On Wed, 19 Jun 2024 17:19:12 -0400, "Michael S. Tsirkin" 
>> > > > > >  wrote:
>> > > > > > > On Thu, Jun 20, 2024 at 12:19:05AM +0800, Heng Qi wrote:
>> > > > > > > > @@ -5312,7 +5315,7 @@ static int virtnet_find_vqs(struct 
>> > > > > > > > virtnet_info *vi)
>> > > > > > > >
>> > > > > > > > /* Parameters for control virtqueue, if any */
>> > > > > > > > if (vi->has_cvq) {
>> > > > > > > > -   callbacks[total_vqs - 1] = NULL;
>> > > > > > > > +   callbacks[total_vqs - 1] = virtnet_cvq_done;
>> > > > > > > > names[total_vqs - 1] = "control";
>> > > > > > > > }
>> > > > > > > >
>> > > > > > >
>> > > > > > > If the # of MSIX vectors is exactly for data path VQs,
>> > > > > > > this will cause irq sharing between VQs which will degrade
>> > > > > > > performance significantly.
>> > > > > > >
>> > > > >
>> > > > > Why do we need to care about buggy management? I think libvirt has
>> > > > > been teached to use 2N+2 since the introduction of the multiqueue[1].
>> > > > 
>> > > > And Qemu can calculate it correctly automatically since:
>> > > > 
>> > > > commit 51a81a2118df0c70988f00d61647da9e298483a4
>> > > > Author: Jason Wang 
>> > > > Date:   Mon Mar 8 12:49:19 2021 +0800
>> > > > 
>> > > > virtio-net: calculating proper msix vectors on init
>> > > > 
>> > > > Currently, the default msix vectors for virtio-net-pci is 3 which 
>> > > > is
>> > > > obvious not suitable for multiqueue guest, so we depends on the 
>> > > > user
>> > > > or management tools to pass a correct vectors parameter. In fact, 
>> > > > we
>> > > > can simplifying this by calculating the number of vectors on 
>> > > > realize.
>> > > > 
>> > > > Consider we have N queues, the number of vectors needed is 2*N + 2
>> > > > (#queue pairs + plus one config interrupt and control vq). We 
>> > > > didn't
>> > > > check whether or not host support control vq because it was added
>> > > > unconditionally by qemu to avoid breaking legacy guests such as 
>> > > > Minix.
>> > > > 
>> > > > Reviewed-by: Philippe Mathieu-Daudé 
>> > > > Reviewed-by: Stefano Garzarella 
>> > > > Reviewed-by: Stefan Hajnoczi 
>> > > > Signed-off-by: Jason Wang 
>> > > 
>> > > Yes, devices designed according to the spec need to reserve an interrupt
>> > > vector for ctrlq. So, Michael, do we want to be compatible with buggy 
>> > > devices?
>> > > 
>> > > Thanks.
>> > 
>> > These aren't buggy, the spec allows this. So don't fail, but
>> > I'm fine with using polling if not enough vectors.
>> 
>> sharing with config interrupt is easier code-wise though, FWIW -
>> we don't need to maintain two code-paths.
>
>Yes, it works well - config change irq is used less before - and will not fail.

Please note I'm working on such a fallback for the admin queue. I would like
to send the patchset by the end of this week. You can then use it easily
for cvq.

Something like:
/* the config->find_vqs() implementation */
int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
                struct virtqueue *vqs[], vq_callback_t *callbacks[],
                const char * const names[], const bool *ctx,
                struct irq_affinity *desc)
{
        int err;

        /* Try MSI-X with one vector per queue. */
        err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
                               VP_VQ_VECTOR_POLICY_EACH, ctx, desc);
        if (!err)
                return 0;

        /* Fallback: MSI-X with one shared vector for config and
         * slow path queues, one vector per queue for the rest. */
        err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
                               VP_VQ_VECTOR_POLICY_SHARED_SLOW, ctx, desc);
        if (!err)
                return 0;

        /* Fallback: MSI-X with one vector for config, one shared for queues. */
        err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
                               VP_VQ_VECTOR_POLICY_SHARED, ctx, desc);
        if (!err)
                return 0;

        /* Is there an interrupt? If not give up. */
        if (!(to_vp_device(vdev)->pci_dev->irq))
                return err;

        /* Finally fall back to regular interrupts. */
        return vp_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx);
}
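
For the virtio_net hunk quoted above, the new cvq callback itself can stay
tiny. The following is only a rough sketch of what virtnet_cvq_done() might
look like, assuming a completion-based wait in virtnet_send_command(); the
ctrl_done field of struct virtnet_info is an illustrative name, not the
actual patch:

static void virtnet_cvq_done(struct virtqueue *cvq)
{
        struct virtnet_info *vi = cvq->vdev->priv;
        unsigned int len;

        /* Reap the completed control buffer and wake up the task
         * waiting in virtnet_send_command(). */
        if (virtqueue_get_buf(cvq, &len))
                complete(&vi->ctrl_done);
}

With a fallback like vp_find_vqs() above, such a callback should keep working
even when the device ends up sharing one vector between the config change and
the control queue; only the interrupt latency changes, not the driver logic.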




>
>Thanks.
>
>> 
>> > > > 
>> > > > Thanks
>> > > > 
>> > > > >
>> > > > > > > So no, you can not just do it unconditionally.
>> > > > > > >
>> > > > > > > The correct fix probably requi

Re: [PATCH net-next v4 2/5] virtio_net: enable irq for the control vq

2024-06-26 Thread Michael S. Tsirkin
On Wed, Jun 26, 2024 at 09:52:58AM +0200, Jiri Pirko wrote:
> Thu, Jun 20, 2024 at 12:31:34PM CEST, hen...@linux.alibaba.com wrote:
> >On Thu, 20 Jun 2024 06:11:40 -0400, "Michael S. Tsirkin"  
> >wrote:
> >> On Thu, Jun 20, 2024 at 06:10:51AM -0400, Michael S. Tsirkin wrote:
> >> > On Thu, Jun 20, 2024 at 05:53:15PM +0800, Heng Qi wrote:
> >> > > On Thu, 20 Jun 2024 16:26:05 +0800, Jason Wang  
> >> > > wrote:
> >> > > > On Thu, Jun 20, 2024 at 4:21 PM Jason Wang  
> >> > > > wrote:
> >> > > > >
> >> > > > > On Thu, Jun 20, 2024 at 3:35 PM Heng Qi  
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > On Wed, 19 Jun 2024 17:19:12 -0400, "Michael S. Tsirkin" 
> >> > > > > >  wrote:
> >> > > > > > > On Thu, Jun 20, 2024 at 12:19:05AM +0800, Heng Qi wrote:
> >> > > > > > > > @@ -5312,7 +5315,7 @@ static int virtnet_find_vqs(struct 
> >> > > > > > > > virtnet_info *vi)
> >> > > > > > > >
> >> > > > > > > > /* Parameters for control virtqueue, if any */
> >> > > > > > > > if (vi->has_cvq) {
> >> > > > > > > > -   callbacks[total_vqs - 1] = NULL;
> >> > > > > > > > +   callbacks[total_vqs - 1] = virtnet_cvq_done;
> >> > > > > > > > names[total_vqs - 1] = "control";
> >> > > > > > > > }
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > If the # of MSIX vectors is exactly for data path VQs,
> >> > > > > > > this will cause irq sharing between VQs which will degrade
> >> > > > > > > performance significantly.
> >> > > > > > >
> >> > > > >
> >> > > > > Why do we need to care about buggy management? I think libvirt has
> >> > > > > been teached to use 2N+2 since the introduction of the 
> >> > > > > multiqueue[1].
> >> > > > 
> >> > > > And Qemu can calculate it correctly automatically since:
> >> > > > 
> >> > > > commit 51a81a2118df0c70988f00d61647da9e298483a4
> >> > > > Author: Jason Wang 
> >> > > > Date:   Mon Mar 8 12:49:19 2021 +0800
> >> > > > 
> >> > > > virtio-net: calculating proper msix vectors on init
> >> > > > 
> >> > > > Currently, the default msix vectors for virtio-net-pci is 3 
> >> > > > which is
> >> > > > obvious not suitable for multiqueue guest, so we depends on the 
> >> > > > user
> >> > > > or management tools to pass a correct vectors parameter. In 
> >> > > > fact, we
> >> > > > can simplifying this by calculating the number of vectors on 
> >> > > > realize.
> >> > > > 
> >> > > > Consider we have N queues, the number of vectors needed is 2*N + 
> >> > > > 2
> >> > > > (#queue pairs + plus one config interrupt and control vq). We 
> >> > > > didn't
> >> > > > check whether or not host support control vq because it was added
> >> > > > unconditionally by qemu to avoid breaking legacy guests such as 
> >> > > > Minix.
> >> > > > 
> >> > > > Reviewed-by: Philippe Mathieu-Daudé 
> >> > > > Reviewed-by: Stefano Garzarella 
> >> > > > Reviewed-by: Stefan Hajnoczi 
> >> > > > Signed-off-by: Jason Wang 
> >> > > 
> >> > > Yes, devices designed according to the spec need to reserve an 
> >> > > interrupt
> >> > > vector for ctrlq. So, Michael, do we want to be compatible with buggy 
> >> > > devices?
> >> > > 
> >> > > Thanks.
> >> > 
> >> > These aren't buggy, the spec allows this. So don't fail, but
> >> > I'm fine with using polling if not enough vectors.
> >> 
> >> sharing with config interrupt is easier code-wise though, FWIW -
> >> we don't need to maintain two code-paths.
> >
> >Yes, it works well - config change irq is used less before - and will not 
> >fail.
> 
> Please note I'm working on such fallback for admin queue. I would Like
> to send the patchset by the end of this week. You can then use it easily
> for cvq.
> 
> Something like:
> /* the config->find_vqs() implementation */
> int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
> struct virtqueue *vqs[], vq_callback_t *callbacks[],
> const char * const names[], const bool *ctx,
> struct irq_affinity *desc)
> {
> int err;
> 
> /* Try MSI-X with one vector per queue. */
> err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
>VP_VQ_VECTOR_POLICY_EACH, ctx, desc);
> if (!err)
> return 0;
> /* Fallback: MSI-X with one shared vector for config and
>  * slow path queues, one vector per queue for the rest. */
> err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
>VP_VQ_VECTOR_POLICY_SHARED_SLOW, ctx, desc);
> if (!err)
> return 0;
> /* Fallback: MSI-X with one vector for config, one shared for queues. 
> */
> err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
>VP_VQ_VECTOR_POLICY_SHARED, ctx, desc);
> if (!err)
> return 0;
> /* Is there an interrupt? If not give up. */
> if (!(to_vp_device(vdev)->pci_dev->irq))
>

Re: How to implement message forwarding from one CID to another in vhost driver

2024-06-26 Thread Stefano Garzarella

Hi Dorjoy,

On Tue, Jun 25, 2024 at 11:44:30PM GMT, Dorjoy Chowdhury wrote:

>Hey Stefano,


[...]


>>
>>So the immediate plan would be to:
>>
>>  1) Build a new vhost-vsock-forward object model that connects to
>>vhost as CID 3 and then forwards every packet from CID 1 to the
>>Enclave-CID and every packet that arrives on to CID 3 to CID 2.

>This though requires writing completely from scratch the virtio-vsock
>emulation in QEMU. If you have time that would be great, otherwise if
>you want to do a PoC, my advice is to start with vhost-user-vsock which
>is already there.



>Can you give me some more details about how I can implement the
>daemon?


We already have a daemon written in Rust, so I don't recommend rewriting 
one from scratch; just start with that. You can find the daemon 
and instructions on how to use it with QEMU here [1].



>I would appreciate some pointers to code too.


I sent the pointer to it in my first reply [2].



>Right now, the "nitro-enclave" machine type (wip) in QEMU
>automatically spawns a VHOST_VSOCK device with the CID equal to the
>"guest-cid" machine option. I think this is equivalent to using the
>"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
>need any change? I guess instead of "vhost-vsock-device", the
>vhost-vsock device needs to be equivalent to "-device
>vhost-user-vsock-device,guest-cid=N"?


Nope, the vhost-user-vsock device requires just a `chardev` option.
The chardev points to the Unix socket used by QEMU to talk with the 
daemon. The daemon has a parameter to set the CID. See [1] for the 
examples.




>The applications inside the nitro-enclave VM will still connect and
>talk to CID 3. So on the daemon side, do we need to spawn a device
>that has CID 3 and then forward everything this device receives to CID
>1 (VMADDR_CID_LOCAL) same port and everything it receives from CID 1
>to the "guest-cid"?


Yep, I think this is right.
Note: to use VMADDR_CID_LOCAL, the host needs to load the `vsock_loopback` 
kernel module.


Before modifying the code, if you want to do some testing, perhaps you 
can use socat (which supports both UNIX-* and VSOCK-*). The daemon for 
now exposes two unix sockets: one is used to communicate with QEMU via 
the vhost-user protocol, and the other is to be used by the application 
to communicate with vsock sockets in the guest using the hybrid protocol 
defined by firecracker. So you could run socat between the latter 
and VMADDR_CID_LOCAL; the only problem I see is that you have to send 
the first string required by the hybrid protocol (CONNECT 1234), but for 
a PoC it should be ok.


I just tried the following and it works without touching any code:

shell1$ ./target/debug/vhost-device-vsock \
--vm guest-cid=3,socket=/tmp/vhost3.socket,uds-path=/tmp/vm3.vsock

shell2$ sudo modprobe vsock_loopback
shell2$ socat VSOCK-LISTEN:1234 UNIX-CONNECT:/tmp/vm3.vsock

shell3$ qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
-drive file=fedora40.qcow2,format=qcow2,if=virtio\
-chardev socket,id=char0,path=/tmp/vhost3.socket \
-device vhost-user-vsock-pci,chardev=char0 \
-object memory-backend-memfd,id=mem,size=512M \
-nographic

guest$ nc --vsock -l 1234

shell4$ nc --vsock 1 1234
CONNECT 1234

Note: the `CONNECT 1234` is required by the hybrid vsock protocol 
defined by firecracker, so if we extend the vhost-device-vsock 
daemon to forward packets to VMADDR_CID_LOCAL, that would not be 
needed (nor would running socat).
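
If it helps, here is roughly what the shell2/shell4 pair above (socat plus
the manually typed CONNECT line) amounts to for a host application talking
directly to the daemon's /tmp/vm3.vsock socket. This is just a C sketch of
the firecracker-style hybrid handshake; the "OK <port>" reply line is my
reading of the hybrid spec, and error handling is minimal:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int hybrid_connect(const char *uds_path, unsigned int guest_port)
{
        struct sockaddr_un sa = { .sun_family = AF_UNIX };
        char reply[64];
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0)
                return -1;
        strncpy(sa.sun_path, uds_path, sizeof(sa.sun_path) - 1);
        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
                return -1;

        /* Hybrid handshake: ask the daemon to open <guest_port> in the
         * guest; it answers with an "OK <host_port>" line, after which
         * the stream carries raw vsock data. */
        dprintf(fd, "CONNECT %u\n", guest_port);
        read(fd, reply, sizeof(reply) - 1);

        return fd;
}

e.g. hybrid_connect("/tmp/vm3.vsock", 1234) gives a host application a stream
to the guest's port 1234 without socat or the manual CONNECT line.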



This is just an example of how to use loopback; if you want to connect 
from the VM to a CID other than 2, then we have to modify the daemon 
to do that.



>The applications that will be running in the host
>need to be changed so that instead of connecting to the "guest-cid" of
>the nitro-enclave VM, they will instead connect to VMADDR_CID_LOCAL.
>Is my understanding correct?


Yep.
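
Concretely, the host-side change is just the address the application
connects to. A minimal sketch, assuming the forwarding daemon relays
loopback traffic to the enclave (connect_local_vsock() and the port are
illustrative names):

#include <sys/socket.h>
#include <linux/vm_sockets.h>

int connect_local_vsock(unsigned int port)
{
        /* Instead of the enclave's guest-cid, connect to CID 1
         * (VMADDR_CID_LOCAL) on the same port; the daemon forwards
         * the traffic to the enclave. */
        struct sockaddr_vm sa = {
                .svm_family = AF_VSOCK,
                .svm_cid    = VMADDR_CID_LOCAL,
                .svm_port   = port,
        };
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (fd < 0 || connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
                return -1;
        return fd;
}

Remember that the vsock_loopback module mentioned above has to be loaded
for CID 1 to be reachable.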



>BTW is there anything related to the "VMADDR_FLAG_TO_HOST" flag that
>needs to be checked? I remember some discussion about it.


No, that flag is handled by the driver. If that flag is on, the driver 
forwards the packet to the host, regardless of the destination CID. So 
it has to be set by the application in the guest, but it should already 
do that since that flag was introduced just for Nitro enclaves.
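
For reference, on the guest side the flag only changes the address setup of
an otherwise normal AF_VSOCK connect(); a sketch of the sockaddr_vm, with
placeholder CID and port values:

        struct sockaddr_vm sa = {
                .svm_family = AF_VSOCK,
                .svm_cid    = 3,                    /* non-host destination CID */
                .svm_port   = 1234,
                .svm_flags  = VMADDR_FLAG_TO_HOST,  /* deliver via the host */
        };

Both svm_flags and VMADDR_FLAG_TO_HOST come from linux/vm_sockets.h on
kernels that support the flag.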




>It would be great if you could give me some details about how I can
>achieve the CID 3 <-> CID 2 communication using the vhost-user-vsock.


CID 3 <-> CID 2 is the standard use case, right?
The readme in [1] contains several examples, let me know if you need 
more details ;-)



>Is this https://github.com/stefano-garzarella/vhost-user-vsock where I
>would need to add support for forwarding everything to
>VMADDR_CID_LOCAL via an option maybe?


Nope, that one was a PoC and the repo is archived, the daemon is [1].
BTW, I agree on the option for the forwarding.

Thanks,
Stefano

[1] 
https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
[2] 
https://lore.kernel.org/virt

Re: [PATCH net-next v4 2/5] virtio_net: enable irq for the control vq

2024-06-26 Thread Jiri Pirko
Wed, Jun 26, 2024 at 10:08:14AM CEST, m...@redhat.com wrote:
>On Wed, Jun 26, 2024 at 09:52:58AM +0200, Jiri Pirko wrote:
>> Thu, Jun 20, 2024 at 12:31:34PM CEST, hen...@linux.alibaba.com wrote:
>> >On Thu, 20 Jun 2024 06:11:40 -0400, "Michael S. Tsirkin"  
>> >wrote:
>> >> On Thu, Jun 20, 2024 at 06:10:51AM -0400, Michael S. Tsirkin wrote:
>> >> > On Thu, Jun 20, 2024 at 05:53:15PM +0800, Heng Qi wrote:
>> >> > > On Thu, 20 Jun 2024 16:26:05 +0800, Jason Wang  
>> >> > > wrote:
>> >> > > > On Thu, Jun 20, 2024 at 4:21 PM Jason Wang  
>> >> > > > wrote:
>> >> > > > >
>> >> > > > > On Thu, Jun 20, 2024 at 3:35 PM Heng Qi 
>> >> > > > >  wrote:
>> >> > > > > >
>> >> > > > > > On Wed, 19 Jun 2024 17:19:12 -0400, "Michael S. Tsirkin" 
>> >> > > > > >  wrote:
>> >> > > > > > > On Thu, Jun 20, 2024 at 12:19:05AM +0800, Heng Qi wrote:
>> >> > > > > > > > @@ -5312,7 +5315,7 @@ static int virtnet_find_vqs(struct 
>> >> > > > > > > > virtnet_info *vi)
>> >> > > > > > > >
>> >> > > > > > > > /* Parameters for control virtqueue, if any */
>> >> > > > > > > > if (vi->has_cvq) {
>> >> > > > > > > > -   callbacks[total_vqs - 1] = NULL;
>> >> > > > > > > > +   callbacks[total_vqs - 1] = virtnet_cvq_done;
>> >> > > > > > > > names[total_vqs - 1] = "control";
>> >> > > > > > > > }
>> >> > > > > > > >
>> >> > > > > > >
>> >> > > > > > > If the # of MSIX vectors is exactly for data path VQs,
>> >> > > > > > > this will cause irq sharing between VQs which will degrade
>> >> > > > > > > performance significantly.
>> >> > > > > > >
>> >> > > > >
>> >> > > > > Why do we need to care about buggy management? I think libvirt has
>> >> > > > > been teached to use 2N+2 since the introduction of the 
>> >> > > > > multiqueue[1].
>> >> > > > 
>> >> > > > And Qemu can calculate it correctly automatically since:
>> >> > > > 
>> >> > > > commit 51a81a2118df0c70988f00d61647da9e298483a4
>> >> > > > Author: Jason Wang 
>> >> > > > Date:   Mon Mar 8 12:49:19 2021 +0800
>> >> > > > 
>> >> > > > virtio-net: calculating proper msix vectors on init
>> >> > > > 
>> >> > > > Currently, the default msix vectors for virtio-net-pci is 3 
>> >> > > > which is
>> >> > > > obvious not suitable for multiqueue guest, so we depends on the 
>> >> > > > user
>> >> > > > or management tools to pass a correct vectors parameter. In 
>> >> > > > fact, we
>> >> > > > can simplifying this by calculating the number of vectors on 
>> >> > > > realize.
>> >> > > > 
>> >> > > > Consider we have N queues, the number of vectors needed is 2*N 
>> >> > > > + 2
>> >> > > > (#queue pairs + plus one config interrupt and control vq). We 
>> >> > > > didn't
>> >> > > > check whether or not host support control vq because it was 
>> >> > > > added
>> >> > > > unconditionally by qemu to avoid breaking legacy guests such as 
>> >> > > > Minix.
>> >> > > > 
>> >> > > > Reviewed-by: Philippe Mathieu-Daudé 
>> >> > > > Reviewed-by: Stefano Garzarella 
>> >> > > > Reviewed-by: Stefan Hajnoczi 
>> >> > > > Signed-off-by: Jason Wang 
>> >> > > 
>> >> > > Yes, devices designed according to the spec need to reserve an 
>> >> > > interrupt
>> >> > > vector for ctrlq. So, Michael, do we want to be compatible with buggy 
>> >> > > devices?
>> >> > > 
>> >> > > Thanks.
>> >> > 
>> >> > These aren't buggy, the spec allows this. So don't fail, but
>> >> > I'm fine with using polling if not enough vectors.
>> >> 
>> >> sharing with config interrupt is easier code-wise though, FWIW -
>> >> we don't need to maintain two code-paths.
>> >
>> >Yes, it works well - config change irq is used less before - and will not 
>> >fail.
>> 
>> Please note I'm working on such fallback for admin queue. I would Like
>> to send the patchset by the end of this week. You can then use it easily
>> for cvq.
>> 
>> Something like:
>> /* the config->find_vqs() implementation */
>> int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>> struct virtqueue *vqs[], vq_callback_t *callbacks[],
>> const char * const names[], const bool *ctx,
>> struct irq_affinity *desc)
>> {
>> int err;
>> 
>> /* Try MSI-X with one vector per queue. */
>> err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
>>VP_VQ_VECTOR_POLICY_EACH, ctx, desc);
>> if (!err)
>> return 0;
>> /* Fallback: MSI-X with one shared vector for config and
>>  * slow path queues, one vector per queue for the rest. */
>> err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
>>VP_VQ_VECTOR_POLICY_SHARED_SLOW, ctx, desc);
>> if (!err)
>> return 0;
>> /* Fallback: MSI-X with one vector for config, one shared for 
>> queues. */
>> err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
>>VP_VQ_VECTOR_POLICY_SHARED

Re: [PATCH net-next v4 2/5] virtio_net: enable irq for the control vq

2024-06-26 Thread Michael S. Tsirkin
On Wed, Jun 26, 2024 at 10:43:11AM +0200, Jiri Pirko wrote:
> Wed, Jun 26, 2024 at 10:08:14AM CEST, m...@redhat.com wrote:
> >On Wed, Jun 26, 2024 at 09:52:58AM +0200, Jiri Pirko wrote:
> >> Thu, Jun 20, 2024 at 12:31:34PM CEST, hen...@linux.alibaba.com wrote:
> >> >On Thu, 20 Jun 2024 06:11:40 -0400, "Michael S. Tsirkin" 
> >> > wrote:
> >> >> On Thu, Jun 20, 2024 at 06:10:51AM -0400, Michael S. Tsirkin wrote:
> >> >> > On Thu, Jun 20, 2024 at 05:53:15PM +0800, Heng Qi wrote:
> >> >> > > On Thu, 20 Jun 2024 16:26:05 +0800, Jason Wang 
> >> >> > >  wrote:
> >> >> > > > On Thu, Jun 20, 2024 at 4:21 PM Jason Wang  
> >> >> > > > wrote:
> >> >> > > > >
> >> >> > > > > On Thu, Jun 20, 2024 at 3:35 PM Heng Qi 
> >> >> > > > >  wrote:
> >> >> > > > > >
> >> >> > > > > > On Wed, 19 Jun 2024 17:19:12 -0400, "Michael S. Tsirkin" 
> >> >> > > > > >  wrote:
> >> >> > > > > > > On Thu, Jun 20, 2024 at 12:19:05AM +0800, Heng Qi wrote:
> >> >> > > > > > > > @@ -5312,7 +5315,7 @@ static int virtnet_find_vqs(struct 
> >> >> > > > > > > > virtnet_info *vi)
> >> >> > > > > > > >
> >> >> > > > > > > > /* Parameters for control virtqueue, if any */
> >> >> > > > > > > > if (vi->has_cvq) {
> >> >> > > > > > > > -   callbacks[total_vqs - 1] = NULL;
> >> >> > > > > > > > +   callbacks[total_vqs - 1] = virtnet_cvq_done;
> >> >> > > > > > > > names[total_vqs - 1] = "control";
> >> >> > > > > > > > }
> >> >> > > > > > > >
> >> >> > > > > > >
> >> >> > > > > > > If the # of MSIX vectors is exactly for data path VQs,
> >> >> > > > > > > this will cause irq sharing between VQs which will degrade
> >> >> > > > > > > performance significantly.
> >> >> > > > > > >
> >> >> > > > >
> >> >> > > > > Why do we need to care about buggy management? I think libvirt 
> >> >> > > > > has
> >> >> > > > > been teached to use 2N+2 since the introduction of the 
> >> >> > > > > multiqueue[1].
> >> >> > > > 
> >> >> > > > And Qemu can calculate it correctly automatically since:
> >> >> > > > 
> >> >> > > > commit 51a81a2118df0c70988f00d61647da9e298483a4
> >> >> > > > Author: Jason Wang 
> >> >> > > > Date:   Mon Mar 8 12:49:19 2021 +0800
> >> >> > > > 
> >> >> > > > virtio-net: calculating proper msix vectors on init
> >> >> > > > 
> >> >> > > > Currently, the default msix vectors for virtio-net-pci is 3 
> >> >> > > > which is
> >> >> > > > obvious not suitable for multiqueue guest, so we depends on 
> >> >> > > > the user
> >> >> > > > or management tools to pass a correct vectors parameter. In 
> >> >> > > > fact, we
> >> >> > > > can simplifying this by calculating the number of vectors on 
> >> >> > > > realize.
> >> >> > > > 
> >> >> > > > Consider we have N queues, the number of vectors needed is 
> >> >> > > > 2*N + 2
> >> >> > > > (#queue pairs + plus one config interrupt and control vq). We 
> >> >> > > > didn't
> >> >> > > > check whether or not host support control vq because it was 
> >> >> > > > added
> >> >> > > > unconditionally by qemu to avoid breaking legacy guests such 
> >> >> > > > as Minix.
> >> >> > > > 
> >> >> > > > Reviewed-by: Philippe Mathieu-Daudé 
> >> >> > > > Reviewed-by: Stefano Garzarella 
> >> >> > > > Reviewed-by: Stefan Hajnoczi 
> >> >> > > > Signed-off-by: Jason Wang 
> >> >> > > 
> >> >> > > Yes, devices designed according to the spec need to reserve an 
> >> >> > > interrupt
> >> >> > > vector for ctrlq. So, Michael, do we want to be compatible with 
> >> >> > > buggy devices?
> >> >> > > 
> >> >> > > Thanks.
> >> >> > 
> >> >> > These aren't buggy, the spec allows this. So don't fail, but
> >> >> > I'm fine with using polling if not enough vectors.
> >> >> 
> >> >> sharing with config interrupt is easier code-wise though, FWIW -
> >> >> we don't need to maintain two code-paths.
> >> >
> >> >Yes, it works well - config change irq is used less before - and will not 
> >> >fail.
> >> 
> >> Please note I'm working on such fallback for admin queue. I would Like
> >> to send the patchset by the end of this week. You can then use it easily
> >> for cvq.
> >> 
> >> Something like:
> >> /* the config->find_vqs() implementation */
> >> int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
> >> struct virtqueue *vqs[], vq_callback_t *callbacks[],
> >> const char * const names[], const bool *ctx,
> >> struct irq_affinity *desc)
> >> {
> >> int err;
> >> 
> >> /* Try MSI-X with one vector per queue. */
> >> err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
> >>VP_VQ_VECTOR_POLICY_EACH, ctx, desc);
> >> if (!err)
> >> return 0;
> >> /* Fallback: MSI-X with one shared vector for config and
> >>  * slow path queues, one vector per queue for the rest. */
> >> err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
> >>VP_VQ_VECTOR_POLICY_S

Re: [PATCH net-next v4 2/5] virtio_net: enable irq for the control vq

2024-06-26 Thread Jiri Pirko
Wed, Jun 26, 2024 at 11:58:08AM CEST, m...@redhat.com wrote:
>On Wed, Jun 26, 2024 at 10:43:11AM +0200, Jiri Pirko wrote:
>> Wed, Jun 26, 2024 at 10:08:14AM CEST, m...@redhat.com wrote:
>> >On Wed, Jun 26, 2024 at 09:52:58AM +0200, Jiri Pirko wrote:
>> >> Thu, Jun 20, 2024 at 12:31:34PM CEST, hen...@linux.alibaba.com wrote:
>> >> >On Thu, 20 Jun 2024 06:11:40 -0400, "Michael S. Tsirkin" 
>> >> > wrote:
>> >> >> On Thu, Jun 20, 2024 at 06:10:51AM -0400, Michael S. Tsirkin wrote:
>> >> >> > On Thu, Jun 20, 2024 at 05:53:15PM +0800, Heng Qi wrote:
>> >> >> > > On Thu, 20 Jun 2024 16:26:05 +0800, Jason Wang 
>> >> >> > >  wrote:
>> >> >> > > > On Thu, Jun 20, 2024 at 4:21 PM Jason Wang  
>> >> >> > > > wrote:
>> >> >> > > > >
>> >> >> > > > > On Thu, Jun 20, 2024 at 3:35 PM Heng Qi 
>> >> >> > > > >  wrote:
>> >> >> > > > > >
>> >> >> > > > > > On Wed, 19 Jun 2024 17:19:12 -0400, "Michael S. Tsirkin" 
>> >> >> > > > > >  wrote:
>> >> >> > > > > > > On Thu, Jun 20, 2024 at 12:19:05AM +0800, Heng Qi wrote:
>> >> >> > > > > > > > @@ -5312,7 +5315,7 @@ static int virtnet_find_vqs(struct 
>> >> >> > > > > > > > virtnet_info *vi)
>> >> >> > > > > > > >
>> >> >> > > > > > > > /* Parameters for control virtqueue, if any */
>> >> >> > > > > > > > if (vi->has_cvq) {
>> >> >> > > > > > > > -   callbacks[total_vqs - 1] = NULL;
>> >> >> > > > > > > > +   callbacks[total_vqs - 1] = virtnet_cvq_done;
>> >> >> > > > > > > > names[total_vqs - 1] = "control";
>> >> >> > > > > > > > }
>> >> >> > > > > > > >
>> >> >> > > > > > >
>> >> >> > > > > > > If the # of MSIX vectors is exactly for data path VQs,
>> >> >> > > > > > > this will cause irq sharing between VQs which will degrade
>> >> >> > > > > > > performance significantly.
>> >> >> > > > > > >
>> >> >> > > > >
>> >> >> > > > > Why do we need to care about buggy management? I think libvirt 
>> >> >> > > > > has
>> >> >> > > > > been teached to use 2N+2 since the introduction of the 
>> >> >> > > > > multiqueue[1].
>> >> >> > > > 
>> >> >> > > > And Qemu can calculate it correctly automatically since:
>> >> >> > > > 
>> >> >> > > > commit 51a81a2118df0c70988f00d61647da9e298483a4
>> >> >> > > > Author: Jason Wang 
>> >> >> > > > Date:   Mon Mar 8 12:49:19 2021 +0800
>> >> >> > > > 
>> >> >> > > > virtio-net: calculating proper msix vectors on init
>> >> >> > > > 
>> >> >> > > > Currently, the default msix vectors for virtio-net-pci is 3 
>> >> >> > > > which is
>> >> >> > > > obvious not suitable for multiqueue guest, so we depends on 
>> >> >> > > > the user
>> >> >> > > > or management tools to pass a correct vectors parameter. In 
>> >> >> > > > fact, we
>> >> >> > > > can simplifying this by calculating the number of vectors on 
>> >> >> > > > realize.
>> >> >> > > > 
>> >> >> > > > Consider we have N queues, the number of vectors needed is 
>> >> >> > > > 2*N + 2
>> >> >> > > > (#queue pairs + plus one config interrupt and control vq). 
>> >> >> > > > We didn't
>> >> >> > > > check whether or not host support control vq because it was 
>> >> >> > > > added
>> >> >> > > > unconditionally by qemu to avoid breaking legacy guests such 
>> >> >> > > > as Minix.
>> >> >> > > > 
>> >> >> > > > Reviewed-by: Philippe Mathieu-Daudé 
>> >> >> > > > Reviewed-by: Stefano Garzarella 
>> >> >> > > > Reviewed-by: Stefan Hajnoczi 
>> >> >> > > > Signed-off-by: Jason Wang 
>> >> >> > > 
>> >> >> > > Yes, devices designed according to the spec need to reserve an 
>> >> >> > > interrupt
>> >> >> > > vector for ctrlq. So, Michael, do we want to be compatible with 
>> >> >> > > buggy devices?
>> >> >> > > 
>> >> >> > > Thanks.
>> >> >> > 
>> >> >> > These aren't buggy, the spec allows this. So don't fail, but
>> >> >> > I'm fine with using polling if not enough vectors.
>> >> >> 
>> >> >> sharing with config interrupt is easier code-wise though, FWIW -
>> >> >> we don't need to maintain two code-paths.
>> >> >
>> >> >Yes, it works well - config change irq is used less before - and will 
>> >> >not fail.
>> >> 
>> >> Please note I'm working on such fallback for admin queue. I would Like
>> >> to send the patchset by the end of this week. You can then use it easily
>> >> for cvq.
>> >> 
>> >> Something like:
>> >> /* the config->find_vqs() implementation */
>> >> int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
>> >> struct virtqueue *vqs[], vq_callback_t *callbacks[],
>> >> const char * const names[], const bool *ctx,
>> >> struct irq_affinity *desc)
>> >> {
>> >> int err;
>> >> 
>> >> /* Try MSI-X with one vector per queue. */
>> >> err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names,
>> >>VP_VQ_VECTOR_POLICY_EACH, ctx, desc);
>> >> if (!err)
>> >> return 0;
>> >> /* Fallback: MSI-X with one shared vector for config and
>> >>  * slow path queues

Re: How to implement message forwarding from one CID to another in vhost driver

2024-06-26 Thread Dorjoy Chowdhury
Hey Stefano,
Thanks a lot for all the details. I will look into them and reach out
if I need further input. I have tried to summarize my understanding
below; let me know if that sounds correct.

On Wed, Jun 26, 2024 at 2:37 PM Stefano Garzarella  wrote:
>
> Hi Dorjoy,
>
> On Tue, Jun 25, 2024 at 11:44:30PM GMT, Dorjoy Chowdhury wrote:
> >Hey Stefano,
>
> [...]
>
> >> >
> >> >So the immediate plan would be to:
> >> >
> >> >  1) Build a new vhost-vsock-forward object model that connects to
> >> >vhost as CID 3 and then forwards every packet from CID 1 to the
> >> >Enclave-CID and every packet that arrives on to CID 3 to CID 2.
> >>
> >> This though requires writing completely from scratch the virtio-vsock
> >> emulation in QEMU. If you have time that would be great, otherwise if
> >> you want to do a PoC, my advice is to start with vhost-user-vsock which
> >> is already there.
> >>
> >
> >Can you give me some more details about how I can implement the
> >daemon?
>
> We already have a demon written in Rust, so I don't recommend you
> rewrite one from scratch, just start with that. You can find the daemon
> and instructions on how to use it with QEMU here [1].
>
> >I would appreciate some pointers to code too.
>
> I sent the pointer to it in my first reply [2].
>
> >
> >Right now, the "nitro-enclave" machine type (wip) in QEMU
> >automatically spawns a VHOST_VSOCK device with the CID equal to the
> >"guest-cid" machine option. I think this is equivalent to using the
> >"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
> >need any change? I guess instead of "vhost-vsock-device", the
> >vhost-vsock device needs to be equivalent to "-device
> >vhost-user-vsock-device,guest-cid=N"?
>
> Nope, the vhost-user-vsock device requires just a `chardev` option.
> The chardev points to the Unix socket used by QEMU to talk with the
> daemon. The daemon has a parameter to set the CID. See [1] for the
> examples.
>
> >
> >The applications inside the nitro-enclave VM will still connect and
> >talk to CID 3. So on the daemon side, do we need to spawn a device
> >that has CID 3 and then forward everything this device receives to CID
> >1 (VMADDR_CID_LOCAL) same port and everything it receives from CID 1
> >to the "guest-cid"?
>
> Yep, I think this is right.
> Note: to use VMADDR_CID_LOCAL, the host needs to load `vsock_loopback`
> kernel module.
>
> Before modifying the code, if you want to do some testing, perhaps you
> can use socat (which supports both UNIX-* and VSOCK-*). The daemon for
> now exposes two unix sockets, one is used to communicate with QEMU via
> the vhost-user protocol, and the other is to be used by the application
> to communicate with vsock sockets in the guest using the hybrid protocol
> defined by firecracker. So you could initiate a socat between the latter
> and VMADDR_CID_LOCAL, the only problem I see is that you have to send
> the first string provided by the hybrid protocol (CONNECT 1234), but for
> a PoC it should be ok.
>
> I just tried the following and it works without touching any code:
>
> shell1$ ./target/debug/vhost-device-vsock \
>  --vm guest-cid=3,socket=/tmp/vhost3.socket,uds-path=/tmp/vm3.vsock
>
> shell2$ sudo modprobe vsock_loopback
> shell2$ socat VSOCK-LISTEN:1234 UNIX-CONNECT:/tmp/vm3.vsock
>
> shell3$ qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
>  -drive file=fedora40.qcow2,format=qcow2,if=virtio\
>  -chardev socket,id=char0,path=/tmp/vhost3.socket \
>  -device vhost-user-vsock-pci,chardev=char0 \
>  -object memory-backend-memfd,id=mem,size=512M \
>  -nographic
>
>  guest$ nc --vsock -l 1234
>
> shell4$ nc --vsock 1 1234
> CONNECT 1234
>
>  Note: the `CONNECT 1234` is required by the hybrid vsock protocol
>  defined by firecracker, so if we extend the vhost-device-vsock
>  daemon to forward packet to VMADDR_CID_LOCAL, that would not be
>  needed (including running socat).
>

Understood. Before I try to modify vhost-device-vsock to support
forwarding to VMADDR_CID_LOCAL, I am just trying to think out loud
about what the final UX for successfully running a nitro VM will look
like from the user's perspective.
I guess that because the "vhost-user-vsock" device needs to be spawned
implicitly (without any explicit option) inside nitro-enclave in QEMU,
we now need to provide the "chardev" as a machine option, so the
nitro-enclave command would look something like below:
"./qemu-system-x86_64 -M nitro-enclave,chardev=char0 -kernel
/path/to/eif -chardev socket,id=char0,path=/tmp/vhost5.socket -m 4G
--enable-kvm -cpu host"
and then, in the code, the chardev id from the machine option would be
set on the vhost-user-vsock device.

The modified "vhost-device-vsock" would need to be run with the new
option that will forward everything to VMADDR_CID_LOCAL (below by the
"-z" I mean the new option)
"./target/debug/vhost-device-vsock -z --vm
guest-cid=5,socket=/tmp/vhost5.socket,uds-path=/tmp/vm5.vsock"
this means the guest-cid of the nitro VM is CID