Re: [PATCH net-next v4 2/5] virtio_net: enable irq for the control vq

2024-06-25 Thread Michael S. Tsirkin
On Tue, Jun 25, 2024 at 09:27:24AM +0800, Jason Wang wrote:
> On Thu, Jun 20, 2024 at 6:12 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, Jun 20, 2024 at 06:10:51AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jun 20, 2024 at 05:53:15PM +0800, Heng Qi wrote:
> > > > On Thu, 20 Jun 2024 16:26:05 +0800, Jason Wang wrote:
> > > > > On Thu, Jun 20, 2024 at 4:21 PM Jason Wang wrote:
> > > > > >
> > > > > > On Thu, Jun 20, 2024 at 3:35 PM Heng Qi wrote:
> > > > > > >
> > > > > > > On Wed, 19 Jun 2024 17:19:12 -0400, "Michael S. Tsirkin" wrote:
> > > > > > > > On Thu, Jun 20, 2024 at 12:19:05AM +0800, Heng Qi wrote:
> > > > > > > > > @@ -5312,7 +5315,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > > > > > > >
> > > > > > > > > /* Parameters for control virtqueue, if any */
> > > > > > > > > if (vi->has_cvq) {
> > > > > > > > > -   callbacks[total_vqs - 1] = NULL;
> > > > > > > > > +   callbacks[total_vqs - 1] = virtnet_cvq_done;
> > > > > > > > > names[total_vqs - 1] = "control";
> > > > > > > > > }
> > > > > > > > >
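
For context, the callback wired up above is a small completion handler for the
control virtqueue. A minimal sketch of what such a handler could look like
(this is not the code from the patch; the real virtnet_cvq_done() in this
series may differ, and the ctrl_done completion field is hypothetical):

	static void virtnet_cvq_done(struct virtqueue *cvq)
	{
		struct virtnet_info *vi = cvq->vdev->priv;
		unsigned int len;

		/* Consume the used buffer and wake up the thread waiting
		 * for the control command to complete.
		 */
		if (virtqueue_get_buf(cvq, &len))
			complete(&vi->ctrl_done);	/* hypothetical field */
	}
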
> > > > > > > >
> > > > > > > > If the # of MSIX vectors is exactly for data path VQs,
> > > > > > > > this will cause irq sharing between VQs which will degrade
> > > > > > > > performance significantly.
> > > > > > > >
> > > > > >
> > > > > > Why do we need to care about buggy management? I think libvirt has
> > > > > > been taught to use 2N+2 since the introduction of multiqueue [1].
> > > > >
> > > > > And QEMU can calculate it correctly and automatically since:
> > > > >
> > > > > commit 51a81a2118df0c70988f00d61647da9e298483a4
> > > > > Author: Jason Wang 
> > > > > Date:   Mon Mar 8 12:49:19 2021 +0800
> > > > >
> > > > > virtio-net: calculating proper msix vectors on init
> > > > >
> > > > > Currently, the default msix vectors for virtio-net-pci is 3, which is
> > > > > obviously not suitable for a multiqueue guest, so we depend on the user
> > > > > or management tools to pass a correct vectors parameter. In fact, we
> > > > > can simplify this by calculating the number of vectors on realize.
> > > > >
> > > > > Consider we have N queues: the number of vectors needed is 2*N + 2
> > > > > (two per queue pair, plus one for the config interrupt and one for the
> > > > > control vq). We didn't check whether or not the host supports the
> > > > > control vq because it was added unconditionally by qemu to avoid
> > > > > breaking legacy guests such as Minix.
> > > > >
> > > > > Reviewed-by: Philippe Mathieu-Daudé
> > > > > Reviewed-by: Stefano Garzarella
> > > > > Reviewed-by: Stefan Hajnoczi
> > > > > Signed-off-by: Jason Wang
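
For reference, before that commit the vectors property had to be sized by hand
by the user or management layer; e.g. for N=4 queue pairs (2*4 + 2 = 10) one
would pass something like the following (illustrative; the netdev name is just
an example):

	-device virtio-net-pci,netdev=hn0,mq=on,vectors=10
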
> > > >
> > > > Yes, devices designed according to the spec need to reserve an interrupt
> > > > vector for ctrlq. So, Michael, do we want to be compatible with buggy devices?
> > > >
> > > > Thanks.
> > >
> > > These aren't buggy; the spec allows this.
> 
> So it doesn't differ from the case where we lack sufficient msix
> vectors with multiqueue. In that case we just fall back to sharing
> one msix for all the queues and another for config, and we didn't
> bother about it at the time.

Sharing queues is exactly what "bothering" means here.

> Any reason to bother now?
> 
> Thanks

This patch can make the datapath slower for such configs by switching to
shared msix, all for a control-path benefit. That's not a tradeoff many
users want to make.
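
One way to avoid that tradeoff would be to register the cvq callback only when
a dedicated vector is actually available, and keep polling otherwise. Purely a
sketch: virtio_cvq_vector_available() is a hypothetical helper that would need
the core/API extension mentioned above, since drivers cannot query this today:

	/* Parameters for control virtqueue, if any */
	if (vi->has_cvq) {
		/* Only take an irq for ctrlq if it does not force the
		 * data vqs to share vectors; otherwise keep polling.
		 */
		if (virtio_cvq_vector_available(vi->vdev))
			callbacks[total_vqs - 1] = virtnet_cvq_done;
		else
			callbacks[total_vqs - 1] = NULL;
		names[total_vqs - 1] = "control";
	}
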


> > >  So don't fail, but
> > > I'm fine with using polling if there aren't enough vectors.
> >
> > Sharing with the config interrupt is easier code-wise though, FWIW -
> > we don't need to maintain two code paths.
> >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > > > So no, you cannot just do it unconditionally.
> > > > > > > >
> > > > > > > > The correct fix probably requires virtio core/API extensions.
> > > > > > >
> > > > > > > If the introduction of the cvq irq causes interrupts to become shared,
> > > > > > > then ctrlq needs to fall back to polling mode and keep the status quo.
> > > > > >
> > > > > > Having two paths sounds like a burden.
> > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > [1] https://www.linux-kvm.org/page/Multiqueue
> > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > MST
> > > > > > > >
> > > > > > >
> > > > >
> >




Re: How to implement message forwarding from one CID to another in vhost driver

2024-06-25 Thread Dorjoy Chowdhury
Hey Stefano,

On Wed, May 29, 2024 at 4:56 PM Stefano Garzarella  wrote:
>
> On Wed, May 29, 2024 at 12:43:57PM GMT, Alexander Graf wrote:
> >
> >On 29.05.24 10:04, Stefano Garzarella wrote:
> >>
> >>On Tue, May 28, 2024 at 06:38:24PM GMT, Paolo Bonzini wrote:
> >>>On Tue, May 28, 2024 at 5:53 PM Stefano Garzarella wrote:
> 
> On Tue, May 28, 2024 at 05:49:32PM GMT, Paolo Bonzini wrote:
> >On Tue, May 28, 2024 at 5:41 PM Stefano Garzarella wrote:
> >> >I think it's either that or implementing virtio-vsock in userspace
> >> >(https://lore.kernel.org/qemu-devel/30baeb56-64d2-4ea3-8e53-6a5c50999...@redhat.com/,
> >> >search for "To connect host<->guest").
> >>
> >> So in this case AF_VSOCK can't be used in the host, right?
> >> So it's similar to vhost-user-vsock.
> >
> >Not sure if I understand but in this case QEMU knows which CIDs are
> >forwarded to the host (either listen on vsock and connect to the host,
> >or vice versa), so there is no kernel and no VMADDR_FLAG_TO_HOST
> >involved.
> 
> I meant that the application in the host that wants to connect to the
> guest cannot use AF_VSOCK in the host, but must use whatever address
> family QEMU is listening on (e.g. AF_INET, AF_UNIX), right?
> 
> I think one of Alex's requirements was that the application in the host
> continues to use AF_VSOCK just as it does in their current environment.
> >>>
> >>>Can the host use VMADDR_CID_LOCAL for host-to-host communication?
> >>
> >>Yep!
> >>
> >>>If so, the proposed "-object vsock-forward" syntax can connect to it
> >>>and it should work as long as the application on the host does not
> >>>assume that it is on CID 3.
> >>
> >>Right, good point!
> >>We can also support something similar in vhost-user-vsock, where instead
> >>of using AF_UNIX and firecracker's hybrid vsock, we can redirect
> >>everything to VMADDR_CID_LOCAL.
> >>
> >>Alex, what do you think? That would simplify things a lot.
> >>The only difference is that the application in the host has to talk to
> >>VMADDR_CID_LOCAL (1).
> >
> >
> >The application in the host would see an incoming connection from CID
> >1 (which is probably fine) and would still be able to establish
> >outgoing connections to the actual VM's CID as long as the Enclave
> >doesn't check for the peer CID (I haven't seen anyone check yet). So
> >yes, indeed, this should work.
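
For illustration, a host application that today connects to the enclave's CID
would, under this scheme, simply connect to VMADDR_CID_LOCAL instead and let
QEMU (or the daemon) forward the connection. A minimal sketch, assuming the
vsock loopback transport is available and using an arbitrary example port:

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <linux/vm_sockets.h>

	int main(void)
	{
		struct sockaddr_vm addr = {
			.svm_family = AF_VSOCK,
			.svm_cid    = VMADDR_CID_LOCAL, /* 1, instead of the guest CID */
			.svm_port   = 9000,             /* arbitrary example port */
		};
		int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

		if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
			perror("vsock connect");
			return 1;
		}
		write(fd, "hello\n", 6);	/* talk to the forwarded guest service */
		close(fd);
		return 0;
	}
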
> >
> >The only case where I can see it breaking is when you run multiple
> >Enclave VMs in parallel. In that case, each would try to listen on CID
> >3 and the second one to do so would fail. But that's a solvable
> >problem: we could (in addition to the simple in-QEMU case) build an
> >external daemon that does the proxying and hence owns CID 3.
>
> Well, we can modify vhost-user-vsock for that. It's already a daemon and
> already supports multiple VMs per single daemon, but as of now they have
> to have different CIDs.
>
> >
> >So the immediate plan would be to:
> >
> >  1) Build a new vhost-vsock-forward object model that connects to
> >vhost as CID 3 and then forwards every packet from CID 1 to the
> >Enclave-CID and every packet that arrives on CID 3 to CID 2.
>
> This, though, requires writing the virtio-vsock emulation in QEMU
> completely from scratch. If you have time, that would be great;
> otherwise, if you want to do a PoC, my advice is to start with
> vhost-user-vsock, which is already there.
>

Can you give me some more details about how I can implement the
daemon? I would appreciate some pointers to code too.

Right now, the "nitro-enclave" machine type (wip) in QEMU
automatically spawns a VHOST_VSOCK device with the CID equal to the
"guest-cid" machine option. I think this is equivalent to using the
"-device vhost-vsock-device,guest-cid=N" option explicitly. Does that
need any change? I guess that instead of "vhost-vsock-device", the
device would need to be the equivalent of "-device
vhost-user-vsock-device,guest-cid=N"?

The applications inside the nitro-enclave VM will still connect and
talk to CID 3. So on the daemon side, do we need to spawn a device
that has CID 3 and then forward everything this device receives to CID
1 (VMADDR_CID_LOCAL) on the same port, and everything it receives from
CID 1 to the "guest-cid"? The applications that will be running in the host
need to be changed so that instead of connecting to the "guest-cid" of
the nitro-enclave VM, they will instead connect to VMADDR_CID_LOCAL.
Is my understanding correct?
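
If that understanding is right, the host-side peer of an enclave service would
just bind and listen on VMADDR_CID_LOCAL on the agreed port and see the
forwarded connections coming from CID 1. A rough sketch, assuming the vsock
loopback transport is loaded and using an arbitrary example port:

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <linux/vm_sockets.h>

	int main(void)
	{
		struct sockaddr_vm addr = {
			.svm_family = AF_VSOCK,
			.svm_cid    = VMADDR_CID_LOCAL, /* accept what the daemon forwards */
			.svm_port   = 9000,             /* arbitrary example port */
		};
		char buf[256];
		ssize_t n;
		int srv = socket(AF_VSOCK, SOCK_STREAM, 0);

		if (srv < 0 || bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
		    listen(srv, 1) < 0) {
			perror("vsock listen");
			return 1;
		}
		int conn = accept(srv, NULL, NULL); /* peer CID will appear as 1 */
		while ((n = read(conn, buf, sizeof(buf))) > 0)
			write(STDOUT_FILENO, buf, n);   /* echo to stdout for the example */
		close(conn);
		close(srv);
		return 0;
	}
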

BTW is there anything related to the "VMADDR_FLAG_TO_HOST" flag that
needs to be checked? I remember some discussion about it.

It would be great if you could give me some details about how I can
achieve the CID 3 <-> CID 2 communication using vhost-user-vsock.
Is https://github.com/stefano-garzarella/vhost-user-vsock where I
would need to add support, maybe via an option, for forwarding
everything to VMADDR_CID_LOCAL?

Thanks and Regards,
Dorjoy