On Mon, Feb 03, 2025 at 03:20:50PM -0500, Peter Xu wrote: > On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero wrote: > > On 3.02.2025 19:20, Peter Xu wrote: > > > On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote: > > > > From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com> > > > > > > > > Multifd send channels are terminated by calling > > > > qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in > > > > multifd_send_terminate_threads(), which in the TLS case essentially > > > > calls shutdown(SHUT_RDWR) on the underlying raw socket. > > > > > > > > Unfortunately, this does not terminate the TLS session properly and > > > > the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error. > > > > > > > > The only reason why this wasn't causing migration failures is because > > > > the current migration code apparently does not check for migration > > > > error being set after the end of the multifd receive process. > > > > > > > > However, this will change soon so the multifd receive code has to be > > > > prepared to not return an error on such premature TLS session EOF. > > > > Use the newly introduced QIOChannelTLS method for that. > > > > > > > > It's worth noting that even if the sender were to be changed to > > > > terminate > > > > the TLS connection properly the receive side still needs to remain > > > > compatible with older QEMU bit stream which does not do this. > > > > > > If this is an existing bug, we could add a Fixes. > > > > It is an existing issue but only uncovered by this patch set. > > > > As far as I can see it was always there, so it would need some > > thought where to point that Fixes tag. > > If there's no way to trigger a real functional bug anyway, it's also ok we > omit the Fixes. > > > > Two pure questions.. > > > > > > - What is the correct way to terminate the TLS session without this > > > flag? > > > > I guess one would need to call gnutls_bye() like in this GnuTLS example: > > https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102 > > > > > - Why this is only needed by multifd sessions? > > > > What uncovered the issue was switching the load threads to using > > migrate_set_error() instead of their own result variable > > (load_threads_ret) which you had requested during the previous > > patch set version review: > > https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/ > > > > Turns out that the multifd receive code always returned > > error in the TLS case, just nothing was previously checking for > > that error presence. > > What I was curious is whether this issue also exists for the main migration > channel when with tls, especially when e.g. multifd not enabled at all. As > I don't see anywhere that qemu uses gnutls_bye() for any tls session.
We've been lazy and avoided using gnutls_bye because it adds a bunch more complexity to the shutdown sequence, and premature shutdown from an malicious attack would be expected to cause the QEMU protocol in the TLS channel to fail anyway. > Does it mean we need proper gnutls_bye() somewhere? Depends if the protocol we run over TLS can identify premature termination itself, or whether it relies on TLS to identify it. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|