On Mon, Feb 03, 2025 at 03:20:50PM -0500, Peter Xu wrote:
> On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero wrote:
> > On 3.02.2025 19:20, Peter Xu wrote:
> > > On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote:
> > > > From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
> > > > 
> > > > Multifd send channels are terminated by calling
> > > > qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in
> > > > multifd_send_terminate_threads(), which in the TLS case essentially
> > > > calls shutdown(SHUT_RDWR) on the underlying raw socket.
> > > > 
> > > > Unfortunately, this does not terminate the TLS session properly and
> > > > the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error.
> > > > 
> > > > The only reason why this wasn't causing migration failures is because
> > > > the current migration code apparently does not check for migration
> > > > error being set after the end of the multifd receive process.
> > > > 
> > > > However, this will change soon so the multifd receive code has to be
> > > > prepared to not return an error on such premature TLS session EOF.
> > > > Use the newly introduced QIOChannelTLS method for that.
> > > > 
> > > > It's worth noting that even if the sender were to be changed to 
> > > > terminate
> > > > the TLS connection properly the receive side still needs to remain
> > > > compatible with older QEMU bit stream which does not do this.
> > > 
> > > If this is an existing bug, we could add a Fixes.
> > 
> > It is an existing issue but only uncovered by this patch set.
> > 
> > As far as I can see it was always there, so it would need some
> > thought where to point that Fixes tag.
> 
> If there's no way to trigger a real functional bug anyway, it's also ok we
> omit the Fixes.
> 
> > > Two pure questions..
> > > 
> > >    - What is the correct way to terminate the TLS session without this 
> > > flag?
> > 
> > I guess one would need to call gnutls_bye() like in this GnuTLS example:
> > https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102
> > 
> > >    - Why this is only needed by multifd sessions?
> > 
> > What uncovered the issue was switching the load threads to using
> > migrate_set_error() instead of their own result variable
> > (load_threads_ret) which you had requested during the previous
> > patch set version review:
> > https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/
> > 
> > Turns out that the multifd receive code always returned
> > error in the TLS case, just nothing was previously checking for
> > that error presence.
> 
> What I was curious is whether this issue also exists for the main migration
> channel when with tls, especially when e.g. multifd not enabled at all.  As
> I don't see anywhere that qemu uses gnutls_bye() for any tls session.

We've been lazy and avoided using gnutls_bye because it adds a
bunch more complexity to the shutdown sequence, and premature
shutdown from an malicious attack would be expected to cause
the QEMU protocol in the TLS channel to fail anyway.

> Does it mean we need proper gnutls_bye() somewhere?

Depends if the protocol we run over TLS can identify premature
termination itself, or whether it relies on TLS to identify
it.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to