On Tue, Feb 04, 2025 at 04:12:15PM +0000, Daniel P. Berrangé wrote:
> On Tue, Feb 04, 2025 at 11:02:28AM -0500, Peter Xu wrote:
> > On Tue, Feb 04, 2025 at 03:08:02PM +0000, Daniel P. Berrangé wrote:
> > > On Mon, Feb 03, 2025 at 01:20:01PM -0500, Peter Xu wrote:
> > > > On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero wrote:
> > > > > From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
> > > > >
> > > > > Multifd send channels are terminated by calling
> > > > > qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in
> > > > > multifd_send_terminate_threads(), which in the TLS case essentially
> > > > > calls shutdown(SHUT_RDWR) on the underlying raw socket.
> > > > >
> > > > > Unfortunately, this does not terminate the TLS session properly and
> > > > > the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION error.
> > > > >
> > > > > The only reason why this wasn't causing migration failures is because
> > > > > the current migration code apparently does not check for migration
> > > > > error being set after the end of the multifd receive process.
> > > > >
> > > > > However, this will change soon so the multifd receive code has to be
> > > > > prepared to not return an error on such premature TLS session EOF.
> > > > > Use the newly introduced QIOChannelTLS method for that.
> > > > >
> > > > > It's worth noting that even if the sender were to be changed to
> > > > > terminate the TLS connection properly the receive side still needs to
> > > > > remain compatible with older QEMU bit stream which does not do this.
> > > >
> > > > If this is an existing bug, we could add a Fixes.
> > > >
> > > > Two pure questions..
> > > >
> > > >   - What is the correct way to terminate the TLS session without this
> > > >     flag?
> > > >
> > > >   - Why this is only needed by multifd sessions?
> > >
> > > Graceful TLS termination (via gnutls_bye()) should only be important to
> > > security if the QEMU protocol in question does not know how much data it
> > > is expecting to receive. ie it cannot otherwise distinguish between an
> > > expected EOF, and a premature EOF triggered by an attacker.
> > >
> > > If the migration protocol has sufficient info to know when a channel is
> > > expected to see EOF, then we should stop trying to read from the TLS
> > > channel before seeing the underlying EOF.
> > >
> > > Ignoring GNUTLS_E_PREMATURE_TERMINATION would be valid if we know that
> > > migration will still fail correctly in the case of a malicious attack
> > > causing premature termination.
> > >
> > > If there's a risk that migration may succeed, but with incomplete data,
> > > then we would need the full gnutls_bye dance.
> >
> > IIUC that's not required for migration then, because migration should know
> > exactly how much data to receive, and migration should need to verify that
> > and fail if the received data didn't match the expectation along the way.
> > We also have QEMU_VM_EOF as the end mark of stream.
> >
> > Said that, are we sure any premature termination will only happen after
> > all data read in the receive buffer that was sent?
> >
> > To ask in another way: what happens if the source QEMU sends everything and
> > shutdown()/close() the channel, meanwhile the dest QEMU sees both (1) rest
> > data to read, and (2) a premature termination of TLS session in a read()
> > syscall.  Would (2) be reported even before (1), or the order guaranteed
> > that read of the residue data in (1) always happen before (2) (considering
> > dest QEMU can be slow sometime on consuming the network buffers)?
>
> That's not logically possible.
>
> In both (1) and (2) you are issuing a read() call to the TLS channel.
>
> The first read call(s) consume all incoming data. Only once the underlying
> TCP socket read() returns 0, would GNUTLS see that it hasn't got any
> TLS "bye" packet, and thus return GNUTLS_E_PREMATURE_TERMINATION from
> the layered TLS read(). IOW, if you see GNUTLS_E_PREMATURE_TERMINATION
> you know you have already read all received data off the socket.

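For reference, the ordering described above can be seen at the plain socket
level with a small standalone sketch. This is not QEMU or GnuTLS code: the
helper name bytes_read_before_eof() is made up for illustration, and a local
AF_UNIX stream socket pair stands in for the TCP connection. The sender
queues some bytes and then does shutdown(SHUT_RDWR) without any "bye"; the
receiver drains in deliberately small reads. Since a layered TLS read() sits
on top of this same read(), it would only be in a position to notice the
missing "bye" once read() returns 0, i.e. after the queued data is consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not part of QEMU):
 * queue `len` bytes on one end of a local stream socket pair, tear that
 * end down with shutdown(SHUT_RDWR) and close(), then drain the other end.
 * Returns the number of bytes received before read() reported EOF (0),
 * or -1 on error.
 */
long bytes_read_before_eof(size_t len)
{
    int sv[2];
    char out[4096] = {0};
    char in[16];            /* small buffer: models a slow receiver */
    long total = 0;
    ssize_t n;

    assert(len <= sizeof(out));

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }

    /* Sender: write everything, then abruptly shut the socket down. */
    if (write(sv[0], out, len) != (ssize_t)len) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    shutdown(sv[0], SHUT_RDWR);
    close(sv[0]);

    /* Receiver: keep reading until read() returns 0 (EOF) or fails. */
    while ((n = read(sv[1], in, sizeof(in))) > 0) {
        total += n;
    }
    close(sv[1]);

    return n == 0 ? total : -1;
}
```

On a POSIX system I would expect bytes_read_before_eof(4096) to return 4096,
i.e. EOF is only observed after every queued byte has been read, however
slowly the receiver drains the socket.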
That looks all OK then.  In that case we could set all migration TLS
sessions to ignore premature terminations.

Thanks,

-- 
Peter Xu