"Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes: > On 6.02.2025 15:13, Fabiano Rosas wrote: >> "Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes: >> >>> On 5.02.2025 21:42, Fabiano Rosas wrote: >>>> Fabiano Rosas <faro...@suse.de> writes: >>>> >>>>> Daniel P. Berrangé <berra...@redhat.com> writes: >>>>> >>>>>> On Tue, Feb 04, 2025 at 10:31:31AM -0500, Peter Xu wrote: >>>>>>> On Tue, Feb 04, 2025 at 03:39:00PM +0100, Maciej S. Szmigiero wrote: >>>>>>>> On 3.02.2025 23:56, Peter Xu wrote: >>>>>>>>> On Mon, Feb 03, 2025 at 10:41:32PM +0100, Maciej S. Szmigiero wrote: >>>>>>>>>> On 3.02.2025 21:20, Peter Xu wrote: >>>>>>>>>>> On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero wrote: >>>>>>>>>>>> On 3.02.2025 19:20, Peter Xu wrote: >>>>>>>>>>>>> On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Multifd send channels are terminated by calling >>>>>>>>>>>>>> qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in >>>>>>>>>>>>>> multifd_send_terminate_threads(), which in the TLS case >>>>>>>>>>>>>> essentially >>>>>>>>>>>>>> calls shutdown(SHUT_RDWR) on the underlying raw socket. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Unfortunately, this does not terminate the TLS session properly >>>>>>>>>>>>>> and >>>>>>>>>>>>>> the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION >>>>>>>>>>>>>> error. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The only reason why this wasn't causing migration failures is >>>>>>>>>>>>>> because >>>>>>>>>>>>>> the current migration code apparently does not check for >>>>>>>>>>>>>> migration >>>>>>>>>>>>>> error being set after the end of the multifd receive process. >>>>>>>>>>>>>> >>>>>>>>>>>>>> However, this will change soon so the multifd receive code has >>>>>>>>>>>>>> to be >>>>>>>>>>>>>> prepared to not return an error on such premature TLS session >>>>>>>>>>>>>> EOF. >>>>>>>>>>>>>> Use the newly introduced QIOChannelTLS method for that. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It's worth noting that even if the sender were to be changed to >>>>>>>>>>>>>> terminate >>>>>>>>>>>>>> the TLS connection properly the receive side still needs to >>>>>>>>>>>>>> remain >>>>>>>>>>>>>> compatible with older QEMU bit stream which does not do this. >>>>>>>>>>>>> >>>>>>>>>>>>> If this is an existing bug, we could add a Fixes. >>>>>>>>>>>> >>>>>>>>>>>> It is an existing issue but only uncovered by this patch set. >>>>>>>>>>>> >>>>>>>>>>>> As far as I can see it was always there, so it would need some >>>>>>>>>>>> thought where to point that Fixes tag. >>>>>>>>>>> >>>>>>>>>>> If there's no way to trigger a real functional bug anyway, it's >>>>>>>>>>> also ok we >>>>>>>>>>> omit the Fixes. >>>>>>>>>>> >>>>>>>>>>>>> Two pure questions.. >>>>>>>>>>>>> >>>>>>>>>>>>> - What is the correct way to terminate the TLS session >>>>>>>>>>>>> without this flag? >>>>>>>>>>>> >>>>>>>>>>>> I guess one would need to call gnutls_bye() like in this GnuTLS >>>>>>>>>>>> example: >>>>>>>>>>>> https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102 >>>>>>>>>>>> >>>>>>>>>>>>> - Why this is only needed by multifd sessions? >>>>>>>>>>>> >>>>>>>>>>>> What uncovered the issue was switching the load threads to using >>>>>>>>>>>> migrate_set_error() instead of their own result variable >>>>>>>>>>>> (load_threads_ret) which you had requested during the previous >>>>>>>>>>>> patch set version review: >>>>>>>>>>>> https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/ >>>>>>>>>>>> >>>>>>>>>>>> Turns out that the multifd receive code always returned >>>>>>>>>>>> error in the TLS case, just nothing was previously checking for >>>>>>>>>>>> that error presence. >>>>>>>>>>> >>>>>>>>>>> What I was curious is whether this issue also exists for the main >>>>>>>>>>> migration >>>>>>>>>>> channel when with tls, especially when e.g. multifd not enabled at >>>>>>>>>>> all. As >>>>>>>>>>> I don't see anywhere that qemu uses gnutls_bye() for any tls >>>>>>>>>>> session. >>>>>>>>>>> >>>>>>>>>>> I think it's a good to find that we overlooked this before.. and >>>>>>>>>>> IMHO it's >>>>>>>>>>> always good we could fix this. >>>>>>>>>>> >>>>>>>>>>> Does it mean we need proper gnutls_bye() somewhere? >>>>>>>>>>> >>>>>>>>>>> If we need an explicit gnutls_bye(), then I wonder if that should >>>>>>>>>>> be done >>>>>>>>>>> on the main channel as well. >>>>>>>>>> >>>>>>>>>> That's a good question and looking at the code >>>>>>>>>> qemu_loadvm_state_main() exits >>>>>>>>>> on receiving "QEMU_VM_EOF" section (that's different from receiving >>>>>>>>>> socket EOF) >>>>>>>>>> and then optionally "QEMU_VM_VMDESCRIPTION" section is read with >>>>>>>>>> explicit size >>>>>>>>>> in qemu_loadvm_state() - so still not until channel EOF. >>>>>>>>> >>>>>>>>> I had a closer look, I do feel like such pre-mature termination is >>>>>>>>> caused >>>>>>>>> by explicit shutdown()s of the iochannels, looks like that can cause >>>>>>>>> issue >>>>>>>>> even after everything is sent. Then I noticed indeed multifd sender >>>>>>>>> iochannels will get explicit shutdown()s since commit 077fbb5942, >>>>>>>>> while we >>>>>>>>> don't do that for the main channel. Maybe that is a major difference. >>>>>>>>> >>>>>>>>> Now I wonder whether we should shutdown() the channel at all if >>>>>>>>> migration >>>>>>>>> succeeded, because looks like it can cause tls session to interrupt >>>>>>>>> even if >>>>>>>>> the shutdown() is done after sent everything, and if so it'll explain >>>>>>>>> why >>>>>>>>> you hit the issue with tls. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Then I can't see anything else reading the channel until it is >>>>>>>>>> closed in >>>>>>>>>> migration_incoming_state_destroy(). >>>>>>>>>> >>>>>>>>>> So most likely the main migration channel will never read far enough >>>>>>>>>> to >>>>>>>>>> reach that GNUTLS_E_PREMATURE_TERMINATION error. >>>>>>>>>> >>>>>>>>>>> If we don't need gnutls_bye(), then should we always ignore >>>>>>>>>>> pre-mature >>>>>>>>>>> termination of tls no matter if it's multifd or non-multifd channel >>>>>>>>>>> (or >>>>>>>>>>> even a tls session that is not migration-related)? >>>>>>>>>> >>>>>>>>>> So basically have this patch extended to calling >>>>>>>>>> qio_channel_tls_set_premature_eof_okay() also on the main migration >>>>>>>>>> channel? >>>>>>>>> >>>>>>>>> If above theory can stand, then eof-okay could be a workaround >>>>>>>>> papering >>>>>>>>> over the real problem that we shouldn't always shutdown().. >>>>>>>>> >>>>>>>>> Could you have a look at below patch and see whether it can fix the >>>>>>>>> problem >>>>>>>>> you hit too, in replace of these two patches (including the previous >>>>>>>>> iochannel change)? >>>>>>>>> >>>>>>>> >>>>>>>> Unfortunately, the patch below does not fix the problem: >>>>>>>>> qemu-system-x86_64: Cannot read from TLS channel: The TLS connection >>>>>>>>> was non-properly terminated. >>>>>>>>> qemu-system-x86_64: Cannot read from TLS channel: The TLS connection >>>>>>>>> was non-properly terminated. >>>>>>>> >>>>>>>> I think that, even in the absence of shutdown(), if the sender does not >>>>>>>> call gnutls_bye() the TLS session is considered improperly terminated. >>>>>>> >>>>>>> Ah.. >>>>>>> >>>>>>> How about one more change on top of above change to disconnect properly >>>>>>> for >>>>>>> TLS? Something like gnutls_bye() in qio_channel_tls_close(), would that >>>>>>> make sense to you? >>>>>> >>>>>> Calling gnutls_bye from qio_channel_tls_close is not viable for the >>>>>> API contract of qio_channel_close. gnutls_bye needs to be able to >>>>>> perform I/O, which means we need to be able to tell the caller >>>>>> whether it needs to perform an event loop wait for POLLIN or POLLOUT. >>>>>> >>>>>> This is the same API design scenario as the gnutls_handshake method. >>>>>> As such I tdon't think it is practical to abstract it inside any >>>>>> existing QIOChannel API call, it'll have to be standalone like >>>>>> qio_channel_tls_handshake() is. >>>>>> >>>>> >>>>> I implemented the call to gnutls_bye: >>>>> https://gitlab.com/farosas/qemu/-/commits/migration-tls-bye >>>>> >>>>> Then while testing it I realised we actually have a regression from 9.2: >>>>> >>>>> 1d457daf86 ("migration/multifd: Further remove the SYNC on complete") >>>>> >>>>> It seems that patch somehow affected the ordering between src shutdown >>>>> vs. recv shutdown and now the recv channels are staying around to see >>>>> the connection being broken. Or something... I'm still looking into it. >>>>> >>>> >>>> Ok, so the issue is that the recv side would previously be stuck at the >>>> sync semaphore and multifd_recv_terminate_threads() would kick it only >>>> after 'exiting' was set, so no further recv() would happen. >>>> >>>> After the patch, there's no final sync anymore, so the recv thread loops >>>> around and waits at the recv() until multifd_send_terminate_threads() >>>> closes the connection. >>>> >>>> Waiting on sem_sync as before would lead to a cleaner termination >>>> process IMO, but I don't think it's worth the extra complexity of >>>> introducing a sync to the device state migration. >>>> >>>> So I think we'll have to go with one of the approaches suggested on this >>>> thread (gnutls_bye or premature_ok). I'm fine either way, but let's make >>>> sure we add a reference to the patch above and some words explaining the >>>> situation. >>> >>> We still need premature_ok for handling older QEMU versions that do not >>> terminate the TLS stream correctly since the TLS test regression happens >>> even without device state transfer being enabled. >> >> What exactly is the impact of this issue to the device state series? >> From the cover letter: >> >> * qemu_loadvm_load_thread_pool now reports error via migrate_set_error() >> instead of dedicated load_threads_ret variable. >> >> * Since the change above uncovered an issue with respect to multifd send >> channels not terminating TLS session properly QIOChannelTLS now allows >> gracefully handling this situation. >> >> I understand qemu_loadvm_load_thread_pool() is attempting to use >> migrate_set_error() but an error is already set by the recv_thread. Is >> that the issue? > > Yes, when we test for load threads error in the TLS case we see that multifd > TLS > one. > We need to know whether the load threads succeeded so we either continue with > the > migration or abort it. > >> I wonder if we could isolate somehow this so it doesn't >> impact this series. > > The previous version simply used a dedicated load_threads_ret variable > so it wasn't affected but Peter likes migrate_set_error() more for > migration thread pools. > >>> >>> So I think that's what we should use generally. >>> >> >> For premature_ok, we need to make sure it will not hang QEMU if the >> connection gets unexpectedly closed. The current code checks for >> shutdown() having already happened, which is fine because it means we're >> already on the way out. However, if any ol' recv() can now ignore a >> premature termination error, then the recv_thread will not trigger >> cleanup of the multifd_recv threads. > > Enabling premature_ok just turns GNUTLS_E_PREMATURE_TERMINATION error > on EOF into a normal EOF. > > The receive thread will exit on either one: >> if (ret == 0 || ret == -1) { /* 0: EOF -1: Error */ >> break; >> } > > It's true that multifd_recv_terminate_threads() will only be called > by multifd_recv_cleanup() or multifd_recv_shutdown() in this case, > however this is already the case for non-TLS migration. >
My point is that a premature termination sets local_err and a premature_ok==true doesn't. So it's not the same as non-TLS migration because there we don't have a way to ignore any errors. Multifd recv threads can't discern an EOF in the middle of the migration from an EOF after all data has been received. The former is definitely an error and should cause migration to abort, multifd threads to cleanup, etc. > So if there was a bug with multifd threads shutdown it would have > already been manifesting on the non-TLS migration. > Even if non-TLS behaved the same, why would we choose to port a bug to the TLS implementation? We could of course decide at this point to punt the problem forward and when it shows up, we'd have to go implement gnutls_bye() to allow the distinction between good-EOF/bad-EOF or maybe add an extra sync at the end of migration to make sure the last recv() call is only started after the source has already shutdown() the channel. > Also, to be clear, I'm not advocating for removing that shutdown() > call. Yes, I think we should keep it.