"Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes: > On 6.02.2025 16:20, Fabiano Rosas wrote: >> "Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes: >> >>> On 6.02.2025 15:13, Fabiano Rosas wrote: >>>> "Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes: >>>> >>>>> On 5.02.2025 21:42, Fabiano Rosas wrote: >>>>>> Fabiano Rosas <faro...@suse.de> writes: >>>>>> >>>>>>> Daniel P. Berrangé <berra...@redhat.com> writes: >>>>>>> >>>>>>>> On Tue, Feb 04, 2025 at 10:31:31AM -0500, Peter Xu wrote: >>>>>>>>> On Tue, Feb 04, 2025 at 03:39:00PM +0100, Maciej S. Szmigiero wrote: >>>>>>>>>> On 3.02.2025 23:56, Peter Xu wrote: >>>>>>>>>>> On Mon, Feb 03, 2025 at 10:41:32PM +0100, Maciej S. Szmigiero wrote: >>>>>>>>>>>> On 3.02.2025 21:20, Peter Xu wrote: >>>>>>>>>>>>> On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> On 3.02.2025 19:20, Peter Xu wrote: >>>>>>>>>>>>>>> On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Multifd send channels are terminated by calling >>>>>>>>>>>>>>>> qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in >>>>>>>>>>>>>>>> multifd_send_terminate_threads(), which in the TLS case >>>>>>>>>>>>>>>> essentially >>>>>>>>>>>>>>>> calls shutdown(SHUT_RDWR) on the underlying raw socket. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Unfortunately, this does not terminate the TLS session >>>>>>>>>>>>>>>> properly and >>>>>>>>>>>>>>>> the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION >>>>>>>>>>>>>>>> error. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The only reason why this wasn't causing migration failures is >>>>>>>>>>>>>>>> because >>>>>>>>>>>>>>>> the current migration code apparently does not check for >>>>>>>>>>>>>>>> migration >>>>>>>>>>>>>>>> error being set after the end of the multifd receive process. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> However, this will change soon so the multifd receive code has >>>>>>>>>>>>>>>> to be >>>>>>>>>>>>>>>> prepared to not return an error on such premature TLS session >>>>>>>>>>>>>>>> EOF. >>>>>>>>>>>>>>>> Use the newly introduced QIOChannelTLS method for that. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It's worth noting that even if the sender were to be changed >>>>>>>>>>>>>>>> to terminate >>>>>>>>>>>>>>>> the TLS connection properly the receive side still needs to >>>>>>>>>>>>>>>> remain >>>>>>>>>>>>>>>> compatible with older QEMU bit stream which does not do this. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If this is an existing bug, we could add a Fixes. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It is an existing issue but only uncovered by this patch set. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As far as I can see it was always there, so it would need some >>>>>>>>>>>>>> thought where to point that Fixes tag. >>>>>>>>>>>>> >>>>>>>>>>>>> If there's no way to trigger a real functional bug anyway, it's >>>>>>>>>>>>> also ok we >>>>>>>>>>>>> omit the Fixes. >>>>>>>>>>>>> >>>>>>>>>>>>>>> Two pure questions.. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - What is the correct way to terminate the TLS session >>>>>>>>>>>>>>> without this flag? >>>>>>>>>>>>>> >>>>>>>>>>>>>> I guess one would need to call gnutls_bye() like in this GnuTLS >>>>>>>>>>>>>> example: >>>>>>>>>>>>>> https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102 >>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Why this is only needed by multifd sessions? >>>>>>>>>>>>>> >>>>>>>>>>>>>> What uncovered the issue was switching the load threads to using >>>>>>>>>>>>>> migrate_set_error() instead of their own result variable >>>>>>>>>>>>>> (load_threads_ret) which you had requested during the previous >>>>>>>>>>>>>> patch set version review: >>>>>>>>>>>>>> https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/ >>>>>>>>>>>>>> >>>>>>>>>>>>>> Turns out that the multifd receive code always returned >>>>>>>>>>>>>> error in the TLS case, just nothing was previously checking for >>>>>>>>>>>>>> that error presence. >>>>>>>>>>>>> >>>>>>>>>>>>> What I was curious is whether this issue also exists for the main >>>>>>>>>>>>> migration >>>>>>>>>>>>> channel when with tls, especially when e.g. multifd not enabled >>>>>>>>>>>>> at all. As >>>>>>>>>>>>> I don't see anywhere that qemu uses gnutls_bye() for any tls >>>>>>>>>>>>> session. >>>>>>>>>>>>> >>>>>>>>>>>>> I think it's a good to find that we overlooked this before.. and >>>>>>>>>>>>> IMHO it's >>>>>>>>>>>>> always good we could fix this. >>>>>>>>>>>>> >>>>>>>>>>>>> Does it mean we need proper gnutls_bye() somewhere? >>>>>>>>>>>>> >>>>>>>>>>>>> If we need an explicit gnutls_bye(), then I wonder if that should >>>>>>>>>>>>> be done >>>>>>>>>>>>> on the main channel as well. >>>>>>>>>>>> >>>>>>>>>>>> That's a good question and looking at the code >>>>>>>>>>>> qemu_loadvm_state_main() exits >>>>>>>>>>>> on receiving "QEMU_VM_EOF" section (that's different from >>>>>>>>>>>> receiving socket EOF) >>>>>>>>>>>> and then optionally "QEMU_VM_VMDESCRIPTION" section is read with >>>>>>>>>>>> explicit size >>>>>>>>>>>> in qemu_loadvm_state() - so still not until channel EOF. >>>>>>>>>>> >>>>>>>>>>> I had a closer look, I do feel like such pre-mature termination is >>>>>>>>>>> caused >>>>>>>>>>> by explicit shutdown()s of the iochannels, looks like that can >>>>>>>>>>> cause issue >>>>>>>>>>> even after everything is sent. Then I noticed indeed multifd sender >>>>>>>>>>> iochannels will get explicit shutdown()s since commit 077fbb5942, >>>>>>>>>>> while we >>>>>>>>>>> don't do that for the main channel. Maybe that is a major >>>>>>>>>>> difference. >>>>>>>>>>> >>>>>>>>>>> Now I wonder whether we should shutdown() the channel at all if >>>>>>>>>>> migration >>>>>>>>>>> succeeded, because looks like it can cause tls session to interrupt >>>>>>>>>>> even if >>>>>>>>>>> the shutdown() is done after sent everything, and if so it'll >>>>>>>>>>> explain why >>>>>>>>>>> you hit the issue with tls. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Then I can't see anything else reading the channel until it is >>>>>>>>>>>> closed in >>>>>>>>>>>> migration_incoming_state_destroy(). >>>>>>>>>>>> >>>>>>>>>>>> So most likely the main migration channel will never read far >>>>>>>>>>>> enough to >>>>>>>>>>>> reach that GNUTLS_E_PREMATURE_TERMINATION error. >>>>>>>>>>>> >>>>>>>>>>>>> If we don't need gnutls_bye(), then should we always ignore >>>>>>>>>>>>> pre-mature >>>>>>>>>>>>> termination of tls no matter if it's multifd or non-multifd >>>>>>>>>>>>> channel (or >>>>>>>>>>>>> even a tls session that is not migration-related)? >>>>>>>>>>>> >>>>>>>>>>>> So basically have this patch extended to calling >>>>>>>>>>>> qio_channel_tls_set_premature_eof_okay() also on the main >>>>>>>>>>>> migration channel? >>>>>>>>>>> >>>>>>>>>>> If above theory can stand, then eof-okay could be a workaround >>>>>>>>>>> papering >>>>>>>>>>> over the real problem that we shouldn't always shutdown().. >>>>>>>>>>> >>>>>>>>>>> Could you have a look at below patch and see whether it can fix the >>>>>>>>>>> problem >>>>>>>>>>> you hit too, in replace of these two patches (including the previous >>>>>>>>>>> iochannel change)? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Unfortunately, the patch below does not fix the problem: >>>>>>>>>>> qemu-system-x86_64: Cannot read from TLS channel: The TLS >>>>>>>>>>> connection was non-properly terminated. >>>>>>>>>>> qemu-system-x86_64: Cannot read from TLS channel: The TLS >>>>>>>>>>> connection was non-properly terminated. >>>>>>>>>> >>>>>>>>>> I think that, even in the absence of shutdown(), if the sender does >>>>>>>>>> not >>>>>>>>>> call gnutls_bye() the TLS session is considered improperly >>>>>>>>>> terminated. >>>>>>>>> >>>>>>>>> Ah.. >>>>>>>>> >>>>>>>>> How about one more change on top of above change to disconnect >>>>>>>>> properly for >>>>>>>>> TLS? Something like gnutls_bye() in qio_channel_tls_close(), would >>>>>>>>> that >>>>>>>>> make sense to you? >>>>>>>> >>>>>>>> Calling gnutls_bye from qio_channel_tls_close is not viable for the >>>>>>>> API contract of qio_channel_close. gnutls_bye needs to be able to >>>>>>>> perform I/O, which means we need to be able to tell the caller >>>>>>>> whether it needs to perform an event loop wait for POLLIN or POLLOUT. >>>>>>>> >>>>>>>> This is the same API design scenario as the gnutls_handshake method. >>>>>>>> As such I tdon't think it is practical to abstract it inside any >>>>>>>> existing QIOChannel API call, it'll have to be standalone like >>>>>>>> qio_channel_tls_handshake() is. >>>>>>>> >>>>>>> >>>>>>> I implemented the call to gnutls_bye: >>>>>>> https://gitlab.com/farosas/qemu/-/commits/migration-tls-bye >>>>>>> >>>>>>> Then while testing it I realised we actually have a regression from 9.2: >>>>>>> >>>>>>> 1d457daf86 ("migration/multifd: Further remove the SYNC on complete") >>>>>>> >>>>>>> It seems that patch somehow affected the ordering between src shutdown >>>>>>> vs. recv shutdown and now the recv channels are staying around to see >>>>>>> the connection being broken. Or something... I'm still looking into it. >>>>>>> >>>>>> >>>>>> Ok, so the issue is that the recv side would previously be stuck at the >>>>>> sync semaphore and multifd_recv_terminate_threads() would kick it only >>>>>> after 'exiting' was set, so no further recv() would happen. >>>>>> >>>>>> After the patch, there's no final sync anymore, so the recv thread loops >>>>>> around and waits at the recv() until multifd_send_terminate_threads() >>>>>> closes the connection. >>>>>> >>>>>> Waiting on sem_sync as before would lead to a cleaner termination >>>>>> process IMO, but I don't think it's worth the extra complexity of >>>>>> introducing a sync to the device state migration. >>>>>> >>>>>> So I think we'll have to go with one of the approaches suggested on this >>>>>> thread (gnutls_bye or premature_ok). I'm fine either way, but let's make >>>>>> sure we add a reference to the patch above and some words explaining the >>>>>> situation. >>>>> >>>>> We still need premature_ok for handling older QEMU versions that do not >>>>> terminate the TLS stream correctly since the TLS test regression happens >>>>> even without device state transfer being enabled. >>>> >>>> What exactly is the impact of this issue to the device state series? >>>> From the cover letter: >>>> >>>> * qemu_loadvm_load_thread_pool now reports error via >>>> migrate_set_error() >>>> instead of dedicated load_threads_ret variable. >>>> >>>> * Since the change above uncovered an issue with respect to multifd >>>> send >>>> channels not terminating TLS session properly QIOChannelTLS now allows >>>> gracefully handling this situation. >>>> >>>> I understand qemu_loadvm_load_thread_pool() is attempting to use >>>> migrate_set_error() but an error is already set by the recv_thread. Is >>>> that the issue? >>> >>> Yes, when we test for load threads error in the TLS case we see that >>> multifd TLS >>> one. >>> We need to know whether the load threads succeeded so we either continue >>> with the >>> migration or abort it. >>> >>>> I wonder if we could isolate somehow this so it doesn't >>>> impact this series. >>> >>> The previous version simply used a dedicated load_threads_ret variable >>> so it wasn't affected but Peter likes migrate_set_error() more for >>> migration thread pools. >>> >>>>> >>>>> So I think that's what we should use generally. >>>>> >>>> >>>> For premature_ok, we need to make sure it will not hang QEMU if the >>>> connection gets unexpectedly closed. The current code checks for >>>> shutdown() having already happened, which is fine because it means we're >>>> already on the way out. However, if any ol' recv() can now ignore a >>>> premature termination error, then the recv_thread will not trigger >>>> cleanup of the multifd_recv threads. >>> >>> Enabling premature_ok just turns GNUTLS_E_PREMATURE_TERMINATION error >>> on EOF into a normal EOF. >>> >>> The receive thread will exit on either one: >>>> if (ret == 0 || ret == -1) { /* 0: EOF -1: Error */ >>>> break; >>>> } >>> >>> It's true that multifd_recv_terminate_threads() will only be called >>> by multifd_recv_cleanup() or multifd_recv_shutdown() in this case, >>> however this is already the case for non-TLS migration. >>> >> >> My point is that a premature termination sets local_err and a >> premature_ok==true doesn't. > > * Only in the TLS case - the non-TLS doesn't set any error even with > premature_ok==true (assuming there hasn't been any other error during > receive) > > * If there *has* been other any other error during receive then it > will be set and the code flow will be the same even with premature_ok==true. >
Sure, I'm not implying the change affects non-TLS. I'm just arguing that non-TLS behavior should not be taken into consideration because it doesn't ignore any errors anyway. The whole (and only) point I'm making is what happens when e.g. multifd_send shuts down the connection prematurely due to a bug. IIUC, premature_ok would make that be treated as normal EOF in multifd_recv and that is a problem because any thread that sees an error should terminate all others instead of just exiting. Currently, GNUTLS_E_PREMATURE_TERMINATION doesn't abort migration, but it does cause multifd_recv_terminate_threads() to be executed. The change from this series will make it so that GNUTLS_E_PREMATURE_TERMINATION never leads to multifd_recv_terminate_threads(), while the correct behavior would be to trigger cleanup always, except for the very last recv(). >> So it's not the same as non-TLS migration >> because there we don't have a way to ignore any errors. > > The GNUTLS_E_PREMATURE_TERMINATION error can't happen in the non-TLS case > so by definition we can't ignore it in the non-TLS case. > > And we don't ignore any other error in the TLS/non-TLS case. > >> Multifd recv threads can't discern an EOF in the middle of the migration >> from an EOF after all data has been received. The former is definitely >> an error and should cause migration to abort, multifd threads to >> cleanup, etc. > > So in this case we should set the QIO_CHANNEL_READ_RELAXED_EOF flag on > the multifd channel recv thread main loop only, and leave the > mid-stream page receive methods report GNUTLS_E_PREMATURE_TERMINATION > as usual. > The multifd recv loop is where I don't want to have RELAXED_EOF. Except for the last recv() call. Which of course we can't differentiate unless we use something like gnutls_bye() of MULTIFD_FLAG_EOS as you suggest below. > This makes the TLS case work the same with respect to premature > EOF as the non-TLS case since the non-TLS case can't detect premature > EOF in the multifd channel recv thread main loop either. > >>> So if there was a bug with multifd threads shutdown it would have >>> already been manifesting on the non-TLS migration. >>> >> >> Even if non-TLS behaved the same, why would we choose to port a bug to >> the TLS implementation? >> >> We could of course decide at this point to punt the problem forward and >> when it shows up, we'd have to go implement gnutls_bye() to allow the >> distinction between good-EOF/bad-EOF or maybe add an extra sync at the >> end of migration to make sure the last recv() call is only started after >> the source has already shutdown() the channel. > > If we do some kind of a premature EOF detection then it should probably > work for the non-TLS case too (since that's probably by far the most > common use case). > So adding some MULTIFD_FLAG_EOS would make the most sense and would work > even with QIO_CHANNEL_READ_RELAXED_EOF being set. > Indeed, if MULTIFD_FLAG_EOS is not seen, the recv thread could treat any EOF as an error. The question is whether we can add that without disrupting multifd synchronization too much. > In any case we'd still need some kind of a compatibility behavior for > the TLS bit stream emitted by older QEMU versions (which is always > improperly terminated). > There is no compat issue. For <= 9.2, QEMU is still doing an extra multifd_send_sync_main(), which results in an extra MULTIFD_FLAG_SYNC on the destination and it gets stuck waiting for the RAM_SAVE_FLAG_MULTIFD_FLUSH that never comes. Therefore the src always closes the connection before dst reaches the extra recv(). I test migration both ways with 2 previous QEMU versions and the gnutls_bye() series passes all tests. I also put an assert at tlssession.c and never triggers for GNUTLS_E_PREMATURE_TERMINATION. The MULTIFD_FLAG_EOS should behave the same.