multifd: Allow premature EOF on TLS incoming channels

Fabiano Rosas Thu, 06 Feb 2025 09:33:11 -0800

"Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes:

> On 6.02.2025 16:20, Fabiano Rosas wrote:
>> "Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes:
>> 
>>> On 6.02.2025 15:13, Fabiano Rosas wrote:
>>>> "Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes:
>>>>
>>>>> On 5.02.2025 21:42, Fabiano Rosas wrote:
>>>>>> Fabiano Rosas <faro...@suse.de> writes:
>>>>>>
>>>>>>> Daniel P. Berrangé <berra...@redhat.com> writes:
>>>>>>>
>>>>>>>> On Tue, Feb 04, 2025 at 10:31:31AM -0500, Peter Xu wrote:
>>>>>>>>> On Tue, Feb 04, 2025 at 03:39:00PM +0100, Maciej S. Szmigiero wrote:
>>>>>>>>>> On 3.02.2025 23:56, Peter Xu wrote:
>>>>>>>>>>> On Mon, Feb 03, 2025 at 10:41:32PM +0100, Maciej S. Szmigiero wrote:
>>>>>>>>>>>> On 3.02.2025 21:20, Peter Xu wrote:
>>>>>>>>>>>>> On Mon, Feb 03, 2025 at 07:53:00PM +0100, Maciej S. Szmigiero 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On 3.02.2025 19:20, Peter Xu wrote:
>>>>>>>>>>>>>>> On Thu, Jan 30, 2025 at 11:08:29AM +0100, Maciej S. Szmigiero 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Multifd send channels are terminated by calling
>>>>>>>>>>>>>>>> qio_channel_shutdown(QIO_CHANNEL_SHUTDOWN_BOTH) in
>>>>>>>>>>>>>>>> multifd_send_terminate_threads(), which in the TLS case 
>>>>>>>>>>>>>>>> essentially
>>>>>>>>>>>>>>>> calls shutdown(SHUT_RDWR) on the underlying raw socket.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Unfortunately, this does not terminate the TLS session 
>>>>>>>>>>>>>>>> properly and
>>>>>>>>>>>>>>>> the receive side sees this as a GNUTLS_E_PREMATURE_TERMINATION 
>>>>>>>>>>>>>>>> error.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The only reason why this wasn't causing migration failures is 
>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>> the current migration code apparently does not check for 
>>>>>>>>>>>>>>>> migration
>>>>>>>>>>>>>>>> error being set after the end of the multifd receive process.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> However, this will change soon so the multifd receive code has 
>>>>>>>>>>>>>>>> to be
>>>>>>>>>>>>>>>> prepared to not return an error on such premature TLS session 
>>>>>>>>>>>>>>>> EOF.
>>>>>>>>>>>>>>>> Use the newly introduced QIOChannelTLS method for that.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It's worth noting that even if the sender were to be changed 
>>>>>>>>>>>>>>>> to terminate
>>>>>>>>>>>>>>>> the TLS connection properly the receive side still needs to 
>>>>>>>>>>>>>>>> remain
>>>>>>>>>>>>>>>> compatible with older QEMU bit stream which does not do this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If this is an existing bug, we could add a Fixes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is an existing issue but only uncovered by this patch set.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As far as I can see it was always there, so it would need some
>>>>>>>>>>>>>> thought where to point that Fixes tag.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If there's no way to trigger a real functional bug anyway, it's 
>>>>>>>>>>>>> also ok we
>>>>>>>>>>>>> omit the Fixes.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Two pure questions..
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         - What is the correct way to terminate the TLS session 
>>>>>>>>>>>>>>> without this flag?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I guess one would need to call gnutls_bye() like in this GnuTLS 
>>>>>>>>>>>>>> example:
>>>>>>>>>>>>>> https://gitlab.com/gnutls/gnutls/-/blob/2b8c3e4c71ad380bbbffb32e6003b34ecad596e3/doc/examples/ex-client-anon.c#L102
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         - Why this is only needed by multifd sessions?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What uncovered the issue was switching the load threads to using
>>>>>>>>>>>>>> migrate_set_error() instead of their own result variable
>>>>>>>>>>>>>> (load_threads_ret) which you had requested during the previous
>>>>>>>>>>>>>> patch set version review:
>>>>>>>>>>>>>> https://lore.kernel.org/qemu-devel/Z1DbH5fwBaxtgrvH@x1n/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Turns out that the multifd receive code always returned
>>>>>>>>>>>>>> error in the TLS case, just nothing was previously checking for
>>>>>>>>>>>>>> that error presence.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What I was curious is whether this issue also exists for the main 
>>>>>>>>>>>>> migration
>>>>>>>>>>>>> channel when with tls, especially when e.g. multifd not enabled 
>>>>>>>>>>>>> at all.  As
>>>>>>>>>>>>> I don't see anywhere that qemu uses gnutls_bye() for any tls 
>>>>>>>>>>>>> session.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think it's a good to find that we overlooked this before.. and 
>>>>>>>>>>>>> IMHO it's
>>>>>>>>>>>>> always good we could fix this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does it mean we need proper gnutls_bye() somewhere?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we need an explicit gnutls_bye(), then I wonder if that should 
>>>>>>>>>>>>> be done
>>>>>>>>>>>>> on the main channel as well.
>>>>>>>>>>>>
>>>>>>>>>>>> That's a good question and looking at the code 
>>>>>>>>>>>> qemu_loadvm_state_main() exits
>>>>>>>>>>>> on receiving "QEMU_VM_EOF" section (that's different from 
>>>>>>>>>>>> receiving socket EOF)
>>>>>>>>>>>> and then optionally "QEMU_VM_VMDESCRIPTION" section is read with 
>>>>>>>>>>>> explicit size
>>>>>>>>>>>> in qemu_loadvm_state() - so still not until channel EOF.
>>>>>>>>>>>
>>>>>>>>>>> I had a closer look, I do feel like such pre-mature termination is 
>>>>>>>>>>> caused
>>>>>>>>>>> by explicit shutdown()s of the iochannels, looks like that can 
>>>>>>>>>>> cause issue
>>>>>>>>>>> even after everything is sent.  Then I noticed indeed multifd sender
>>>>>>>>>>> iochannels will get explicit shutdown()s since commit 077fbb5942, 
>>>>>>>>>>> while we
>>>>>>>>>>> don't do that for the main channel.  Maybe that is a major 
>>>>>>>>>>> difference.
>>>>>>>>>>>
>>>>>>>>>>> Now I wonder whether we should shutdown() the channel at all if 
>>>>>>>>>>> migration
>>>>>>>>>>> succeeded, because looks like it can cause tls session to interrupt 
>>>>>>>>>>> even if
>>>>>>>>>>> the shutdown() is done after sent everything, and if so it'll 
>>>>>>>>>>> explain why
>>>>>>>>>>> you hit the issue with tls.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Then I can't see anything else reading the channel until it is 
>>>>>>>>>>>> closed in
>>>>>>>>>>>> migration_incoming_state_destroy().
>>>>>>>>>>>>
>>>>>>>>>>>> So most likely the main migration channel will never read far 
>>>>>>>>>>>> enough to
>>>>>>>>>>>> reach that GNUTLS_E_PREMATURE_TERMINATION error.
>>>>>>>>>>>>
>>>>>>>>>>>>> If we don't need gnutls_bye(), then should we always ignore 
>>>>>>>>>>>>> pre-mature
>>>>>>>>>>>>> termination of tls no matter if it's multifd or non-multifd 
>>>>>>>>>>>>> channel (or
>>>>>>>>>>>>> even a tls session that is not migration-related)?
>>>>>>>>>>>>
>>>>>>>>>>>> So basically have this patch extended to calling
>>>>>>>>>>>> qio_channel_tls_set_premature_eof_okay() also on the main 
>>>>>>>>>>>> migration channel?
>>>>>>>>>>>
>>>>>>>>>>> If above theory can stand, then eof-okay could be a workaround 
>>>>>>>>>>> papering
>>>>>>>>>>> over the real problem that we shouldn't always shutdown()..
>>>>>>>>>>>
>>>>>>>>>>> Could you have a look at below patch and see whether it can fix the 
>>>>>>>>>>> problem
>>>>>>>>>>> you hit too, in replace of these two patches (including the previous
>>>>>>>>>>> iochannel change)?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Unfortunately, the patch below does not fix the problem:
>>>>>>>>>>> qemu-system-x86_64: Cannot read from TLS channel: The TLS 
>>>>>>>>>>> connection was non-properly terminated.
>>>>>>>>>>> qemu-system-x86_64: Cannot read from TLS channel: The TLS 
>>>>>>>>>>> connection was non-properly terminated.
>>>>>>>>>>
>>>>>>>>>> I think that, even in the absence of shutdown(), if the sender does 
>>>>>>>>>> not
>>>>>>>>>> call gnutls_bye() the TLS session is considered improperly 
>>>>>>>>>> terminated.
>>>>>>>>>
>>>>>>>>> Ah..
>>>>>>>>>
>>>>>>>>> How about one more change on top of above change to disconnect 
>>>>>>>>> properly for
>>>>>>>>> TLS?  Something like gnutls_bye() in qio_channel_tls_close(), would 
>>>>>>>>> that
>>>>>>>>> make sense to you?
>>>>>>>>
>>>>>>>> Calling gnutls_bye from qio_channel_tls_close is not viable for the
>>>>>>>> API contract of qio_channel_close. gnutls_bye needs to be able to
>>>>>>>> perform I/O, which means we need to be able to tell the caller
>>>>>>>> whether it needs to perform an event loop wait for POLLIN or POLLOUT.
>>>>>>>>
>>>>>>>> This is the same API design scenario as the gnutls_handshake method.
>>>>>>>> As such I tdon't think it is practical to abstract it inside any
>>>>>>>> existing QIOChannel API call, it'll have to be standalone like
>>>>>>>> qio_channel_tls_handshake() is.
>>>>>>>>
>>>>>>>
>>>>>>> I implemented the call to gnutls_bye:
>>>>>>> https://gitlab.com/farosas/qemu/-/commits/migration-tls-bye
>>>>>>>
>>>>>>> Then while testing it I realised we actually have a regression from 9.2:
>>>>>>>
>>>>>>> 1d457daf86 ("migration/multifd: Further remove the SYNC on complete")
>>>>>>>
>>>>>>> It seems that patch somehow affected the ordering between src shutdown
>>>>>>> vs. recv shutdown and now the recv channels are staying around to see
>>>>>>> the connection being broken. Or something... I'm still looking into it.
>>>>>>>
>>>>>>
>>>>>> Ok, so the issue is that the recv side would previously be stuck at the
>>>>>> sync semaphore and multifd_recv_terminate_threads() would kick it only
>>>>>> after 'exiting' was set, so no further recv() would happen.
>>>>>>
>>>>>> After the patch, there's no final sync anymore, so the recv thread loops
>>>>>> around and waits at the recv() until multifd_send_terminate_threads()
>>>>>> closes the connection.
>>>>>>
>>>>>> Waiting on sem_sync as before would lead to a cleaner termination
>>>>>> process IMO, but I don't think it's worth the extra complexity of
>>>>>> introducing a sync to the device state migration.
>>>>>>
>>>>>> So I think we'll have to go with one of the approaches suggested on this
>>>>>> thread (gnutls_bye or premature_ok). I'm fine either way, but let's make
>>>>>> sure we add a reference to the patch above and some words explaining the
>>>>>> situation.
>>>>>
>>>>> We still need premature_ok for handling older QEMU versions that do not
>>>>> terminate the TLS stream correctly since the TLS test regression happens
>>>>> even without device state transfer being enabled.
>>>>
>>>> What exactly is the impact of this issue to the device state series?
>>>>   From the cover letter:
>>>>
>>>>     * qemu_loadvm_load_thread_pool now reports error via 
>>>> migrate_set_error()
>>>>     instead of dedicated load_threads_ret variable.
>>>>
>>>>     * Since the change above uncovered an issue with respect to multifd 
>>>> send
>>>>     channels not terminating TLS session properly QIOChannelTLS now allows
>>>>     gracefully handling this situation.
>>>>
>>>> I understand qemu_loadvm_load_thread_pool() is attempting to use
>>>> migrate_set_error() but an error is already set by the recv_thread. Is
>>>> that the issue?
>>>
>>> Yes, when we test for load threads error in the TLS case we see that 
>>> multifd TLS
>>> one.
>>> We need to know whether the load threads succeeded so we either continue 
>>> with the
>>> migration or abort it.
>>>
>>>> I wonder if we could isolate somehow this so it doesn't
>>>> impact this series.
>>>
>>> The previous version simply used a dedicated load_threads_ret variable
>>> so it wasn't affected but Peter likes migrate_set_error() more for
>>> migration thread pools.
>>>    
>>>>>
>>>>> So I think that's what we should use generally.
>>>>>     
>>>>
>>>> For premature_ok, we need to make sure it will not hang QEMU if the
>>>> connection gets unexpectedly closed. The current code checks for
>>>> shutdown() having already happened, which is fine because it means we're
>>>> already on the way out. However, if any ol' recv() can now ignore a
>>>> premature termination error, then the recv_thread will not trigger
>>>> cleanup of the multifd_recv threads.
>>>
>>> Enabling premature_ok just turns GNUTLS_E_PREMATURE_TERMINATION error
>>> on EOF into a normal EOF.
>>>
>>> The receive thread will exit on either one:
>>>>            if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
>>>>                  break;
>>>>              }
>>>
>>> It's true that multifd_recv_terminate_threads() will only be called
>>> by multifd_recv_cleanup() or multifd_recv_shutdown() in this case,
>>> however this is already the case for non-TLS migration.
>>>
>> 
>> My point is that a premature termination sets local_err and a
>> premature_ok==true doesn't. 
>
> * Only in the TLS case - the non-TLS doesn't set any error even with
> premature_ok==true (assuming there hasn't been any other error during
> receive)
>
> * If there *has* been other any other error during receive then it
> will be set and the code flow will be the same even with premature_ok==true.
>


Sure, I'm not implying the change affects non-TLS. I'm just arguing that
non-TLS behavior should not be taken into consideration because it
doesn't ignore any errors anyway.

The whole (and only) point I'm making is what happens when
e.g. multifd_send shuts down the connection prematurely due to a
bug. IIUC, premature_ok would make that be treated as normal EOF in
multifd_recv and that is a problem because any thread that sees an error
should terminate all others instead of just exiting.

Currently, GNUTLS_E_PREMATURE_TERMINATION doesn't abort migration, but
it does cause multifd_recv_terminate_threads() to be executed. The
change from this series will make it so that
GNUTLS_E_PREMATURE_TERMINATION never leads to
multifd_recv_terminate_threads(), while the correct behavior would be to
trigger cleanup always, except for the very last recv().

>> So it's not the same as non-TLS migration
>> because there we don't have a way to ignore any errors.
>
> The GNUTLS_E_PREMATURE_TERMINATION error can't happen in the non-TLS case
> so by definition we can't ignore it in the non-TLS case.
>
> And we don't ignore any other error in the TLS/non-TLS case.
>
>> Multifd recv threads can't discern an EOF in the middle of the migration
>> from an EOF after all data has been received. The former is definitely
>> an error and should cause migration to abort, multifd threads to
>> cleanup, etc.
>
> So in this case we should set the QIO_CHANNEL_READ_RELAXED_EOF flag on
> the multifd channel recv thread main loop only, and leave the
> mid-stream page receive methods report GNUTLS_E_PREMATURE_TERMINATION
> as usual.
>

The multifd recv loop is where I don't want to have RELAXED_EOF. Except
for the last recv() call. Which of course we can't differentiate unless
we use something like gnutls_bye() of MULTIFD_FLAG_EOS as you suggest
below.

> This makes the TLS case work the same with respect to premature
> EOF as the non-TLS case since the non-TLS case can't detect premature
> EOF in the multifd channel recv thread main loop either.
>
>>> So if there was a bug with multifd threads shutdown it would have
>>> already been manifesting on the non-TLS migration.
>>>
>> 
>> Even if non-TLS behaved the same, why would we choose to port a bug to
>> the TLS implementation?
>> 
>> We could of course decide at this point to punt the problem forward and
>> when it shows up, we'd have to go implement gnutls_bye() to allow the
>> distinction between good-EOF/bad-EOF or maybe add an extra sync at the
>> end of migration to make sure the last recv() call is only started after
>> the source has already shutdown() the channel.
>
> If we do some kind of a premature EOF detection then it should probably
> work for the non-TLS case too (since that's probably by far the most
> common use case).
> So adding some MULTIFD_FLAG_EOS would make the most sense and would work
> even with QIO_CHANNEL_READ_RELAXED_EOF being set.
>

Indeed, if MULTIFD_FLAG_EOS is not seen, the recv thread could treat any
EOF as an error. The question is whether we can add that without
disrupting multifd synchronization too much.

> In any case we'd still need some kind of a compatibility behavior for
> the TLS bit stream emitted by older QEMU versions (which is always
> improperly terminated).
>

There is no compat issue. For <= 9.2, QEMU is still doing an extra
multifd_send_sync_main(), which results in an extra MULTIFD_FLAG_SYNC on
the destination and it gets stuck waiting for the
RAM_SAVE_FLAG_MULTIFD_FLUSH that never comes. Therefore the src always
closes the connection before dst reaches the extra recv().

I test migration both ways with 2 previous QEMU versions and the
gnutls_bye() series passes all tests. I also put an assert at
tlssession.c and never triggers for GNUTLS_E_PREMATURE_TERMINATION. The
MULTIFD_FLAG_EOS should behave the same.

Re: [PATCH v4 08/33] migration/multifd: Allow premature EOF on TLS incoming channels

Reply via email to