On Fri, Feb 07, 2025 at 11:27:53AM -0300, Fabiano Rosas wrote:
> The multifd recv side has been getting a TLS error of
> GNUTLS_E_PREMATURE_TERMINATION at the end of migration when the send
> side closes the sockets without ending the TLS session. This has been
> masked by the code not checking the migration error after loadvm.
>
> Start ending the TLS session at multifd_send_shutdown() so the recv
> side always sees a clean termination (EOF) and we can start to
> differentiate that from an actual premature termination that might
> possibly happen in the middle of the migration.
>
> There's nothing to be done if a previous migration error has already
> broken the connection, so add a comment explaining it and ignore any
> errors coming from gnutls_bye().
>
> This doesn't break compat with older recv-side QEMUs because EOF has
> always caused the recv thread to exit cleanly.
>
> Signed-off-by: Fabiano Rosas <faro...@suse.de>
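
For illustration, the distinction the commit message draws on the recv side
can be sketched as a small standalone classifier. This is not QEMU code:
recv_classify() and the RecvAction values are hypothetical names, and the
1/0/-1 return convention is assumed to match qio_channel_read_all_eof()
(the patch itself only documents 0 as EOF and -1 as error).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed return convention of qio_channel_read_all_eof(). */
enum { READ_ERROR = -1, READ_EOF = 0, READ_OK = 1 };

/* Hypothetical outcome of one read in the recv thread loop. */
typedef enum {
    RECV_CONTINUE,      /* full packet read, keep going */
    RECV_CLEAN_EXIT,    /* peer ended the TLS session, then closed */
    RECV_FAILED_EXIT    /* e.g. premature termination mid-stream */
} RecvAction;

static RecvAction recv_classify(int ret, bool have_err)
{
    if (ret == READ_EOF) {
        /* A clean EOF must not come with an error attached. */
        assert(!have_err);
        return RECV_CLEAN_EXIT;
    }
    if (ret == READ_ERROR) {
        return RECV_FAILED_EXIT;
    }
    return RECV_CONTINUE;
}
```

With the source now ending the TLS session, a premature termination would
land in the RECV_FAILED_EXIT branch instead of being indistinguishable from
the normal end of migration.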

Reviewed-by: Peter Xu <pet...@redhat.com>

One trivial comment..

> ---
>  migration/multifd.c | 34 +++++++++++++++++++++++++++++++++-
>  migration/tls.c     |  5 +++++
>  migration/tls.h     |  2 +-
>  3 files changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index ab73d6d984..b57cad3bb1 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -490,6 +490,32 @@ void multifd_send_shutdown(void)
>          return;
>      }
>
> +    for (i = 0; i < migrate_multifd_channels(); i++) {
> +        MultiFDSendParams *p = &multifd_send_state->params[i];
> +
> +        /* thread_created implies the TLS handshake has succeeded */
> +        if (p->tls_thread_created && p->thread_created) {
> +            Error *local_err = NULL;
> +            /*
> +             * The destination expects the TLS session to always be
> +             * properly terminated. This helps to detect a premature
> +             * termination in the middle of the stream. Note that
> +             * older QEMUs always break the connection on the source
> +             * and the destination always sees
> +             * GNUTLS_E_PREMATURE_TERMINATION.
> +             */
> +            migration_tls_channel_end(p->c, &local_err);
> +
> +            if (local_err) {
> +                /*
> +                 * The above can fail with broken pipe due to a
> +                 * previous migration error, ignore the error.
> +                 */
> +                assert(migration_has_failed(migrate_get_current()));

Considering this is still src, do we want to be softer on this by using
error_report() instead?

Logically !migration_has_failed() means it succeeded, so we can throw the
src qemu away now, and that shouldn't be a huge deal.  More of a thinking
out loud kind of comment..  Your call.

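To make the suggestion concrete, here is a minimal standalone sketch of the
softer variant. It is not QEMU code: the Error struct, report_error(),
tls_channel_end() and the migration_failed flag are hypothetical stand-ins
for Error, error_report(), migration_tls_channel_end() and
migration_has_failed(), so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct Error {
    const char *msg;
} Error;

/* Counts calls to report_error() so callers can observe them. */
static int reports;

/* Hypothetical stand-in for error_report(): log and keep running. */
static void report_error(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    reports++;
}

/*
 * Hypothetical stand-in for migration_tls_channel_end(): sets *errp
 * when the connection is already broken.
 */
static void tls_channel_end(bool broken, Error **errp)
{
    static Error broken_pipe = { "TLS bye failed: broken pipe" };

    assert(errp != NULL);
    if (broken) {
        *errp = &broken_pipe;
    }
}

/*
 * Softer handling: a failure to end the TLS session is ignored when
 * the migration has already failed, and only reported (rather than
 * asserted on) when it has not. Returns true if an unexpected failure
 * was reported.
 */
static bool end_tls_session(bool broken, bool migration_failed)
{
    Error *local_err = NULL;

    tls_channel_end(broken, &local_err);
    if (local_err && !migration_failed) {
        report_error(local_err->msg);
        return true;
    }
    return false;
}
```

The only behavioural difference from the assert() is that the source keeps
running and leaves a message in the log in the unexpected case.
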
> +            }
> +        }
> +    }
> +
>      multifd_send_terminate_threads();
>
>      for (i = 0; i < migrate_multifd_channels(); i++) {
> @@ -1141,7 +1167,13 @@ static void *multifd_recv_thread(void *opaque)
>
>              ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>                                             p->packet_len, &local_err);
> -            if (ret == 0 || ret == -1) { /* 0: EOF -1: Error */
> +            if (!ret) {
> +                /* EOF */
> +                assert(!local_err);
> +                break;
> +            }
> +
> +            if (ret == -1) {
>                  break;
>              }
>
> diff --git a/migration/tls.c b/migration/tls.c
> index fa03d9136c..5cbf952383 100644
> --- a/migration/tls.c
> +++ b/migration/tls.c
> @@ -156,6 +156,11 @@ void migration_tls_channel_connect(MigrationState *s,
>                            NULL);
>  }
>
> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp)
> +{
> +    qio_channel_tls_bye(QIO_CHANNEL_TLS(ioc), errp);
> +}
> +
>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
>  {
>      if (!migrate_tls()) {
> diff --git a/migration/tls.h b/migration/tls.h
> index 5797d153cb..58b25e1228 100644
> --- a/migration/tls.h
> +++ b/migration/tls.h
> @@ -36,7 +36,7 @@ void migration_tls_channel_connect(MigrationState *s,
>                                     QIOChannel *ioc,
>                                     const char *hostname,
>                                     Error **errp);
> -
> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp);
>  /* Whether the QIO channel requires further TLS handshake? */
>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
>
> --
> 2.35.3
>

-- 
Peter Xu