On Fri, Feb 07, 2025 at 11:27:53AM -0300, Fabiano Rosas wrote:
> The multifd recv side has been getting a TLS error of
> GNUTLS_E_PREMATURE_TERMINATION at the end of migration when the send
> side closes the sockets without ending the TLS session. This has been
> masked by the code not checking the migration error after loadvm.
> 
> Start ending the TLS session at multifd_send_shutdown() so the recv
> side always sees a clean termination (EOF) and we can start to
> differentiate that from an actual premature termination that might
> possibly happen in the middle of the migration.
> 
> There's nothing to be done if a previous migration error has already
> broken the connection, so add a comment explaining it and ignore any
> errors coming from gnutls_bye().
> 
> This doesn't break compat with older recv-side QEMUs because EOF has
> always caused the recv thread to exit cleanly.
> 
> Signed-off-by: Fabiano Rosas <faro...@suse.de>

Reviewed-by: Peter Xu <pet...@redhat.com>

One trivial comment below.

> ---
>  migration/multifd.c | 34 +++++++++++++++++++++++++++++++++-
>  migration/tls.c     |  5 +++++
>  migration/tls.h     |  2 +-
>  3 files changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index ab73d6d984..b57cad3bb1 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -490,6 +490,32 @@ void multifd_send_shutdown(void)
>          return;
>      }
>  
> +    for (i = 0; i < migrate_multifd_channels(); i++) {
> +        MultiFDSendParams *p = &multifd_send_state->params[i];
> +
> +        /* thread_created implies the TLS handshake has succeeded */
> +        if (p->tls_thread_created && p->thread_created) {
> +            Error *local_err = NULL;
> +            /*
> +             * The destination expects the TLS session to always be
> +             * properly terminated. This helps to detect a premature
> +             * termination in the middle of the stream.  Note that
> +             * older QEMUs always break the connection on the source
> +             * and the destination always sees
> +             * GNUTLS_E_PREMATURE_TERMINATION.
> +             */
> +            migration_tls_channel_end(p->c, &local_err);
> +
> +            if (local_err) {
> +                /*
> +                 * The above can fail with broken pipe due to a
> +                 * previous migration error, ignore the error.
> +                 */
> +                assert(migration_has_failed(migrate_get_current()));

Considering this is still the source side, do we want to be softer here
and use error_report() instead of an assert?

Logically, !migration_has_failed() means the migration succeeded, so the
source QEMU can be thrown away at this point and an assert shouldn't be a
huge deal. This is more of a thinking-out-loud kind of comment. Your call.
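
To illustrate what the softer option could look like, here is a rough
standalone sketch. It is not the actual QEMU code: the Error struct,
migration_has_failed_stub() and error_report_stub() are hypothetical
stand-ins for the real helpers, and handle_tls_end_error() is a made-up
name used only for this example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct Error {
    const char *msg;
} Error;

/* Hypothetical stand-in for the migration state check. */
static bool fake_migration_failed;

/* Counts how many errors were reported, for demonstration only. */
static int reports;

static bool migration_has_failed_stub(void)
{
    return fake_migration_failed;
}

/* Hypothetical stand-in for error_report(): log and keep running. */
static void error_report_stub(const char *msg)
{
    reports++;
    fprintf(stderr, "%s\n", msg);
}

/*
 * Softer handling of a failed TLS termination on the source side.
 * An error after a failed migration is expected (broken pipe) and is
 * ignored. An error after a successful migration is reported instead
 * of asserted. Returns true if an unexpected error was reported.
 */
static bool handle_tls_end_error(const Error *local_err)
{
    if (!local_err) {
        return false;
    }
    if (migration_has_failed_stub()) {
        /* The connection was already broken, nothing to do. */
        return false;
    }
    error_report_stub(local_err->msg);
    return true;
}
```

The trade-off is that the source keeps running after an unexpected
gnutls_bye() failure, at the cost of the failure being easier to miss
than an abort.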

> +            }
> +        }
> +    }
> +
>      multifd_send_terminate_threads();
>  
>      for (i = 0; i < migrate_multifd_channels(); i++) {
> @@ -1141,7 +1167,13 @@ static void *multifd_recv_thread(void *opaque)
>  
>              ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>                                             p->packet_len, &local_err);
> -            if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
> +            if (!ret) {
> +                /* EOF */
> +                assert(!local_err);
> +                break;
> +            }
> +
> +            if (ret == -1) {
>                  break;
>              }
>  
> diff --git a/migration/tls.c b/migration/tls.c
> index fa03d9136c..5cbf952383 100644
> --- a/migration/tls.c
> +++ b/migration/tls.c
> @@ -156,6 +156,11 @@ void migration_tls_channel_connect(MigrationState *s,
>                                NULL);
>  }
>  
> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp)
> +{
> +    qio_channel_tls_bye(QIO_CHANNEL_TLS(ioc), errp);
> +}
> +
>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
>  {
>      if (!migrate_tls()) {
> diff --git a/migration/tls.h b/migration/tls.h
> index 5797d153cb..58b25e1228 100644
> --- a/migration/tls.h
> +++ b/migration/tls.h
> @@ -36,7 +36,7 @@ void migration_tls_channel_connect(MigrationState *s,
>                                     QIOChannel *ioc,
>                                     const char *hostname,
>                                     Error **errp);
> -
> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp);
>  /* Whether the QIO channel requires further TLS handshake? */
>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
>  
> -- 
> 2.35.3
> 

-- 
Peter Xu

