Peter Xu <pet...@redhat.com> writes:

> On Fri, Feb 07, 2025 at 11:27:53AM -0300, Fabiano Rosas wrote:
>> The multifd recv side has been getting a TLS error of
>> GNUTLS_E_PREMATURE_TERMINATION at the end of migration when the send
>> side closes the sockets without ending the TLS session. This has been
>> masked by the code not checking the migration error after loadvm.
>> 
>> Start ending the TLS session at multifd_send_shutdown() so the recv
>> side always sees a clean termination (EOF) and we can start to
>> differentiate that from an actual premature termination that might
>> possibly happen in the middle of the migration.
>> 
>> There's nothing to be done if a previous migration error has already
>> broken the connection, so add a comment explaining it and ignore any
>> errors coming from gnutls_bye().
>> 
>> This doesn't break compat with older recv-side QEMUs because EOF has
>> always caused the recv thread to exit cleanly.
>> 
>> Signed-off-by: Fabiano Rosas <faro...@suse.de>
>
> Reviewed-by: Peter Xu <pet...@redhat.com>
>
> One trivial comment..
>
>> ---
>>  migration/multifd.c | 34 +++++++++++++++++++++++++++++++++-
>>  migration/tls.c     |  5 +++++
>>  migration/tls.h     |  2 +-
>>  3 files changed, 39 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index ab73d6d984..b57cad3bb1 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -490,6 +490,32 @@ void multifd_send_shutdown(void)
>>          return;
>>      }
>>  
>> +    for (i = 0; i < migrate_multifd_channels(); i++) {
>> +        MultiFDSendParams *p = &multifd_send_state->params[i];
>> +
>> +        /* thread_created implies the TLS handshake has succeeded */
>> +        if (p->tls_thread_created && p->thread_created) {
>> +            Error *local_err = NULL;
>> +            /*
>> +             * The destination expects the TLS session to always be
>> +             * properly terminated. This helps to detect a premature
>> +             * termination in the middle of the stream.  Note that
>> +             * older QEMUs always break the connection on the source
>> +             * and the destination always sees
>> +             * GNUTLS_E_PREMATURE_TERMINATION.
>> +             */
>> +            migration_tls_channel_end(p->c, &local_err);
>> +
>> +            if (local_err) {
>> +                /*
>> +                 * The above can fail with broken pipe due to a
>> +                 * previous migration error, ignore the error.
>> +                 */
>> +                assert(migration_has_failed(migrate_get_current()));
>
> Considering this is still src, do we want to be softer on this by
> error_report?
>
> Logically !migration_has_failed() means it succeeded, so we can throw src
> qemu away now, that shouldn't be a huge deal. More of a thinking-out-loud kind
> of comment..  Your call.
>

Maybe even a warning? If at this point migration succeeded, it's probably
best to let cleanup carry on.
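
Roughly what I have in mind, as a standalone sketch and not the actual
patch. Error, TlsEndResult and handle_tls_end() are made-up stand-ins so
the logic can be compiled on its own. In multifd_send_shutdown() this
would presumably turn into warn_report_err(local_err) when
!migration_has_failed(), and error_free(local_err) otherwise:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's Error object; only carries a message. */
typedef struct {
    const char *msg;
} Error;

/* Hypothetical outcome of handling the result of ending the TLS session. */
typedef enum {
    TLS_END_OK,       /* session ended cleanly, nothing to do */
    TLS_END_IGNORED,  /* end failed after a migration error: expected */
    TLS_END_WARNED,   /* end failed although migration succeeded */
} TlsEndResult;

/*
 * Decide what to do with the error (if any) from ending the TLS session.
 * Never aborts, so cleanup can carry on in every case.
 */
static TlsEndResult handle_tls_end(const Error *local_err,
                                   bool migration_failed)
{
    if (!local_err) {
        return TLS_END_OK;
    }

    if (migration_failed) {
        /* Most likely a broken pipe from the earlier failure. */
        return TLS_END_IGNORED;
    }

    /* Unexpected, but not worth taking down the source over. */
    fprintf(stderr, "warning: multifd: failed to end TLS session: %s\n",
            local_err->msg);
    return TLS_END_WARNED;
}
```

The only behavioural difference from the assert is the last branch: a
failure to end the session after a successful migration gets reported
and the rest of the shutdown path still runs.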

>> +            }
>> +        }
>> +    }
>> +
>>      multifd_send_terminate_threads();
>>  
>>      for (i = 0; i < migrate_multifd_channels(); i++) {
>> @@ -1141,7 +1167,13 @@ static void *multifd_recv_thread(void *opaque)
>>  
>>              ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>>                                             p->packet_len, &local_err);
>> -            if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
>> +            if (!ret) {
>> +                /* EOF */
>> +                assert(!local_err);
>> +                break;
>> +            }
>> +
>> +            if (ret == -1) {
>>                  break;
>>              }
>>  
>> diff --git a/migration/tls.c b/migration/tls.c
>> index fa03d9136c..5cbf952383 100644
>> --- a/migration/tls.c
>> +++ b/migration/tls.c
>> @@ -156,6 +156,11 @@ void migration_tls_channel_connect(MigrationState *s,
>>                                NULL);
>>  }
>>  
>> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp)
>> +{
>> +    qio_channel_tls_bye(QIO_CHANNEL_TLS(ioc), errp);
>> +}
>> +
>>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
>>  {
>>      if (!migrate_tls()) {
>> diff --git a/migration/tls.h b/migration/tls.h
>> index 5797d153cb..58b25e1228 100644
>> --- a/migration/tls.h
>> +++ b/migration/tls.h
>> @@ -36,7 +36,7 @@ void migration_tls_channel_connect(MigrationState *s,
>>                                     QIOChannel *ioc,
>>                                     const char *hostname,
>>                                     Error **errp);
>> -
>> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp);
>>  /* Whether the QIO channel requires further TLS handshake? */
>>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
>>  
>> -- 
>> 2.35.3
>> 
