multifd: Move semaphore release into main thread

Fabiano Rosas Thu, 16 Nov 2023 10:14:21 -0800

Juan Quintela <quint...@redhat.com> writes:

> Peter Xu <pet...@redhat.com> wrote:
>> On Thu, Nov 09, 2023 at 01:58:56PM -0300, Fabiano Rosas wrote:
>>> We cannot operate on the multifd semaphores outside of the multifd
>>> channel thread
>>> because multifd_save_cleanup() can run in parallel and
>>> attempt to destroy the mutexes, which causes an assert.
>>> 
>>> Looking at the places where we use the semaphores aside from the
>>> migration thread, there's only the TLS handshake thread and the
>>> initial multifd_channel_connect() in the main thread. These are places
>>> where creating the multifd channel cannot fail, so releasing the
>>> semaphores at these places only serves the purpose of avoiding a
>>> deadlock when an error happens before the channel(s) have started, but
>>> after the migration thread has already called
>>> multifd_send_sync_main().
>>> 
>>> Instead of attempting to release the semaphores at those places, move
>>> the release into multifd_save_cleanup(). This puts the semaphore usage
>>> in the same thread that does the cleanup, eliminating the race.
>>> 
>>> Move the call to multifd_save_cleanup() before joining for the
>>> migration thread so we release the semaphores before.
>>> 
>>> Signed-off-by: Fabiano Rosas <faro...@suse.de>
>>> ---
>>>  migration/migration.c |  4 +++-
>>>  migration/multifd.c   | 29 +++++++++++------------------
>>>  2 files changed, 14 insertions(+), 19 deletions(-)
>>> 
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index cca32c553c..52be20561b 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -1300,6 +1300,9 @@ static void migrate_fd_cleanup(MigrationState *s)
>>>          QEMUFile *tmp;
>>>  
>>>          trace_migrate_fd_cleanup();
>>> +
>>> +        multifd_save_cleanup();
>>> +
>>>          qemu_mutex_unlock_iothread();
>>>          if (s->migration_thread_running) {
>>>              qemu_thread_join(&s->thread);
>>> @@ -1307,7 +1310,6 @@ static void migrate_fd_cleanup(MigrationState *s)
>>>          }
>>>          qemu_mutex_lock_iothread();
>>>  
>>> -        multifd_save_cleanup();
>>>          qemu_mutex_lock(&s->qemu_file_lock);
>>>          tmp = s->to_dst_file;
>>>          s->to_dst_file = NULL;
>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>> index ec58c58082..9ca482cfbe 100644
>>> --- a/migration/multifd.c
>>> +++ b/migration/multifd.c
>>> @@ -527,6 +527,9 @@ void multifd_save_cleanup(void)
>>>  
>>>          if (p->running) {
>>>              qemu_thread_join(&p->thread);
>>> +        } else {
>>> +            qemu_sem_post(&p->sem_sync);
>>> +            qemu_sem_post(&multifd_send_state->channels_ready);
>>
>> I think relying on p->running to join the thread is already problematic.
>
> Why?


The channel holds resources that need to be freed by
multifd_save_cleanup(). We cannot reliably do so while the channel
itself is the one responsible for telling the main thread whether it is
done with those resources. 

p->running works fine for knowing "has the thread been created", so we
can use it to avoid joining if it was never created. But it is bad for
knowing "is the thread still running" because as soon as the channel
sets p->running=false, multifd_save_cleanup() could attempt to destroy
the p->mutex and the semaphores.

The bug reported in the other series was about this already:

[PATCH] migrate/multifd: fix coredump when the multifd thread cleanup
https://lore.kernel.org/r/20230621081826.3203053-1-zhangjiangu...@huawei.com

>> Now all threads are created with JOINABLE, so we must join them to release
>> the thread resources.  Clearing "running" at the end of the thread is
>> already wrong to me, because it means if the thread quits before the main
>> thread reaching here, we will not join the thread, and the thread resource
>> will be leaked.
>
> The bug is that the thread quits from other place.
> It is not different that forgetting to do a mutex_unlock().  It is an
> error that needs fixing.  
>
>> Here IMHO we should set p->running=true right before thread created,
>
> (that is basically what is done)
>
> Meaning of ->running is that multifd thread has started running.  it a
> false is that it is not running normally anymore.

The issue is that multifd_save_cleanup looks at p->running = false and
continues with the cleanup that destroys the mutex while the TLS thread
might be using it.

>
> thread function:
>
>> and never clear it.
>
> static void *multifd_send_thread(void *opaque)
> {
>     // No error can happens here
>
>     while (true) {
>     // No return command here, just breaks
>     }
>
> out:
>     // This can fail
>
>     qemu_mutex_lock(&p->mutex);
>     p->running = false;
>     qemu_mutex_unlock(&p->mutex);
>
>     /* this can't fail */
>
>     return NULL;
> }
>
>
> What running here means is that we don't need to "stop" this thread
> anymore.  That happens as soon as we get out of the loop.
>
>>  We may even want to rename it to p->thread_created?
>
> We can rename it, but I am not sure if it buys it too much.  Notice that
> we also mean that we have created the channel.
>
>>
>>>          }
>>>      }
>>>      for (i = 0; i < migrate_multifd_channels(); i++) {
>>> @@ -751,8 +754,6 @@ out:
>>>          assert(local_err);
>>>          trace_multifd_send_error(p->id);
>>>          multifd_send_terminate_threads(local_err);
>>> -        qemu_sem_post(&p->sem_sync);
>>> -        qemu_sem_post(&multifd_send_state->channels_ready);
>>>          error_free(local_err);
>>>      }
>>>  
>>> @@ -780,20 +781,15 @@ static void multifd_tls_outgoing_handshake(QIOTask 
>>> *task,
>>>  
>>>      if (!qio_task_propagate_error(task, &err)) {
>>>          trace_multifd_tls_outgoing_handshake_complete(ioc);
>>> -        if (multifd_channel_connect(p, ioc, &err)) {
>>> -            return;
>>> -        }
>>> +        multifd_channel_connect(p, ioc, NULL);
>>
>> Ignoring Error** is not good..
>>
>> I think you meant to say "it should never fail", then we should put
>> &error_abort.  Another cleaner way to do this is split the current
>> multifd_channel_connect() into tls and non-tls helpers, then we can call
>> the non-tls helpers here (which may not need an Error**).
>
> This code is really weird because it is (very) asynchronous.
> And TLS is even worse, because we need to do the equilent of two
> listen().
> Sniff.
>
>>> +    } else {
>>> +        /*
>>> +         * The multifd client could already be waiting to queue data,
>>> +         * so let it know that we didn't even start.
>>> +         */
>>> +        p->quit = true;
>>> +        trace_multifd_tls_outgoing_handshake_error(ioc, 
>>> error_get_pretty(err));
>>>      }
>>> -
>>> -    trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
>>> -
>>> -    /*
>>> -     * Error happen, mark multifd_send_thread status as 'quit' although it
>>> -     * is not created, and then tell who pay attention to me.
>>> -     */
>>> -    p->quit = true;
>>> -    qemu_sem_post(&multifd_send_state->channels_ready);
>>> -    qemu_sem_post(&p->sem_sync);
>>>  }
>>>  
>>>  static void *multifd_tls_handshake_thread(void *opaque)
>>> @@ -862,9 +858,6 @@ static void 
>>> multifd_new_send_channel_cleanup(MultiFDSendParams *p,
>>>                                               QIOChannel *ioc, Error *err)
>>>  {
>>>       migrate_set_error(migrate_get_current(), err);
>>> -     /* Error happen, we need to tell who pay attention to me */
>>> -     qemu_sem_post(&multifd_send_state->channels_ready);
>>> -     qemu_sem_post(&p->sem_sync);
>>>       /*
>>>        * Although multifd_send_thread is not created, but main migration
>>>        * thread need to judge whether it is running, so we need to mark
>>> -- 
>>
>> I may still need some more time to digest your whole solution, currently
>> not very clear to me.  It may or may not be the patch's problem, though.
>>
>> But let me also share how I see this..  I think we should rework the
>> multifd thread model on channel establishment.
>>
>> Firstly, as I mentioned above, we must always join() the threads if it's
>> JOINABLE or the resource is leaked, afaict.  That's the first thing to fix.
>
> The problem that running is trying to fix is this:
> * we get an additional error while we are exiting due to any reason.
>
> So we end calling multifd_fd_cleanup() several times, and things get
> really weird.
>
> So what we really need is to be sure that:
> - we join each thread once and only once
>
>       https://linux.die.net/man/3/pthread_join
>
>       Joining with a thread that has previously been joined results in 
> undefined behavior.
>

I don't see how we would call multifd_save_cleanup() more than once. The
qemu_mutex_destroy() calls in the loop below would already assert if
called twice.

>> Then let's see when TLS is there and what we do: we'll create "two" threads
>> just for that, what's even worse:
>>
>>   - We'll create tls handshake thread in multifd_tls_channel_connect()
>>     first, setting &p->thread.
>>
>>   - Within the same thread, we do multifd_tls_outgoing_handshake() when
>>     handshake done -> multifd_channel_connect() -> we yet create the real
>>     multifd_send_thread(), setting &p->thread too.
>>
>> So AFAICT, the tls handshake thread is already leaked, got p->thread
>> overwritten by the new thread created by itself..
>>
>> I think we should fix this then.  I haven't figured the best way to do,
>> two things I can come up with now:
>>
>>   1) At least make the tls handshake thread detached.
>>
>>   2) Make the handshake done in multifd send thread directly; I don't yet
>>      see why we must create two threads..
>
> The way we listen() in sockets anynchronously. QIO channels are a
> mystery to me, and this was really weird.
>
> Using glib_main_loop() has lots of advantages.  Understanding how to
> listen asynchronously is not one of them.
>
>> Then assuming we have a clear model with all these threads issue fixed (no
>> matter whether we'd shrink 2N threads into N threads), then what we need to
>> do, IMHO, is making sure to join() all of them before destroying anything
>> (say, per-channel MultiFDSendParams).  Then when we destroy everything
>> safely, either mutex/sem/etc..  Because no one will race us anymore.
>
>> Would it make sense?
>
> I can't see why we need to join the threads created for tls.  That ones
> could be not joinable and just call it a day.

>From the perspective of this series, we just need to be sure
multifd_tls_outgoing_handshake() never runs in parallel with
multifd_save_cleanup(). Joining the thread would be one way to do that.

Re: [RFC PATCH 2/2] migration/multifd: Move semaphore release into main thread

Reply via email to