Peter Xu <pet...@redhat.com> writes:

> On Thu, Nov 09, 2023 at 01:58:56PM -0300, Fabiano Rosas wrote:
>> We cannot operate on the multifd semaphores outside of the multifd
>> channel thread because multifd_save_cleanup() can run in parallel and
>> attempt to destroy the mutexes, which causes an assert.
>>
>> Looking at the places where we use the semaphores aside from the
>> migration thread, there's only the TLS handshake thread and the
>> initial multifd_channel_connect() in the main thread. These are
>> places where creating the multifd channel cannot fail, so releasing
>> the semaphores at these places only serves the purpose of avoiding a
>> deadlock when an error happens before the channel(s) have started,
>> but after the migration thread has already called
>> multifd_send_sync_main().
>>
>> Instead of attempting to release the semaphores at those places, move
>> the release into multifd_save_cleanup(). This puts the semaphore
>> usage in the same thread that does the cleanup, eliminating the race.
>>
>> Move the call to multifd_save_cleanup() before joining for the
>> migration thread so we release the semaphores before.
>>
>> Signed-off-by: Fabiano Rosas <faro...@suse.de>
>> ---
>>  migration/migration.c |  4 +++-
>>  migration/multifd.c   | 29 +++++++++++------------------
>>  2 files changed, 14 insertions(+), 19 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index cca32c553c..52be20561b 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1300,6 +1300,9 @@ static void migrate_fd_cleanup(MigrationState *s)
>>          QEMUFile *tmp;
>>
>>          trace_migrate_fd_cleanup();
>> +
>> +        multifd_save_cleanup();
>> +
>>          qemu_mutex_unlock_iothread();
>>          if (s->migration_thread_running) {
>>              qemu_thread_join(&s->thread);
>> @@ -1307,7 +1310,6 @@ static void migrate_fd_cleanup(MigrationState *s)
>>          }
>>          qemu_mutex_lock_iothread();
>>
>> -        multifd_save_cleanup();
>>          qemu_mutex_lock(&s->qemu_file_lock);
>>          tmp = s->to_dst_file;
>>          s->to_dst_file = NULL;
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index ec58c58082..9ca482cfbe 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -527,6 +527,9 @@ void multifd_save_cleanup(void)
>>
>>          if (p->running) {
>>              qemu_thread_join(&p->thread);
>> +        } else {
>> +            qemu_sem_post(&p->sem_sync);
>> +            qemu_sem_post(&multifd_send_state->channels_ready);
>
> I think relying on p->running to join the thread is already problematic.
>
Absolutely. I've already complained about this in another thread.

> Now all threads are created with JOINABLE, so we must join them to release
> the thread resources. Clearing "running" at the end of the thread is
> already wrong to me, because it means if the thread quits before the main
> thread reaching here, we will not join the thread, and the thread resource
> will be leaked.
>
> Here IMHO we should set p->running=true right before thread created, and
> never clear it. We may even want to rename it to p->thread_created?
>

Ok, let me do that. I already have some patches touching
multifd_new_send_channel_async() due to fixed-ram.

>>          }
>>      }
>>      for (i = 0; i < migrate_multifd_channels(); i++) {
>> @@ -751,8 +754,6 @@ out:
>>          assert(local_err);
>>          trace_multifd_send_error(p->id);
>>          multifd_send_terminate_threads(local_err);
>> -        qemu_sem_post(&p->sem_sync);
>> -        qemu_sem_post(&multifd_send_state->channels_ready);
>>          error_free(local_err);
>>      }
>>
>> @@ -780,20 +781,15 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
>>
>>      if (!qio_task_propagate_error(task, &err)) {
>>          trace_multifd_tls_outgoing_handshake_complete(ioc);
>> -        if (multifd_channel_connect(p, ioc, &err)) {
>> -            return;
>> -        }
>> +        multifd_channel_connect(p, ioc, NULL);
>
> Ignoring Error** is not good..
>
> I think you meant to say "it should never fail", then we should put
> &error_abort. Another cleaner way to do this is split the current
> multifd_channel_connect() into tls and non-tls helpers, then we can call
> the non-tls helpers here (which may not need an Error**).

I attempted the split and it looked awkward, so I let it go. But I agree
about the Error.

>> +    } else {
>> +        /*
>> +         * The multifd client could already be waiting to queue data,
>> +         * so let it know that we didn't even start.
>> +         */
>> +        p->quit = true;
>> +        trace_multifd_tls_outgoing_handshake_error(ioc,
>> error_get_pretty(err));
>>      }
>> -
>> -    trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
>> -
>> -    /*
>> -     * Error happen, mark multifd_send_thread status as 'quit' although it
>> -     * is not created, and then tell who pay attention to me.
>> -     */
>> -    p->quit = true;
>> -    qemu_sem_post(&multifd_send_state->channels_ready);
>> -    qemu_sem_post(&p->sem_sync);
>>  }
>>
>>  static void *multifd_tls_handshake_thread(void *opaque)
>> @@ -862,9 +858,6 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
>>                                               QIOChannel *ioc, Error *err)
>>  {
>>       migrate_set_error(migrate_get_current(), err);
>> -     /* Error happen, we need to tell who pay attention to me */
>> -     qemu_sem_post(&multifd_send_state->channels_ready);
>> -     qemu_sem_post(&p->sem_sync);
>>       /*
>>        * Although multifd_send_thread is not created, but main migration
>>        * thread need to judge whether it is running, so we need to mark
>> --
>
> I may still need some more time to digest your whole solution, currently
> not very clear to me. It may or may not be the patch's problem, though.

The core assumption of this patch is that we currently only need to
release the semaphores because multifd_send_sync_main() is allowed to
execute even before the multifd channels have started. If channel
creation fails to even start, for instance because of a TLS handshake
failure, the migration thread will hang waiting for channels_ready (or
sem_sync).

Then there are two premises:

- when an error happens, multifd_save_cleanup() is always called;

- a hanging migration thread (at multifd_send_sync_main) will not stop
  the main thread from reaching multifd_save_cleanup.
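
(The second premise is what the migration.c hunk above gives us: the
main thread now posts the semaphores, via multifd_save_cleanup(),
before it tries to join the migration thread. With the patch applied,
the relevant part of migrate_fd_cleanup() reads roughly like this,
trimmed down:)

    [...]
    trace_migrate_fd_cleanup();

    /* posts sem_sync/channels_ready for channels that never started */
    multifd_save_cleanup();

    qemu_mutex_unlock_iothread();
    if (s->migration_thread_running) {
        /* the migration thread can no longer be stuck in multifd_send_sync_main() */
        qemu_thread_join(&s->thread);
        s->migration_thread_running = false;
    }
    qemu_mutex_lock_iothread();
    [...]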
If this holds, then it's safe to release the semaphores right before
destroying them at multifd_save_cleanup.

> But let me also share how I see this.. I think we should rework the
> multifd thread model on channel establishment.
>
> Firstly, as I mentioned above, we must always join() the threads if it's
> JOINABLE or the resource is leaked, afaict. That's the first thing to fix.

I think we historically stumbled upon the fact that qemu_thread_join()
is not the same as pthread_join(). The former takes a pointer and is
not safe to call with a NULL QemuThread. That seems to be the reason
for the p->running check before it.

> Then let's see when TLS is there and what we do: we'll create "two" threads
> just for that, what's even worse:
>
>   - We'll create tls handshake thread in multifd_tls_channel_connect()
>     first, setting &p->thread.
>
>   - Within the same thread, we do multifd_tls_outgoing_handshake() when
>     handshake done -> multifd_channel_connect() -> we yet create the real
>     multifd_send_thread(), setting &p->thread too.

Hmm, good point, we're calling qio_task_complete() from within the
thread, which is misleading. I believe using qio_task_run_in_thread()
would put the completion callback in the main thread, right? That would
look a bit better, I think, because we could then join the handshake
thread before multifd_channel_connect().

> So AFAICT, the tls handshake thread is already leaked, got p->thread
> overwritten by the new thread created by itself..
>
> I think we should fix this then. I haven't figured the best way to do,
> two things I can come up with now:
>
>   1) At least make the tls handshake thread detached.
>
>   2) Make the handshake done in multifd send thread directly; I don't yet
>      see why we must create two threads..

I'm (a little bit) against this. It's nice to know that a multifdsend_#
thread will only be doing IO and no other task. I have the same concern
about putting zero page detection in the multifd thread.

> Then assuming we have a clear model with all these threads issue fixed (no
> matter whether we'd shrink 2N threads into N threads), then what we need to
> do, IMHO, is making sure to join() all of them before destroying anything
> (say, per-channel MultiFDSendParams). Then when we destroy everything
> safely, either mutex/sem/etc.. Because no one will race us anymore.

This doesn't address the race. There's a data dependency between the
multifd channels and the migration thread around the channels_ready
semaphore. So we cannot join the migration thread because it could be
stuck waiting for the semaphore, which means we cannot join+cleanup the
channel thread because the semaphore is still being used.

That's why I'm moving the semaphore posts to the point where we join
the thread. Reading your email, now I think that should be:

    qemu_sem_post(&p->sem_sync);
    qemu_sem_post(&multifd_send_state->channels_ready);
    qemu_thread_join(&p->thread);

Fundamentally, the issue is that we do create the multifd threads
before the migration thread, but they might not be ready before the
migration thread needs to use them.
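
To make that concrete, this is roughly what I picture the join loop in
multifd_save_cleanup() turning into, assuming the p->thread_created
rename you suggested above (just a sketch, not tested):

    for (i = 0; i < migrate_multifd_channels(); i++) {
        MultiFDSendParams *p = &multifd_send_state->params[i];

        /*
         * Unblock a migration thread that may still be waiting in
         * multifd_send_sync_main(); the semaphores are only destroyed
         * further down, after every channel thread has been joined.
         */
        qemu_sem_post(&p->sem_sync);
        qemu_sem_post(&multifd_send_state->channels_ready);

        if (p->thread_created) {
            qemu_thread_join(&p->thread);
        }
    }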