On 30.08.2024 20:13, Fabiano Rosas wrote:
"Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes:

From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>

This is necessary for multifd_send() to be callable
from multiple threads.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
---
  migration/multifd.c | 24 ++++++++++++++++++------
  1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index d5a8e5a9c9b5..b25789dde0b3 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -343,26 +343,38 @@ bool multifd_send(MultiFDSendData **send_data)
          return false;
      }
-    /* We wait here, until at least one channel is ready */
-    qemu_sem_wait(&multifd_send_state->channels_ready);
-
      /*
       * next_channel can remain from a previous migration that was
       * using more channels, so ensure it doesn't overflow if the
       * limit is lower now.
       */
-    next_channel %= migrate_multifd_channels();
-    for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
+    i = qatomic_load_acquire(&next_channel);
+    if (unlikely(i >= migrate_multifd_channels())) {
+        qatomic_cmpxchg(&next_channel, i, 0);
+    }

Do we still need this? It seems not, because the mod down below would
already truncate to a value less than the number of channels. We don't
need it to always start at 0; the channels are equivalent.

The "modulo" operation below forces i_next to be in the proper range,
not i.

If the qatomic_cmpxchg() ends up succeeding then we use the (now out of
bounds) i value to index multifd_send_state->params[].
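To make that failure mode concrete, here is a minimal self-contained
sketch, using C11 atomics in place of QEMU's qatomic_* wrappers;
NUM_CHANNELS, next_channel and params[] stand in for
migrate_multifd_channels(), the real next_channel and
multifd_send_state->params[], so treat it as an illustration rather
than the actual QEMU code:

#include <stdatomic.h>
#include <stdio.h>

#define NUM_CHANNELS 4                    /* lower limit of this migration */

static _Atomic unsigned next_channel = 7; /* stale index left over from a
                                           * previous 8-channel migration */
static int params[NUM_CHANNELS];

int main(void)
{
    unsigned i = atomic_load(&next_channel);
    unsigned i_next = (i + 1) % NUM_CHANNELS;  /* 0 -- back in range */
    unsigned expected = i;

    if (atomic_compare_exchange_strong(&next_channel, &expected, i_next)) {
        /* The modulo fixed i_next, but i itself is still 7: indexing
         * params[i] at this point would read past the 4-element array. */
        printf("stale i = %u >= NUM_CHANNELS = %d\n", i, NUM_CHANNELS);
    }
    return 0;
}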

+
+    /* We wait here, until at least one channel is ready */
+    qemu_sem_wait(&multifd_send_state->channels_ready);
+
+    while (true) {
+        int i_next;
+
          if (multifd_send_should_exit()) {
              return false;
          }
+
+        i = qatomic_load_acquire(&next_channel);
+        i_next = (i + 1) % migrate_multifd_channels();
+        if (qatomic_cmpxchg(&next_channel, i, i_next) != i) {
+            continue;
+        }

Say channel 'i' is the only one that's idle. What's stopping the other
thread(s) from racing at this point and looping around to the same index?

See the reply below.

+
          p = &multifd_send_state->params[i];
          /*
           * Lockless read to p->pending_job is safe, because only multifd
           * sender thread can clear it.
           */
          if (qatomic_read(&p->pending_job) == false) {

With the cmpxchg your other patch adds here, the race I mentioned
above should be harmless. But we'd need to bring that code into this
patch.


You're right: the sender code with this patch alone isn't thread-safe
yet, but this commit is literally only about "converting
multifd_send()::next_channel to atomic".

At the time of this patch there aren't any multifd_send() calls from
multiple threads, and the commit that introduces such a possible call
site (multifd_queue_device_state()) also modifies multifd_send()
to be fully thread-safe by introducing p->pending_job_preparing.
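
For readers following along, here is a rough self-contained sketch
(again with C11 atomics instead of qatomic_*) of why such a claim flag
makes the next_channel race harmless; the field name
pending_job_preparing comes from that later commit, while everything
else here is purely illustrative:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    atomic_bool pending_job_preparing;   /* claimed by a submitter? */
} MultiFDSendParams;

static MultiFDSendParams params[4];

/* Returns true if the caller won exclusive use of channel i. */
static bool claim_channel(unsigned i)
{
    bool expected = false;

    /* Only one thread can flip the flag false -> true, so even if two
     * threads race past next_channel to the same index, exactly one
     * wins here and the other goes back to pick another channel. */
    return atomic_compare_exchange_strong(&params[i].pending_job_preparing,
                                          &expected, true);
}

int main(void)
{
    printf("first claim won:  %d\n", claim_channel(0));  /* prints 1 */
    printf("second claim won: %d\n", claim_channel(0));  /* prints 0 */
    return 0;
}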

Thanks,
Maciej

