"Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes:

> From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
>
> This is necessary for multifd_send() to be able to be called
> from multiple threads.
>
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
> ---
>  migration/multifd.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index d5a8e5a9c9b5..b25789dde0b3 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -343,26 +343,38 @@ bool multifd_send(MultiFDSendData **send_data)
>          return false;
>      }
>  
> -    /* We wait here, until at least one channel is ready */
> -    qemu_sem_wait(&multifd_send_state->channels_ready);
> -
>      /*
>       * next_channel can remain from a previous migration that was
>       * using more channels, so ensure it doesn't overflow if the
>       * limit is lower now.
>       */
> -    next_channel %= migrate_multifd_channels();
> -    for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
> +    i = qatomic_load_acquire(&next_channel);
> +    if (unlikely(i >= migrate_multifd_channels())) {
> +        qatomic_cmpxchg(&next_channel, i, 0);
> +    }

Do we still need this? It seems not, because the mod down below would
already truncate to a value less than the number of channels. We don't
need it to start at 0 always, the channels are equivalent.

> +
> +    /* We wait here, until at least one channel is ready */
> +    qemu_sem_wait(&multifd_send_state->channels_ready);
> +
> +    while (true) {
> +        int i_next;
> +
>          if (multifd_send_should_exit()) {
>              return false;
>          }
> +
> +        i = qatomic_load_acquire(&next_channel);
> +        i_next = (i + 1) % migrate_multifd_channels();
> +        if (qatomic_cmpxchg(&next_channel, i, i_next) != i) {
> +            continue;
> +        }

Say channel 'i' is the only one that's idle. What's stopping the other
thread(s) to race at this point and loop around to the same index?

> +
>          p = &multifd_send_state->params[i];
>          /*
>           * Lockless read to p->pending_job is safe, because only multifd
>           * sender thread can clear it.
>           */
>          if (qatomic_read(&p->pending_job) == false) {

With the cmpxchg your other patch adds here, then the race I mentioned
above should be harmless. But we'd need to bring that code into this
patch.

> -            next_channel = (i + 1) % migrate_multifd_channels();
>              break;
>          }
>      }

Reply via email to