On 30.08.2024 20:13, Fabiano Rosas wrote:
"Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes:
From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
This is necessary for multifd_send() to be callable from multiple
threads.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
---
migration/multifd.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index d5a8e5a9c9b5..b25789dde0b3 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -343,26 +343,38 @@ bool multifd_send(MultiFDSendData **send_data)
return false;
}
- /* We wait here, until at least one channel is ready */
- qemu_sem_wait(&multifd_send_state->channels_ready);
-
/*
* next_channel can remain from a previous migration that was
* using more channels, so ensure it doesn't overflow if the
* limit is lower now.
*/
- next_channel %= migrate_multifd_channels();
- for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
+ i = qatomic_load_acquire(&next_channel);
+ if (unlikely(i >= migrate_multifd_channels())) {
+ qatomic_cmpxchg(&next_channel, i, 0);
+ }
Do we still need this? It seems not, because the mod down below would
already truncate to a value less than the number of channels. We don't
need it to start at 0 always, the channels are equivalent.
The "modulo" operation below only forces i_next into the proper range,
not i itself.
If the qatomic_cmpxchg() then ends up succeeding, we would use the (now
out-of-bounds) i value to index multifd_send_state->params[].
+
+ /* We wait here, until at least one channel is ready */
+ qemu_sem_wait(&multifd_send_state->channels_ready);
+
+ while (true) {
+ int i_next;
+
if (multifd_send_should_exit()) {
return false;
}
+
+ i = qatomic_load_acquire(&next_channel);
+ i_next = (i + 1) % migrate_multifd_channels();
+ if (qatomic_cmpxchg(&next_channel, i, i_next) != i) {
+ continue;
+ }
Say channel 'i' is the only one that's idle. What's stopping the other
thread(s) from racing at this point and looping around to the same index?
See the reply below.
+
p = &multifd_send_state->params[i];
/*
* Lockless read to p->pending_job is safe, because only multifd
* sender thread can clear it.
*/
if (qatomic_read(&p->pending_job) == false) {
With the cmpxchg your other patch adds here, the race I mentioned
above should be harmless. But we'd need to bring that code into this
patch.
You're right - the sender code with this patch alone isn't thread-safe
yet, but this commit is literally only about "converting
multifd_send()::next_channel to atomic".
At the time of this patch there aren't any multifd_send() calls from
multiple threads, and the commit that introduces such a possible call
site (multifd_queue_device_state()) also modifies multifd_send()
to be fully thread-safe by introducing p->pending_job_preparing.
Thanks,
Maciej