Peter Xu <pet...@redhat.com> writes:

> On Thu, Jul 11, 2024 at 11:12:09AM -0300, Fabiano Rosas wrote:
>> What about the QEMUFile traffic? There's an iov in there. I have been
>> thinking of replacing some of qemu-file.c guts with calls to
>> multifd. Instead of several qemu_put_byte() we could construct an iov
>> and give it to multifd for transferring, call multifd_sync at the end and
>> get rid of the QEMUFile entirely. I don't have that completely laid out
>> at the moment, but I think it should be possible. I get concerned about
>> making assumptions about the types of data we're ever going to want to
>> transmit. I bet someone thought in the past that multifd would never be
>> used for anything other than ram.
>
> Hold on a bit.. there're two things I want to clarify with you.
>
> Firstly, qemu_put_byte() has buffering on f->buf[]. Directly changing them
> to iochannels may regress performance. I never checked, but I would assume
> some buffering will be needed for small chunks of data even with iochannels.
>
> Secondly, why does multifd have anything to do with this? What you're
> talking about is more like the rework of the qemufile->iochannel thing to
> me, and IIUC that doesn't yet involve multifd. For many such conversions,
> it'll still be operating on the main channel, which is not the multifd
> channels. What matters might be about what's in your mind to be put over
> multifd channels there.
>
>>
>> >
>> > I wonder why handshake needs to be done per-thread. I was naturally
>> > thinking the handshake should happen sequentially, talking over everything
>> > including multifd.
>>
>> Well, it would still be thread-based. Just that it would be 1 thread and
>> it would not be managed by multifd. I don't see the point. We could make
>> everything be multifd-based. Any piece of data that needs to reach the
>> other side of the migration could be sent through multifd, no?
>
> Hmm.... yes we can. But what do we gain from it, if we know it'll be a few
> MBs in total? There ain't a lot of huge stuff to move, it seems to me.
>
>>
>> Also, when you say "per-thread", that's the model we're trying to get
>> away from. There should be nothing "per-thread", the threads just
>> consume the data produced by the clients. Anything "per-thread" that is
>> not strictly related to the thread model should go away. For instance,
>> p->page_size, p->page_count, p->write_flags, p->flags, etc. None of
>> these should be in MultiFDSendParams. That thing should be (say)
>> MultifdChannelState and contain only the semaphores and control flags
>> for the threads.
>>
>> It would be nice if we could once and for all have a model that can
>> dispatch data transfers without having to fiddle with threading all the
>> time. Any time someone wants to do something different in the migration
>> code, there goes a random qemu_create_thread() flying around.
>
> That's exactly what I want to avoid. Not all things will need a thread,
> only performance-relevant ones.
>
> So now we have multifd threads; they're for IO throughput: if we want to
> push a fast NIC, that's the only way to go. Anything that wants to push
> that NIC should use multifd.
>
> Then it turns out we want more concurrency; it's about VFIO save()/load()
> of the kernel drivers and it can block. Same for other devices that can
> take time to save()/load() if it can happen concurrently in the future. I
> think that's the reason why I suggested the VFIO solution to provide a
> generic concept of thread pool so it serves a generic purpose and can be
> reused in the future.
>
> I hope that'll stop anyone else on migration from creating yet another
> thread randomly, and I definitely don't like that either. I would _suspect_
> the next one to come as such is TDX.. I remember at least in the very
> initial proposal years ago, TDX migration involves its own "channel" to
> migrate; migration.c may not even know where that channel is. We'll see.
>
> [...]
>
>> > One thing to mention is that with a union we will probably need to get
>> > rid of multifd_send_state->pages already.
>>
>> Hehe, please don't do this like "oh, by the way...". This is a major
>> pain point. I've been complaining about that "holding of client data"
>> since the first time I read that code. So if you're going to propose
>> something, it needs to account for that.
>
> The client puts something into a buffer (SendData), then it delivers it to
> multifd (which silently switches the buffer). Once enqueued, the client
> assumes the buffer is sent and reusable again.
>
> It looks pretty common to me, so what is the concern with this procedure?
> What's the "holding of client data" issue?
>
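Just to make sure we're talking about the same thing, the enqueue-and-swap
you describe would look roughly like the standalone sketch below
(multifd_enqueue, multifd_slot and the fixed-size payload are made up for
illustration, not the actual QEMU code):

/* Minimal sketch of the enqueue-and-swap model; illustrative names only. */
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint32_t num;
    uint64_t offset[128];      /* fixed-size payload, nothing to allocate */
} SendData;

static SendData spare;                       /* owned by the multifd core */
static SendData *multifd_slot = &spare;

/* The client hands in a filled payload and immediately gets an empty one
 * back; a channel thread would consume whatever sits in multifd_slot. */
static void multifd_enqueue(SendData **client_data)
{
    SendData *filled = *client_data;

    *client_data = multifd_slot;   /* client can reuse this one right away */
    multifd_slot = filled;         /* pretend a channel thread sends this */
}

int main(void)
{
    SendData initial = {0};
    SendData *p = &initial;

    p->num = 1;
    p->offset[0] = 0x1000;
    multifd_enqueue(&p);           /* p now points at a reusable payload */
    printf("enqueued payload, num=%u\n", multifd_slot->num);
    return 0;
}
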
v2 is ready, but unfortunately this approach doesn't work. When client A
takes the payload, it fills it with its data, which may include allocating
memory; MultiFDPages_t does that for the offset array. This means we need a
round of free/malloc at every packet sent, for every client and for every
allocation they decide to do.
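A rough standalone illustration of the problem (again made-up types, not the
actual code): because the payload slot is shared by all clients, whatever a
client gets back may have last carried a different union member, so anything
it needs allocated, like the offset array, has to be torn down and set up
again for every single packet:

#include <stdlib.h>
#include <stdint.h>

typedef enum { PAYLOAD_NONE, PAYLOAD_RAM } PayloadType;

typedef struct {
    uint32_t num;
    uint64_t *offset;              /* allocated memory owned by the payload */
} PagesPayload;

typedef struct {
    PayloadType type;
    union {
        PagesPayload ram;
        /* other clients' payload types would live here too */
    } u;
} SendData;

/* Whatever the slot held before (possibly another client's data) has to be
 * torn down before this client can use it... */
static void send_data_clear(SendData *d)
{
    if (d->type == PAYLOAD_RAM) {
        free(d->u.ram.offset);                      /* one free per packet... */
    }
    d->type = PAYLOAD_NONE;
}

/* ...and the RAM client then has to allocate its offset array again. */
static void ram_client_prepare(SendData *d, uint32_t page_count)
{
    send_data_clear(d);
    d->type = PAYLOAD_RAM;
    d->u.ram.num = 0;
    d->u.ram.offset = malloc(page_count * sizeof(uint64_t)); /* ...one malloc */
}

int main(void)
{
    SendData slot = { .type = PAYLOAD_NONE };

    /* simulate a few packets: each one sets the offset array up again */
    for (int i = 0; i < 3; i++) {
        ram_client_prepare(&slot, 128);
    }
    send_data_clear(&slot);
    return 0;
}

As far as I can tell, today the MultiFDPages_t is allocated once during
multifd setup and reused for the whole migration, and that is exactly what
this swap scheme loses.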