On Thu, Apr 18, 2024 at 04:02:49PM -0400, Peter Xu wrote:
> On Thu, Apr 18, 2024 at 08:14:15PM +0200, Maciej S. Szmigiero wrote:
> > I think one of the reasons for these results is that mixed (RAM + device
> > state) multifd channels participate in the RAM sync process
> > (MULTIFD_FLAG_SYNC) whereas device state dedicated channels don't.
>
> Firstly, I'm wondering whether we can have better names for these new
> hooks. Currently (only commenting on the async* stuff):
>
>   - complete_precopy_async
>   - complete_precopy
>   - complete_precopy_async_wait
>
> But perhaps better:
>
>   - complete_precopy_begin
>   - complete_precopy
>   - complete_precopy_end
>
> ?
>
> As I don't see why the device must do something async in such a hook.
> To me it's more like you're splitting one process into multiple, so
> begin/end sounds more generic.
>
> Then, with that in mind, IIUC we can already split ram_save_complete()
> into >1 phases too. For example, I would be curious whether the
> performance will go back to normal if we offload multifd_send_sync_main()
> into complete_precopy_end(), because we really only need one shot of
> that, and I am quite surprised it already greatly affects VFIO dumping
> its own things.
>
> I would even go one step further, as Dan was asking: have you thought
> about dumping VFIO state via multifd even during iterations? Would that
> help even more than this series (which IIUC only helps during the
> blackout phase)?
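(As an illustration of the begin/end split suggested above, here is a
minimal sketch of how the three hooks could sit in a SaveVMHandlers-style
vtable. The struct name and comments are hypothetical, not taken from the
series; only the (QEMUFile *, void *) shape mirrors the existing
save_live_complete_precopy callback.)

  /* Sketch only -- not the actual SaveVMHandlers from migration/register.h. */
  typedef struct QEMUFile QEMUFile;   /* opaque here, as in QEMU proper */

  typedef struct SaveVMHandlersSketch {
      /*
       * Begin: kick off work that can proceed in the background while
       * the rest of completion runs, e.g. queuing VFIO device state
       * onto dedicated multifd channels.
       */
      int (*save_live_complete_precopy_begin)(QEMUFile *f, void *opaque);

      /* The existing synchronous completion hook, unchanged. */
      int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);

      /*
       * End: wait for the background work started in _begin, and do
       * any one-shot finalization (a natural home for the multifd sync).
       */
      int (*save_live_complete_precopy_end)(QEMUFile *f, void *opaque);
  } SaveVMHandlersSketch;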
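(And for the suggested experiment of offloading the sync: roughly what the
split could look like if the one-shot MULTIFD_FLAG_SYNC round moved out of
ram_save_complete() into the end hook. Function names and bodies are
placeholders, and the multifd_send_sync_main() signature is simplified
from QEMU's.)

  /* Sketch only -- error handling and the real page-flush loop omitted. */
  int multifd_send_sync_main(void);   /* assumed: the one-shot multifd sync */

  static int ram_save_complete_sketch(QEMUFile *f, void *opaque)
  {
      /* ... flush the remaining dirty RAM pages over multifd ... */
      /* Note: no multifd_send_sync_main() here any more. */
      return 0;
  }

  static int ram_save_complete_end_sketch(QEMUFile *f, void *opaque)
  {
      /*
       * One shot of the multifd sync, issued after VFIO has finished
       * dumping its state on the dedicated channels, so those channels
       * never stall behind the RAM sync during completion.
       */
      return multifd_send_sync_main();
  }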
To dump during RAM iteration, the VFIO device will need to have dirty
tracking and iterate on its state, because the guest CPUs will still be
running, potentially changing VFIO state. That seems impractical in the
general case.

With regards,
Daniel

-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|