Peter Xu <pet...@redhat.com> wrote:

> On Tue, Sep 26, 2023 at 06:01:02PM +0800, Li Zhijian wrote:
>> Migration over RDMA has failed since
>> commit 294e5a4034 ("multifd: Only flush once each full round of memory")
>> with errors:
>> qemu-system-x86_64: rdma: Too many requests in this message
>> (3638950032). Bailing.
>>
>> Migration over RDMA is different from TCP. RDMA has its own control
>> messages, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
>> RDMA_CONTROL_REGISTER_FINISHED must not be disturbed.
>>
>> find_dirty_block() can be called between RDMA_CONTROL_REGISTER_REQUEST
>> and RDMA_CONTROL_REGISTER_FINISHED; it sends extra traffic
>> (RAM_SAVE_FLAG_MULTIFD_FLUSH) to the destination and causes the
>> migration to fail, even though multifd is disabled.
>>
>> This change makes migrate_multifd_flush_after_each_section() return
>> true when multifd is disabled, which also means that
>> RAM_SAVE_FLAG_MULTIFD_FLUSH is no longer sent to the destination
>> when multifd is disabled.
>>
>> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
>> CC: Fabiano Rosas <faro...@suse.de>
>> Signed-off-by: Li Zhijian <lizhij...@fujitsu.com>
>> ---
>>
>> V2: put that check at the entry of
>> migrate_multifd_flush_after_each_section() # Peter
>
> Seeing this, I notice my suggestion wasn't ideal either, as we rely on
> both multifd_send_sync_main() and multifd_recv_sync_main() being no-ops
> when !multifd.
>
> For the long term, we should not call multifd functions at all if
> multifd is not enabled.
Agreed. I will send a different patch that makes this clear.

> Reviewed-by: Peter Xu <pet...@redhat.com>