Manish,

On Thu, Nov 03, 2022 at 11:47:51PM +0530, manish.mishra wrote:
> Yes, but if we try to read early on main channel with tls enabled case it is
> an issue. Sorry, I may not have put the above comment clearly. I will try to
> put the scenario step-wise.
> 1. main channel is created and tls handshake is done for main channel.
> 2. Destination side tries to read magic early on main channel in
> migration_ioc_process_incoming but it is not yet sent by source.
> 3. Source has written magic to main channel file buffer but it is not yet
> flushed, it is flushed first time in ram_save_setup, I mean data is sent on
> channel only if qemu file buffer is full or explicitly flushed.
> 4. Source side blocks on multifd_send_sync_main in ram_save_setup before
> flushing qemu file. But multifd_send_sync_main is blocked for sem_sync until
> handshake is done for multiFD channels.
> 5. Destination side is still waiting for reading magic on main channel, so
> unless we return from migration_ioc_process_incoming we can not accept new
> channel, so handshake of multiFD channel is blocked.
> 6. So basically source is blocked on multiFD channels handshake before
> sending data on main channel, but destination is blocked waiting for data
> before it can acknowledge multiFD channels and do handshake, so it kind of
> creates a deadlock situation.
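
To make step 3 concrete, here is a toy model of a buffered writer whose data
only becomes readable by the peer after a flush. It is a sketch of my reading
of the scenario above, not how QEMUFile is actually implemented: ToyFile, the
toy_* helpers, TOY_MAGIC and the buffer sizes are all made-up names and values
for illustration.

```c
/* Toy model only: ToyFile and toy_* are hypothetical, not QEMU APIs. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_BUF_SIZE  16            /* arbitrary, for illustration */
#define TOY_WIRE_SIZE 64            /* arbitrary, for illustration */
#define TOY_MAGIC     0x5145564dU   /* placeholder magic, ASCII "QEVM" */

typedef struct {
    uint8_t buf[TOY_BUF_SIZE];      /* queued by the writer, not yet sent */
    size_t  buffered;
    uint8_t wire[TOY_WIRE_SIZE];    /* what the peer can actually read */
    size_t  on_wire;
} ToyFile;

/* Move everything queued so far onto the "wire". */
static void toy_flush(ToyFile *f)
{
    assert(f->on_wire + f->buffered <= sizeof(f->wire));
    memcpy(f->wire + f->on_wire, f->buf, f->buffered);
    f->on_wire += f->buffered;
    f->buffered = 0;
}

/* Queue a big-endian 32-bit value; flush implicitly only when full. */
static void toy_put_be32(ToyFile *f, uint32_t v)
{
    if (f->buffered + 4 > sizeof(f->buf)) {
        toy_flush(f);
    }
    f->buf[f->buffered++] = (uint8_t)(v >> 24);
    f->buf[f->buffered++] = (uint8_t)(v >> 16);
    f->buf[f->buffered++] = (uint8_t)(v >> 8);
    f->buf[f->buffered++] = (uint8_t)v;
}

/* Return 1 and store the value if 4 bytes are readable, else 0
 * (a real reader would block here instead of returning). */
static int toy_peek_be32(const ToyFile *f, uint32_t *out)
{
    if (f->on_wire < 4) {
        return 0;
    }
    *out = ((uint32_t)f->wire[0] << 24) | ((uint32_t)f->wire[1] << 16) |
           ((uint32_t)f->wire[2] << 8)  |  (uint32_t)f->wire[3];
    return 1;
}

#ifdef TOY_DEMO_MAIN
int main(void)
{
    ToyFile f = {0};
    uint32_t v = 0;

    toy_put_be32(&f, TOY_MAGIC);    /* source "writes" the magic */
    printf("before flush: readable=%d\n", toy_peek_be32(&f, &v));
    toy_flush(&f);                  /* explicit flush */
    printf("after flush:  readable=%d\n", toy_peek_be32(&f, &v));
    return 0;
}
#endif
```

Building with -DTOY_DEMO_MAIN runs the small demo: the first peek fails
because the magic is still sitting in the writer's buffer, and the second
succeeds once toy_flush() has run. In the scenario above the destination is
stuck at the equivalent of the first peek while the source never reaches the
flush.
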

Why is this issue only happening with TLS?  It sounds like it'll happen as
long as multifd is enabled.

I'm also wondering whether we should flush in qemu_savevm_state_header(), so
that at least an upgraded src qemu will always flush the headers, if it never
hurts.

-- 
Peter Xu