On Fri, Aug 8, 2025 at 3:02 PM Lukas Straub <lukasstra...@web.de> wrote:
> On Fri, 8 Aug 2025 10:36:24 +0800
> Yong Huang <yong.hu...@smartx.com> wrote:
>
> > On Thu, Aug 7, 2025 at 5:36 PM Lukas Straub <lukasstra...@web.de> wrote:
> >
> > > On Thu, 7 Aug 2025 10:41:17 +0800
> > > yong.hu...@smartx.com wrote:
> > >
> > > > From: Hyman Huang <yong.hu...@smartx.com>
> > > >
> > > > When there are network issues like missing TCP ACKs on the send
> > > > side during the multifd live migration. At the send side, the error
> > > > "Connection timed out" is thrown out and source QEMU process stop
> > > > sending data, at the receive side, The IO-channels may be blocked
> > > > at recvmsg() and thus the main loop gets stuck and fails to respond
> > > > to QMP commands consequently.
> > > > ...
> > >
> > > Hi Hyman Huang,
> > >
> > > Have you tried the 'yank' command to shutdown the sockets? It exactly
> > > meant to recover from hangs and should solve your issue.
> > >
> > > https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#yank-feature
> >
> > Thanks for the comment and advice.
> >
> > Let me give more details about the migration state when the issue happens:
> >
> > On the source side, libvirt has already aborted the migration job:
> >
> > $ virsh domjobinfo fdecd242-f278-4308-8c3b-46e144e55f63
> > Job type: Failed
> > Operation: Outgoing migration
> >
> > QMP query-yank shows that there is no migration yank instance:
> >
> > $ virsh qemu-monitor-command fdecd242-f278-4308-8c3b-46e144e55f63
> > '{"execute":"query-yank"}' --pretty
> > {
> >   "return": [
> >     {
> >       "type": "chardev",
> >       "id": "charmonitor"
> >     },
> >     {
> >       "type": "chardev",
> >       "id": "charchannel0"
> >     },
> >     {
> >       "type": "chardev",
> >       "id": "libvirt-2-virtio-format"
> >     }
> >   ],
> >   "id": "libvirt-5217"
> > }
>
> You are supposed to run it on the destination side, there the migration
> yank instance should be present if qemu hangs in the migration code.
>
> Also, you need to execute it as an out-of-band command to bypass the
> main loop.
> Like this:
>
> '{"exec-oob": "yank", "id": "yank0", "arguments": {"instances": [ {"type": "migration"} ] } }'

In our case, libvirt operations on the destination VM are blocked by the
migration job:

$ virsh qemu-monitor-command fdecd242-f278-4308-8c3b-46e144e55f63 '{"query-commands"}' --pretty
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePrepare3Params)

So issuing the yank command through libvirt is not an option here.

> I'm not sure if libvirt can do that, maybe you need to add an
> additional qmp socket and do it outside of libvirt. Note that you need
> to enable the oob feature during qmp negotiation, like this:
>
> '{ "execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }'

No, it cannot at the moment. I checked libvirt's source code and found that
libvirt disables OOB by default when it initializes the QEMU monitor.

So perhaps we could first enable OOB and add yank support to libvirt, and
then add the yank logic to the paths that need it. In our case that is the
migration code:

qemuMigrationDstFinish:
    if (retcode != 0) {
        /* Check for a possible error on the monitor in case Finish was called
         * earlier than monitor EOF handler got a chance to process the error
         */
        qemuDomainCheckMonitor(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN);
        goto endjob;
    }

> Regards,
> Lukas Straub
>
> > The libvirt migration job is stuck as the following backtrace shows; it
> > shows that migration is waiting for the "Finish" RPC on the destination
> > side to return.
> >
> > ...
> >
> > IMHO, the key reason for the issue is that QEMU fails to run the main loop
> > and fails to respond to QMP, which is not what we usually expected.
> >
> > Giving the Libvirt a window of time to issue a QMP and kill the VM is the
> > ideal solution for this issue; this provides an automatic method.
> >
> > I do not dig the yank feature, perhaps it is helpful, but only manually?
> >
> > After all, these two options are not exclusive of one another, I think.
> >
> > > Best regards,
> > > Lukas Straub
> >
> > Thanks,
> > Yong

--
Best regards
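
For reference, here is a rough sketch of the out-of-band sequence discussed
above (negotiate "oob", then send the yank via "exec-oob"), driven from a
separate QMP UNIX socket outside of libvirt. The two JSON messages are the
ones quoted in this thread. Everything else is an assumption made for
illustration: the socket path passed to yank_migration() is hypothetical, the
helper names are mine, and the code assumes the monitor emits one JSON object
per line (no pretty-printing). I have not run it against a hung destination.

```python
import json
import socket


def negotiation_message():
    """QMP capabilities negotiation that enables out-of-band execution."""
    return {"execute": "qmp_capabilities", "arguments": {"enable": ["oob"]}}


def oob_yank_message(command_id="yank0"):
    """Out-of-band 'yank' of the migration instance, as quoted in the thread."""
    return {
        "exec-oob": "yank",
        "id": command_id,
        "arguments": {"instances": [{"type": "migration"}]},
    }


def read_reply(reader, command_id=None):
    """Return the first command response with a matching id, skipping events."""
    for line in reader:
        msg = json.loads(line)
        if "event" in msg:
            continue  # asynchronous event, not a command response
        if ("return" in msg or "error" in msg) and msg.get("id") == command_id:
            return msg
    raise ConnectionError("QMP socket closed before a reply arrived")


def yank_migration(socket_path):
    """Connect to an extra QMP socket (hypothetical path) and yank migration."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("rw", encoding="utf-8") as chan:
            # The server speaks first: its greeting lists offered capabilities.
            greeting = json.loads(chan.readline())
            if "oob" not in greeting["QMP"].get("capabilities", []):
                raise RuntimeError("this QMP monitor does not offer 'oob'")
            reply = None
            for msg in (negotiation_message(), oob_yank_message()):
                chan.write(json.dumps(msg) + "\n")
                chan.flush()
                reply = read_reply(chan, msg.get("id"))
                if "error" in reply:
                    raise RuntimeError(reply["error"])
            return reply
```

Calling yank_migration() with the path of the additional monitor socket on the
destination host would return the reply to the yank command, or raise if the
monitor reports an error or does not offer "oob".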