On Tue, Sep 24, 2024 at 08:25:22AM +0000, Yuchen wrote:
> 
> > -----Original Message-----
> > From: Daniel P. Berrangé <berra...@redhat.com>
> > Sent: September 24, 2024 00:59
> > To: yuchen (CCSPL) <yu.c...@h3c.com>
> > Cc: Peter Xu <pet...@redhat.com>; faro...@suse.de; qemu-devel@nongnu.org
> > Subject: Re: Re: [PATCH] migration/multifd: receive channel socket needs to
> > be set to non-blocking
> > 
> > On Mon, Sep 23, 2024 at 01:33:13AM +0000, Yuchen wrote:
> > > 
> > > > -----Original Message-----
> > > > From: Peter Xu <pet...@redhat.com>
> > > > Sent: September 20, 2024 23:53
> > > > To: yuchen (CCSPL) <yu.c...@h3c.com>
> > > > Cc: faro...@suse.de; qemu-devel@nongnu.org
> > > > Subject: Re: [PATCH] migration/multifd: receive channel socket needs to
> > > > be set to non-blocking
> > > > 
> > > > On Fri, Sep 20, 2024 at 10:05:42AM +0000, Yuchen wrote:
> > > > > When the migration network is disconnected, the source QEMU can
> > > > > exit normally with an error, but the destination QEMU is always
> > > > > blocked in recvmsg(), which leaves the destination QEMU main
> > > > > thread blocked as well.
> > > > >
> > > > > The destination QEMU blocked stack:
> > > > > Thread 13 (Thread 0x7f0178bfa640 (LWP 1895906) "multifdrecv_6"):
> > > > > #0  0x00007f041b5af56f in recvmsg ()
> > > > > #1  0x000055573ebd0b42 in qio_channel_socket_readv
> > > > > #2  0x000055573ebce83f in qio_channel_readv
> > > > > #3  qio_channel_readv_all_eof
> > > > > #4  0x000055573ebce909 in qio_channel_readv_all
> > > > > #5  0x000055573eaa1b1f in multifd_recv_thread
> > > > > #6  0x000055573ec2f0b9 in qemu_thread_start
> > > > > #7  0x00007f041b52bf7a in start_thread
> > > > > #8  0x00007f041b5ae600 in clone3
> > > > >
> > > > > Thread 1 (Thread 0x7f0410c62240 (LWP 1895156) "kvm"):
> > > > > #0  0x00007f041b528ae2 in __futex_abstimed_wait_common ()
> > > > > #1  0x00007f041b5338b8 in __new_sem_wait_slow64.constprop.0
> > > > > #2  0x000055573ec2fd34 in qemu_sem_wait (sem=0x555742b5a4e0)
> > > > > #3  0x000055573eaa2f09 in multifd_recv_sync_main ()
> > > > > #4  0x000055573e7d590d in ram_load_precopy (f=f@entry=0x555742291c20)
> > > > > #5  0x000055573e7d5cbf in ram_load (opaque=<optimized out>,
> > > > >     version_id=<optimized out>, f=0x555742291c20)
> > > > > #6  ram_load_entry (f=0x555742291c20, opaque=<optimized out>,
> > > > >     version_id=<optimized out>)
> > > > > #7  0x000055573ea932e7 in qemu_loadvm_section_part_end
> > > > >     (mis=0x555741136c00, f=0x555742291c20)
> > > > > #8  qemu_loadvm_state_main (f=f@entry=0x555742291c20,
> > > > >     mis=mis@entry=0x555741136c00)
> > > > > #9  0x000055573ea94418 in qemu_loadvm_state (f=0x555742291c20,
> > > > >     mode=mode@entry=VMS_MIGRATE)
> > > > > #10 0x000055573ea88be1 in process_incoming_migration_co
> > > > >     (opaque=<optimized out>)
> > > > > #11 0x000055573ec43d13 in coroutine_trampoline (i0=<optimized out>,
> > > > >     i1=<optimized out>)
> > > > > #12 0x00007f041b4f5d90 in ?? () from target:/usr/lib64/libc.so.6
> > > > > #13 0x00007ffc11890270 in ?? ()
> > > > > #14 0x0000000000000000 in ?? ()
> > > > >
> > > > > Setting the receive channel to non-blocking can solve the problem.
> > > >
> > > > Multifd threads are real threads and there's no coroutine; I'm
> > > > slightly confused why it needs to use nonblock.
> > > >
> > > > Why didn't recvmsg() get kicked out on disconnect? Is it a generic
> > > > Linux kernel you are using?
> > >
> > > My steps to reproduce:
> > > ifdown the migration network, or disable the migration network using
> > > iptables. Both methods reproduce the problem with very high probability.
> > >
> > > My test environment uses linux-5.10.136.
> > >
> > > multifd thread blocked in the kernel:
> > > # cat /proc/3416190/stack
> > > [<0>] wait_woken+0x43/0x80
> > > [<0>] sk_wait_data+0x123/0x140
> > > [<0>] tcp_recvmsg+0x4f8/0xa50
> > > [<0>] inet6_recvmsg+0x5e/0x120
> > > [<0>] ____sys_recvmsg+0x87/0x180
> > > [<0>] ___sys_recvmsg+0x82/0x110
> > > [<0>] __sys_recvmsg+0x56/0xa0
> > > [<0>] do_syscall_64+0x3d/0x80
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > >
> > > > I wonder whether that's the expected behavior for sockets. E.g., we
> > > > do have a multifd/cancel test (test_multifd_tcp_cancel) and I think
> > > > that runs this path too, with it always in block mode as of now..
> > >
> > > My previous statement may not be accurate. The migration network socket
> > > is not disconnected. I use ifdown or iptables to simulate a network card
> > > failure. Because the TCP connection was not disconnected, recvmsg() was
> > > blocked.
> >
> > How long did you wait after doing ifdown? TCP is intended to wait if
> 
> I waited about 15 minutes; the source QEMU migration threads quit, but
> the destination QEMU migration threads are still there.
> 
> > there is an interruption.... only eventually, after the relevant TCP
> > timeouts are hit, will it terminate the connection.
> > QEMU shouldn't proactively give up if the TCP conn is still in an active
> > state as reported by the kernel, even if traffic isn't currently flowing.
> 
> Daniel, I agree with what you said, but in fact the destination migration
> connection is not disconnected and is in the CLOSE_WAIT state.
> 
> The source QEMU process lsof and top:
> # lsof -p 384509
> ...
> kvm     384509 root  112u  sock  0,8  0t0  157321811  protocol: TCP
> kvm     384509 root  113u  sock  0,8  0t0  157321813  protocol: TCP
> kvm     384509 root  114u  sock  0,8  0t0  157321815  protocol: TCP
> kvm     384509 root  115u  sock  0,8  0t0  157321817  protocol: TCP
> kvm     384509 root  116u  sock  0,8  0t0  157321819  protocol: TCP
> kvm     384509 root  117u  sock  0,8  0t0  157321821  protocol: TCP
> kvm     384509 root  118u  sock  0,8  0t0  157321823  protocol: TCP
> kvm     384509 root  119u  sock  0,8  0t0  157321825  protocol: TCP
> 
> # top -H -p 384509
> top - 15:10:22 up 5 days, 18:54,  3 users,  load average: 5.16, 4.61, 4.50
> Threads:   8 total,   3 running,   5 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  5.2 us,  5.2 sy,  0.0 ni, 89.3 id,  0.0 wa,  0.1 hi,  0.1 si,  0.0 st
> MiB Mem : 128298.7 total,  41490.2 free,  89470.2 used,   2168.0 buff/cache
> MiB Swap:  42922.0 total,  42910.4 free,     11.6 used.  38828.5 avail Mem
> 
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  384596 root      20   0   11.9g  93516  40112 R  98.7   0.1 261:13.24 CPU 1/KVM
>  384595 root      20   0   11.9g  93516  40112 R  98.0   0.1  56:44.31 CPU 0/KVM
>  384509 root      20   0   11.9g  93516  40112 R   1.3   0.1   7:38.73 kvm
>  384563 root      20   0   11.9g  93516  40112 S   0.0   0.1   0:00.05 kvm
>  384598 root      20   0   11.9g  93516  40112 S   0.0   0.1   0:01.00 vnc_worker
> 1544593 root      20   0   11.9g  93516  40112 S   0.0   0.1   0:00.00 worker
> 
> The destination QEMU process lsof and top:
> # lsof -p 3236693
> kvm 3236693 root 29u IPv6 159227758 0t0 TCP node18:49156->2.2.2.6:41880 (CLOSE_WAIT)
> kvm 3236693 root 30u IPv6 159227759 0t0 TCP node18:49156->2.2.2.6:41890 (ESTABLISHED)
> kvm 3236693 root 31u IPv6 159227760 0t0 TCP node18:49156->2.2.2.6:41902 (ESTABLISHED)
> kvm 3236693 root 32u IPv6 159227762 0t0 TCP node18:49156->2.2.2.6:41912 (ESTABLISHED)
> kvm 3236693 root 33u IPv6 159227761 0t0 TCP node18:49156->2.2.2.6:41904 (ESTABLISHED)
> kvm 3236693 root 34u IPv6 159227763 0t0 TCP node18:49156->2.2.2.6:41918 (ESTABLISHED)
> kvm 3236693 root 35u IPv6 159227764 0t0 TCP node18:49156->2.2.2.6:41924 (ESTABLISHED)
> kvm 3236693 root 36u IPv6 159227765 0t0 TCP node18:49156->2.2.2.6:41934 (ESTABLISHED)
> kvm 3236693 root 37u IPv6 159227766 0t0 TCP node18:49156->2.2.2.6:41942 (ESTABLISHED)
> 
> # top -H -p 3236693
> top - 15:09:25 up 5 days, 19:12,  2 users,  load average: 0.63, 0.68, 0.89
> Threads:  15 total,   0 running,  15 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  1.3 us,  0.5 sy,  0.0 ni, 98.1 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
> MiB Mem : 128452.1 total,  43515.4 free,  87291.7 used,   2527.4 buff/cache
> MiB Swap:  42973.0 total,  42968.4 free,      4.6 used.  41160.4 avail Mem
> 
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> 3236693 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:00.41 kvm
> 3236714 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:00.00 kvm
> 3236745 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:00.00 CPU 0/KVM
> 3236746 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:00.00 CPU 1/KVM
> 3236748 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:00.00 vnc_worker
> 3236750 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.45 multifdrecv_4
> 3236751 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.42 multifdrecv_5
> 3236752 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.35 multifdrecv_6
> 3236753 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.49 multifdrecv_7
> 3236754 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.45 multifdrecv_1
> 3236755 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.43 multifdrecv_2
> 3236756 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.45 multifdrecv_3
> 3236757 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.44 multifdrecv_0
> 
> So we should still set the multifd channel socket to non-blocking?
Have you looked at why the timeout didn't work? After all, QEMU is not the
only application that uses recvmsg() like this, so I wonder whether it's
intended behavior, or a kernel bug, that recvmsg() didn't get kicked out.

> > With regards,
> > Daniel
> > --
> > |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> > |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

-- 
Peter Xu