> -----Original Message-----
> From: Peter Xu <pet...@redhat.com>
> Sent: September 20, 2024 23:53
> To: yuchen (CCSPL) <yu.c...@h3c.com>
> Cc: faro...@suse.de; qemu-devel@nongnu.org
> Subject: Re: [PATCH] migration/multifd: receive channel socket needs to be set to non-blocking
>
> On Fri, Sep 20, 2024 at 10:05:42AM +0000, Yuchen wrote:
> > When the migration network is disconnected, the source qemu can exit
> > normally with an error, but the destination qemu is always blocked in
> > recvmsg(), causes the destination qemu main thread to be blocked.
> >
> > The destination qemu block stack:
> > Thread 13 (Thread 0x7f0178bfa640 (LWP 1895906) "multifdrecv_6"):
> > #0 0x00007f041b5af56f in recvmsg ()
> > #1 0x000055573ebd0b42 in qio_channel_socket_readv
> > #2 0x000055573ebce83f in qio_channel_readv
> > #3 qio_channel_readv_all_eof
> > #4 0x000055573ebce909 in qio_channel_readv_all
> > #5 0x000055573eaa1b1f in multifd_recv_thread
> > #6 0x000055573ec2f0b9 in qemu_thread_start
> > #7 0x00007f041b52bf7a in start_thread
> > #8 0x00007f041b5ae600 in clone3
> >
> > Thread 1 (Thread 0x7f0410c62240 (LWP 1895156) "kvm"):
> > #0 0x00007f041b528ae2 in __futex_abstimed_wait_common ()
> > #1 0x00007f041b5338b8 in __new_sem_wait_slow64.constprop.0
> > #2 0x000055573ec2fd34 in qemu_sem_wait (sem=0x555742b5a4e0)
> > #3 0x000055573eaa2f09 in multifd_recv_sync_main ()
> > #4 0x000055573e7d590d in ram_load_precopy (f=f@entry=0x555742291c20)
> > #5 0x000055573e7d5cbf in ram_load (opaque=<optimized out>,
> > version_id=<optimized out>, f=0x555742291c20)
> > #6 ram_load_entry (f=0x555742291c20, opaque=<optimized out>,
> > version_id=<optimized out>)
> > #7 0x000055573ea932e7 in qemu_loadvm_section_part_end
> > (mis=0x555741136c00, f=0x555742291c20)
> > #8 qemu_loadvm_state_main (f=f@entry=0x555742291c20,
> > mis=mis@entry=0x555741136c00)
> > #9 0x000055573ea94418 in qemu_loadvm_state (f=0x555742291c20,
> > mode=mode@entry=VMS_MIGRATE)
> > #10 0x000055573ea88be1 in process_incoming_migration_co
> > (opaque=<optimized out>)
> > #11 0x000055573ec43d13 in coroutine_trampoline (i0=<optimized out>,
> > i1=<optimized out>)
> > #12 0x00007f041b4f5d90 in ?? () from target:/usr/lib64/libc.so.6
> > #13 0x00007ffc11890270 in ?? ()
> > #14 0x0000000000000000 in ?? ()
> >
> > Setting the receive channel to non-blocking can solve the problem.
>
> Multifd threads are real threads and there's no coroutine, I'm slightly confused
> why it needs to use nonblock.
>
> Why recvmsg() didn't get kicked out when disconnect? Is it a generic Linux
> kernel are you using?
>

My steps to reproduce: bring the migration network down with ifdown, or block
the migration network with iptables. Both methods reproduce the problem with
very high probability.
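To illustrate the difference the patch relies on: a blocking recv() on a stream socket whose peer stays connected but never sends will sleep indefinitely, much like the multifdrecv thread above, while the same call on a descriptor with O_NONBLOCK set returns -1 with errno EAGAIN or EWOULDBLOCK. Below is a minimal POSIX sketch, assuming a Unix-domain socketpair() as a stand-in for the migration TCP connection. set_nonblocking() and recv_reports_would_block() are hypothetical helpers, and treating them as the fd-level equivalent of qio_channel_set_blocking(ioc, false, NULL) is an assumption, not something stated in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* HYPOTHETICAL helper: put fd into non-blocking mode while keeping its
 * other file status flags. Returns 0 on success, -1 on error (errno set). */
static int set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* HYPOTHETICAL model of the hang: sv[1] stays open but never sends,
 * like a TCP connection whose network path silently disappeared.
 * Returns 1 if recv() on the non-blocking end reported "no data yet"
 * instead of sleeping, 0 otherwise, -1 if setup failed. */
static int recv_reports_would_block(void)
{
    int sv[2];
    char buf[64];
    int result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (set_nonblocking(sv[0]) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Without O_NONBLOCK this call would sleep until sv[1] writes or closes. */
    ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
    /* Evaluate errno before close() can overwrite it. */
    result = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) ? 1 : 0;

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

recv_reports_would_block() should return 1. If the set_nonblocking() call is removed, the recv() would instead sleep forever, because the peer end stays open and silent.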
My test environment uses Linux 5.10.136. The multifd thread is blocked in the kernel:

# cat /proc/3416190/stack
[<0>] wait_woken+0x43/0x80
[<0>] sk_wait_data+0x123/0x140
[<0>] tcp_recvmsg+0x4f8/0xa50
[<0>] inet6_recvmsg+0x5e/0x120
[<0>] ____sys_recvmsg+0x87/0x180
[<0>] ___sys_recvmsg+0x82/0x110
[<0>] __sys_recvmsg+0x56/0xa0
[<0>] do_syscall_64+0x3d/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

> I wonder whether that's the expected behavior for sockets. E.g., we do have
> multifd/cancel test (test_multifd_tcp_cancel) and I think that runs this path too
> with it always in block mode as of now..
>

My previous statement may not have been accurate: the migration socket is not
actually disconnected. I use ifdown or iptables to simulate a network card
failure. Because the TCP connection itself was never closed, recvmsg() stays
blocked.

In ordinary precopy migration, the destination also puts the main channel into
non-blocking mode; I think that is done to avoid this kind of blocking. From the
latest QEMU master code:

/**
 * migration_incoming_setup: Setup incoming migration
 * @f: file for main migration channel
 */
static void migration_incoming_setup(QEMUFile *f)
{
    MigrationIncomingState *mis = migration_incoming_get_current();

    if (!mis->from_src_file) {
        mis->from_src_file = f;
    }
    qemu_file_set_blocking(f, false);
}

> >
> > Signed-off-by: YuChen <yu.c...@h3c.com>
> > ---
> >  migration/multifd.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 9b200f4ad9..7b2a768f05 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -1318,6 +1318,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
> >          id = qatomic_read(&multifd_recv_state->count);
> >      }
> >
> > +    qio_channel_set_blocking(ioc, false, NULL);
> > +
> >      p = &multifd_recv_state->params[id];
> >      if (p->c != NULL) {
> >          error_setg(&local_err, "multifd: received id '%d' already setup'",
> > --
> > 2.30.2
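Whether non-blocking mode alone helps depends on how the receiving thread then waits. Once recv() reports EAGAIN, the thread still has to wait for the socket to become readable, and a wait with no deadline hangs just like a blocking recvmsg() when the peer goes silent. Below is a minimal POSIX sketch of a bounded wait, assuming poll() with a caller-chosen timeout. wait_readable(), recv_with_deadline() and demo_peer() are hypothetical illustrations, not QEMU code, and they do not model how QEMU's QIOChannel layer actually waits.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable (data, EOF, or error).
 * Returns 1 if poll() reported an event, 0 on timeout, -1 on failure.
 * On EINTR the wait is restarted with the full timeout. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) {
        return -1;
    }
    return rc > 0 ? 1 : 0;
}

/* HYPOTHETICAL helper: read from a non-blocking fd, giving up when no data
 * arrives within timeout_ms of any single wait (not a total deadline).
 * Returns bytes read, 0 on orderly EOF, -1 on error, -2 on timeout. */
static ssize_t recv_with_deadline(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(timeout_ms >= 0); /* a negative poll() timeout means wait forever */

    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0) {
            return n;
        }
        if (errno == EINTR) {
            continue;
        }
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            return -1;
        }
        int w = wait_readable(fd, timeout_ms);
        if (w == 0) {
            return -2; /* peer stayed silent for the whole wait */
        }
        if (w < 0) {
            return -1;
        }
    }
}

/* HYPOTHETICAL demo: the peer end stays open and sends one byte only if
 * send_first is non-zero. Returns recv_with_deadline()'s result, or -3
 * if the socketpair could not be set up. */
static ssize_t demo_peer(int send_first, int timeout_ms)
{
    int sv[2];
    char buf[16];
    ssize_t r = -3;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -3;
    }
    int flags = fcntl(sv[0], F_GETFL);
    if (flags >= 0 && fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == 0 &&
        (!send_first || write(sv[1], "x", 1) == 1)) {
        r = recv_with_deadline(sv[0], buf, sizeof(buf), timeout_ms);
    }
    close(sv[0]);
    close(sv[1]);
    return r;
}
```

demo_peer(1, 50) returns 1 (the single byte that was sent), while demo_peer(0, 50) returns -2 after roughly 50 ms because the silent peer never sends. The deadline, not the O_NONBLOCK flag by itself, is what keeps the reader from hanging.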
>
> --
> Peter Xu