On Tue, Sep 24, 2024 at 08:25:22AM +0000, Yuchen wrote:
> 
> > -----Original Message-----
> > From: Daniel P. Berrangé <berra...@redhat.com>
> > Sent: September 24, 2024 00:59
> > To: yuchen (CCSPL) <yu.c...@h3c.com>
> > Cc: Peter Xu <pet...@redhat.com>; faro...@suse.de; qemu-devel@nongnu.org
> > Subject: Re: Re: [PATCH] migration/multifd: receive channel socket needs to
> > be set to non-blocking
> > 
> > On Mon, Sep 23, 2024 at 01:33:13AM +0000, Yuchen wrote:
> > > 
> > > > -----Original Message-----
> > > > From: Peter Xu <pet...@redhat.com>
> > > > Sent: September 20, 2024 23:53
> > > > To: yuchen (CCSPL) <yu.c...@h3c.com>
> > > > Cc: faro...@suse.de; qemu-devel@nongnu.org
> > > > Subject: Re: [PATCH] migration/multifd: receive channel socket needs to
> > > > be set to non-blocking
> > > > 
> > > > On Fri, Sep 20, 2024 at 10:05:42AM +0000, Yuchen wrote:
> > > > > When the migration network is disconnected, the source QEMU can
> > > > > exit normally with an error, but the destination QEMU is always
> > > > > blocked in recvmsg(), which leaves the destination QEMU main
> > > > > thread blocked as well.
> > > > >
> > > > > The destination QEMU blocked stack:
> > > > > Thread 13 (Thread 0x7f0178bfa640 (LWP 1895906) "multifdrecv_6"):
> > > > > #0  0x00007f041b5af56f in recvmsg ()
> > > > > #1  0x000055573ebd0b42 in qio_channel_socket_readv
> > > > > #2  0x000055573ebce83f in qio_channel_readv
> > > > > #3  qio_channel_readv_all_eof
> > > > > #4  0x000055573ebce909 in qio_channel_readv_all
> > > > > #5  0x000055573eaa1b1f in multifd_recv_thread
> > > > > #6  0x000055573ec2f0b9 in qemu_thread_start
> > > > > #7  0x00007f041b52bf7a in start_thread
> > > > > #8  0x00007f041b5ae600 in clone3
> > > > >
> > > > > Thread 1 (Thread 0x7f0410c62240 (LWP 1895156) "kvm"):
> > > > > #0  0x00007f041b528ae2 in __futex_abstimed_wait_common ()
> > > > > #1  0x00007f041b5338b8 in __new_sem_wait_slow64.constprop.0
> > > > > #2  0x000055573ec2fd34 in qemu_sem_wait (sem=0x555742b5a4e0)
> > > > > #3  0x000055573eaa2f09 in multifd_recv_sync_main ()
> > > > > #4  0x000055573e7d590d in ram_load_precopy (f=f@entry=0x555742291c20)
> > > > > #5  0x000055573e7d5cbf in ram_load (opaque=<optimized out>,
> > > > >     version_id=<optimized out>, f=0x555742291c20)
> > > > > #6  ram_load_entry (f=0x555742291c20, opaque=<optimized out>,
> > > > >     version_id=<optimized out>)
> > > > > #7  0x000055573ea932e7 in qemu_loadvm_section_part_end
> > > > >     (mis=0x555741136c00, f=0x555742291c20)
> > > > > #8  qemu_loadvm_state_main (f=f@entry=0x555742291c20,
> > > > >     mis=mis@entry=0x555741136c00)
> > > > > #9  0x000055573ea94418 in qemu_loadvm_state (f=0x555742291c20,
> > > > >     mode=mode@entry=VMS_MIGRATE)
> > > > > #10 0x000055573ea88be1 in process_incoming_migration_co
> > > > >     (opaque=<optimized out>)
> > > > > #11 0x000055573ec43d13 in coroutine_trampoline (i0=<optimized out>,
> > > > >     i1=<optimized out>)
> > > > > #12 0x00007f041b4f5d90 in ?? () from target:/usr/lib64/libc.so.6
> > > > > #13 0x00007ffc11890270 in ?? ()
> > > > > #14 0x0000000000000000 in ?? ()
> > > > >
> > > > > Setting the receive channel to non-blocking can solve the problem.
> > > >
> > > > Multifd threads are real threads and there's no coroutine; I'm
> > > > slightly confused why it needs to use nonblock.
> > > >
> > > > Why didn't recvmsg() get kicked out on disconnect? Is it a generic
> > > > Linux kernel you are using?
> > >
> > > My steps to reproduce:
> > > ifdown the migration network, or disable the migration network using
> > > iptables. Both methods reproduce the problem with very high probability.
> > >
> > > My test environment uses linux-5.10.136.
> > >
> > > multifd thread blocked in the kernel:
> > > # cat /proc/3416190/stack
> > > [<0>] wait_woken+0x43/0x80
> > > [<0>] sk_wait_data+0x123/0x140
> > > [<0>] tcp_recvmsg+0x4f8/0xa50
> > > [<0>] inet6_recvmsg+0x5e/0x120
> > > [<0>] ____sys_recvmsg+0x87/0x180
> > > [<0>] ___sys_recvmsg+0x82/0x110
> > > [<0>] __sys_recvmsg+0x56/0xa0
> > > [<0>] do_syscall_64+0x3d/0x80
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
> > >
> > > > I wonder whether that's the expected behavior for sockets. E.g., we
> > > > do have a multifd/cancel test (test_multifd_tcp_cancel) and I think
> > > > that runs this path too, with it always in block mode as of now..
> > >
> > > My previous statement may not be accurate. The migration network socket
> > > is not disconnected. I use ifdown or iptables to simulate a network card
> > > failure. Because the TCP connection was not disconnected, recvmsg() was
> > > blocked.
> >
> > How long did you wait after doing ifdown? TCP is intended to wait if
> 
> I waited about 15 minutes; the source QEMU migration threads quit, but
> the destination QEMU migration threads are still there.
> 
> > there is an interruption.... only eventually, after the relevant TCP
> > timeouts are hit, will it terminate the connection.
> > QEMU shouldn't proactively give up if the TCP conn is still in an active
> > state as reported by the kernel, even if traffic isn't currently flowing.
> 
> Daniel, I agree with what you said, but in fact the destination migration
> connection is not disconnected and is in the CLOSE_WAIT state.
> 
> The source QEMU process lsof and top:
> # lsof -p 384509
> ...
> kvm     384509 root  112u  sock  0,8  0t0  157321811  protocol: TCP
> kvm     384509 root  113u  sock  0,8  0t0  157321813  protocol: TCP
> kvm     384509 root  114u  sock  0,8  0t0  157321815  protocol: TCP
> kvm     384509 root  115u  sock  0,8  0t0  157321817  protocol: TCP
> kvm     384509 root  116u  sock  0,8  0t0  157321819  protocol: TCP
> kvm     384509 root  117u  sock  0,8  0t0  157321821  protocol: TCP
> kvm     384509 root  118u  sock  0,8  0t0  157321823  protocol: TCP
> kvm     384509 root  119u  sock  0,8  0t0  157321825  protocol: TCP
> 
> # top -H -p 384509
> top - 15:10:22 up 5 days, 18:54,  3 users,  load average: 5.16, 4.61, 4.50
> Threads:   8 total,   3 running,   5 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  5.2 us,  5.2 sy,  0.0 ni, 89.3 id,  0.0 wa,  0.1 hi,  0.1 si,  0.0 st
> MiB Mem : 128298.7 total,  41490.2 free,  89470.2 used,   2168.0 buff/cache
> MiB Swap:  42922.0 total,  42910.4 free,     11.6 used.  38828.5 avail Mem
> 
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  384596 root      20   0   11.9g  93516  40112 R  98.7   0.1 261:13.24 CPU 1/KVM
>  384595 root      20   0   11.9g  93516  40112 R  98.0   0.1  56:44.31 CPU 0/KVM
>  384509 root      20   0   11.9g  93516  40112 R   1.3   0.1   7:38.73 kvm
>  384563 root      20   0   11.9g  93516  40112 S   0.0   0.1   0:00.05 kvm
>  384598 root      20   0   11.9g  93516  40112 S   0.0   0.1   0:01.00 vnc_worker
> 1544593 root      20   0   11.9g  93516  40112 S   0.0   0.1   0:00.00 worker
> 
> The destination QEMU process lsof and top:
> # lsof -p 3236693
> kvm 3236693 root 29u IPv6 159227758 0t0 TCP node18:49156->2.2.2.6:41880 (CLOSE_WAIT)
> kvm 3236693 root 30u IPv6 159227759 0t0 TCP node18:49156->2.2.2.6:41890 (ESTABLISHED)
> kvm 3236693 root 31u IPv6 159227760 0t0 TCP node18:49156->2.2.2.6:41902 (ESTABLISHED)
> kvm 3236693 root 32u IPv6 159227762 0t0 TCP node18:49156->2.2.2.6:41912 (ESTABLISHED)
> kvm 3236693 root 33u IPv6 159227761 0t0 TCP node18:49156->2.2.2.6:41904 (ESTABLISHED)
> kvm 3236693 root 34u IPv6 159227763 0t0 TCP node18:49156->2.2.2.6:41918 (ESTABLISHED)
> kvm 3236693 root 35u IPv6 159227764 0t0 TCP node18:49156->2.2.2.6:41924 (ESTABLISHED)
> kvm 3236693 root 36u IPv6 159227765 0t0 TCP node18:49156->2.2.2.6:41934 (ESTABLISHED)
> kvm 3236693 root 37u IPv6 159227766 0t0 TCP node18:49156->2.2.2.6:41942 (ESTABLISHED)
> 
> # top -H -p 3236693
> top - 15:09:25 up 5 days, 19:12,  2 users,  load average: 0.63, 0.68, 0.89
> Threads:  15 total,   0 running,  15 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  1.3 us,  0.5 sy,  0.0 ni, 98.1 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
> MiB Mem : 128452.1 total,  43515.4 free,  87291.7 used,   2527.4 buff/cache
> MiB Swap:  42973.0 total,  42968.4 free,      4.6 used.  41160.4 avail Mem
> 
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> 3236693 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:00.41 kvm
> 3236714 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:00.00 kvm
> 3236745 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:00.00 CPU 0/KVM
> 3236746 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:00.00 CPU 1/KVM
> 3236748 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:00.00 vnc_worker
> 3236750 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.45 multifdrecv_4
> 3236751 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.42 multifdrecv_5
> 3236752 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.35 multifdrecv_6
> 3236753 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.49 multifdrecv_7
> 3236754 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.45 multifdrecv_1
> 3236755 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.43 multifdrecv_2
> 3236756 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.45 multifdrecv_3
> 3236757 root      20   0   11.3g 100192  38508 S   0.0   0.1   0:01.44 multifdrecv_0
> 
> So we should still set the multifd channel socket to non-blocking?
Have you looked at why the timeout didn't work? After all, QEMU is not the
only application that uses recvmsg() like this, so I wonder whether it's
intended behavior, or a kernel bug, that recvmsg() didn't get kicked out.

> > With regards,
> > Daniel
> > --
> > |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> > |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

-- 
Peter Xu