Avihai Horon <avih...@nvidia.com> writes:

> On 01/02/2024 7:47, Peter Xu wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On Wed, Jan 31, 2024 at 07:49:51PM -0300, Fabiano Rosas wrote:
>>> pet...@redhat.com writes:
>>>
>>>> From: Peter Xu <pet...@redhat.com>
>>>>
>>>> This patchset contains quite a few refactorings to current multifd:
>>>>
>>>> - It picked up some patches from an old series of mine [0] (the last
>>>>   patches were dropped, though; I did the cleanup slightly differently):
>>>>
>>>>   I still managed to include one patch to split pending_job, but I
>>>>   rewrote the patch here.
>>>>
>>>> - It tries to cleanup multiple multifd paths here and there, the
>>>>   ultimate goal is to redefine send_prepare() to be something like:
>>>>
>>>>     p->pages -----------> send_prepare() -------------> IOVs
>>>>
>>>>   So that there's no obvious change yet on multifd_ops besides redefined
>>>>   interface for send_prepare(). We may want a separate OPs for file
>>>>   later.
>>>>
>>>> For 2), one benefit is already presented by Fabiano in his other series [1]
>>>> on cleaning up zero copy, but this patchset addressed it quite differently,
>>>> and hopefully also more gradually. The other benefit is for sure if we
>>>> have a more concrete API for send_prepare() and if we can reach an initial
>>>> consensus, then we can have the recent compression accelerators rebased on
>>>> top of this one.
>>>>
>>>> This also prepares for the case where the input can be extended to even not
>>>> any p->pages, but arbitrary data (like VFIO's potential use case in the
>>>> future?). But that will also for later even if reasonable.
>>>>
>>>> Please have a look. Thanks,
>>>>
>>>> [0] https://lore.kernel.org/r/20231022201211.452861-1-pet...@redhat.com
>>>> [1] https://lore.kernel.org/qemu-devel/20240126221943.26628-1-faro...@suse.de
>>>>
>>>> Peter Xu (14):
>>>>   migration/multifd: Drop stale comment for multifd zero copy
>>>>   migration/multifd: multifd_send_kick_main()
>>>>   migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
>>>>   migration/multifd: Postpone reset of MultiFDPages_t
>>>>   migration/multifd: Drop MultiFDSendParams.normal[] array
>>>>   migration/multifd: Separate SYNC request with normal jobs
>>>>   migration/multifd: Simplify locking in sender thread
>>>>   migration/multifd: Drop pages->num check in sender thread
>>>>   migration/multifd: Rename p->num_packets and clean it up
>>>>   migration/multifd: Move total_normal_pages accounting
>>>>   migration/multifd: Move trace_multifd_send|recv()
>>>>   migration/multifd: multifd_send_prepare_header()
>>>>   migration/multifd: Move header prepare/fill into send_prepare()
>>>>   migration/multifd: Forbid spurious wakeups
>>>>
>>>>  migration/multifd.h      |  34 +++--
>>>>  migration/multifd-zlib.c |  11 +-
>>>>  migration/multifd-zstd.c |  11 +-
>>>>  migration/multifd.c      | 291 +++++++++++++++++++--------------------
>>>>  4 files changed, 182 insertions(+), 165 deletions(-)
>>>
>>> This series didn't survive my 9999 iterations test on the opensuse
>>> machine.
>>>
>>> # Running /x86_64/migration/multifd/tcp/tls/x509/reject-anon-client
>>> ...
>>> kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
>>>
>>> #0  0x00005575dda06399 in qemu_mutex_lock_impl (mutex=0x18, file=0x5575ddce9cc3 "../util/qemu-thread-posix.c", line=275) at ../util/qemu-thread-posix.c:92
>>> #1  0x00005575dda06a94 in qemu_sem_post (sem=0x18) at ../util/qemu-thread-posix.c:275
>>> #2  0x00005575dd56a512 in multifd_send_thread (opaque=0x5575df054ef8) at ../migration/multifd.c:720
>>> #3  0x00005575dda0709b in qemu_thread_start (args=0x7fd404001d50) at ../util/qemu-thread-posix.c:541
>>> #4  0x00007fd45e8a26ea in start_thread (arg=0x7fd3faffd700) at pthread_create.c:477
>>> #5  0x00007fd45cd2150f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>>>
>>> The multifd thread is posting channels_ready with an already freed
>>> multifd_send_state.
>>>
>>> This is the bug Avihai has hit. We're going into multifd_save_cleanup()
>>> so early that multifd_new_send_channel_async() hasn't even had the
>>> chance to set p->running. So it misses the join and frees everything up
>>> while a second multifd thread is just starting.
>>
>> Thanks for doing that.
>>
>> Would this series makes that bug easier to happen?
>
> I think so.
> Patch #3 added an extra multifd_send_should_exit() check in
> multifd_send_sync_main(), so now it can exit early if the first channel
> fails.
> Plus, now migration state is set to FAILED early by:
> multifd_new_send_channel_async()->multifd_send_terminate_threads() and
> multifd_tls_outgoing_handshake()->multifd_send_terminate_threads()
> so migration_iteration_run() is completely skipped because
> migration_is_active() check before it will return false.
>
> I *think* this is what makes main migration thread finish earlier and
> call multifd_save_cleanup() earlier, at least for me.
>
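Just to illustrate the shape of that crash, here's a minimal standalone
sketch of the failure mode (made-up names, plain pthreads/POSIX
semaphores rather than the real QemuSemaphore/multifd code): the shared
state is torn down before the channel thread has been accounted for, so
the thread's post lands in memory that's already gone, much like the
qemu_sem_post(sem=0x18) frame in the backtrace above.

#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <unistd.h>

/* Stand-in for multifd_send_state; every name here is made up. */
struct send_state {
    sem_t channels_ready;
};

static struct send_state *state;

static void *sender_thread(void *opaque)
{
    (void)opaque;
    usleep(10 * 1000);                   /* lose the race on purpose */
    sem_post(&state->channels_ready);    /* state is already gone here */
    return NULL;
}

int main(void)
{
    pthread_t t;

    state = calloc(1, sizeof(*state));
    sem_init(&state->channels_ready, 0, 0);

    pthread_create(&t, NULL, sender_thread, NULL);

    /*
     * Cleanup runs before the thread was ever marked as running, so
     * nothing joins it: the state is destroyed while the thread is
     * still starting up.
     */
    sem_destroy(&state->channels_ready);
    free(state);
    state = NULL;

    pthread_join(t, NULL);   /* keep the process alive; expect SIGSEGV */
    return 0;
}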
I'm doing some experiments with a global semaphore (like channels_ready)
instead of the per-channel structure you suggested. I think we only need
a point past which we're assured no more channels will be created. With
that, we'd only need a single post in multifd_new_send_channel_async().
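Roughly the shape I'm experimenting with, as a sketch only (made-up
names, plain POSIX semaphores so it stands alone; the real thing would
use QemuSemaphore and the existing multifd entry points): every
channel-creation attempt posts one global counting semaphore exactly
once, whether it succeeds or fails, and cleanup waits for one post per
channel before joining anything or freeing the shared state.

#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_CHANNELS 4

/* One post per creation attempt; counts attempts, not successes. */
static sem_t channels_created;

/* Stand-in for multifd_send_state. */
static int *shared_state;

static void *channel_thread(void *opaque)
{
    int id = (int)(intptr_t)opaque;
    printf("channel %d sees state %d\n", id, *shared_state);
    return NULL;
}

/*
 * Models multifd_new_send_channel_async(): whether channel creation
 * succeeds or fails, it posts channels_created exactly once, so the
 * cleanup path can count how many attempts have finished.
 */
static void new_channel_async(int id, pthread_t *thread, bool *created)
{
    *created = (id != 2);   /* pretend channel 2 fails to come up */
    if (*created) {
        pthread_create(thread, NULL, channel_thread,
                       (void *)(intptr_t)id);
    }
    sem_post(&channels_created);
}

static void save_cleanup(pthread_t *threads, const bool *created)
{
    /*
     * Past this loop we know no further channels will be created, so
     * it's finally safe to join the ones that exist and free the
     * shared state.
     */
    for (int i = 0; i < NUM_CHANNELS; i++) {
        sem_wait(&channels_created);
    }
    for (int i = 0; i < NUM_CHANNELS; i++) {
        if (created[i]) {
            pthread_join(threads[i], NULL);
        }
    }
    free(shared_state);
    shared_state = NULL;
}

int main(void)
{
    pthread_t threads[NUM_CHANNELS];
    bool created[NUM_CHANNELS];

    sem_init(&channels_created, 0, 0);
    shared_state = malloc(sizeof(*shared_state));
    *shared_state = 42;

    for (int i = 0; i < NUM_CHANNELS; i++) {
        new_channel_async(i, &threads[i], &created[i]);
    }

    save_cleanup(threads, created);
    sem_destroy(&channels_created);
    return 0;
}

The "point past which no more channels will be created" then falls out
naturally: after NUM_CHANNELS waits on channels_created, no matter how
many channels actually came up, which is why a single global semaphore
looks sufficient.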