On 25/01/2024 22:57, Fabiano Rosas wrote:
External email: Use caution opening links or attachments
Avihai Horon <avih...@nvidia.com> writes:
The commit in the fixes line moved multifd thread creation to a
different location, but forgot to move the p->running = true assignment
as well. Thus, p->running is set to true before multifd thread is
actually created.
p->running is used in multifd_save_cleanup() to decide whether to join
the multifd thread or not.
With TLS, an error in multifd_tls_channel_connect() can lead to a
segmentation fault because p->running is true but p->thread is never
initialized, so multifd_save_cleanup() tries to join an uninitialized
thread.
Fix it by moving p->running = true assignment right after multifd thread
creation. Also move qio_channel_set_delay() to there, as this is where
it used to be originally.
Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
Signed-off-by: Avihai Horon <avih...@nvidia.com>
Just for context, I haven't looked at this patch yet, but we were
planning to remove p->running altogether:
https://lore.kernel.org/r/20231110200241.20679-1-faro...@suse.de
Thanks for putting me in the picture.
I see that there has been a discussion about the multifd
creation/treadown flow.
In light of this discussion, I can already see a few problems in my
series that I didn't notice before (such as the TLS handshake thread leak).
The thread you mentioned here and some of my patches point out some
problems in multifd creation/treardown. I guess we can discuss it and
see what's the best way to solve them.
Regarding this patch, your solution indeed solves the bug that this
patch addresses, so maybe this could be dropped (or only noted in your
patch).
Maybe I should also put you (and Peter) in context for this whole series
-- I am writing it as preparation for adding a separate migration
channel for VFIO device migration, so VFIO devices could be migrated in
parallel.
So this series tries to lay down some foundations to facilitate it.