On 29.08.2024 02:51, Fabiano Rosas wrote:
"Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes:
On 28.08.2024 22:46, Fabiano Rosas wrote:
"Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes:
From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
This is an updated v2 patch series of the v1 series located here:
https://lore.kernel.org/qemu-devel/cover.1718717584.git.maciej.szmigi...@oracle.com/
Changes from v1:
* Extended the QEMU thread-pool with non-AIO (generic) pool support,
implemented automatic memory management support for its work element
function argument.
* Introduced a multifd device state save thread pool, ported the VFIO
multifd device state save implementation to use this thread pool instead
of VFIO internally managed individual threads.
* Re-implemented on top of Fabiano's v4 multifd sender refactor patch set from
https://lore.kernel.org/qemu-devel/20240823173911.6712-1-faro...@suse.de/
* Moved device state related multifd code to new multifd-device-state.c
file where it made sense.
* Implemented a max in-flight VFIO device state buffer count limit to
allow capping the maximum recipient memory usage.
* Removed unnecessary explicit memory barriers from multifd_send().
* A few small changes like updated comments, code formatting,
fixed zero-copy RAM multifd bytes transferred counter under-counting, etc.
For convenience, this patch set is also available as a git tree:
https://github.com/maciejsszmigiero/qemu/tree/multifd-device-state-transfer-vfio
With this branch I'm getting:
(..)
$ ./tests/qemu-iotests/check -p -qcow2 068
...
+qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl:
Assertion `mutex->initialized' failed.
I'm not sure how this can happen - it looks like qemu_loadvm_state() might be
called somehow after migration_incoming_state_destroy() already destroyed the
migration state?
Will investigate this in detail tomorrow.
Usually something breaks and then the cleanup code rushes in and frees
state while other parts are still using it.
We also had issues recently with code not incrementing the migration
state refcount properly:
27eb8499ed ("migration: Fix use-after-free of migration state object")
Looks like MigrationIncomingState is just for "true" incoming migration,
which can be started just once - so it is destroyed after the first
attempt and never reinitialized.
On the other hand, MigrationState is for both true incoming migration and
also for snapshot load - the latter of which seems able to be started
multiple times.
I have moved these variables to MigrationState, updated the GitHub tree,
and now this test passes.
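
To illustrate the lifetime problem described above, here is a minimal
sketch, not the actual QEMU code. All names in it (SketchMutex,
SketchState, sketch_load() and so on) are hypothetical stand-ins, and
the "mutex" is just a pair of flags. It assumes only what the assertion
message shows: the lock carries an "initialized" flag that is checked
on use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mutex that tracks whether it was initialized. */
typedef struct {
    bool initialized;
    bool locked;
} SketchMutex;

/* Hypothetical state object that owns the mutex. */
typedef struct {
    SketchMutex load_lock;
    int loads_done;
} SketchState;

static void sketch_state_init(SketchState *s)
{
    s->load_lock.initialized = true;
    s->load_lock.locked = false;
    s->loads_done = 0;
}

/* After this call the mutex must not be used again without a new init. */
static void sketch_state_destroy(SketchState *s)
{
    s->load_lock.initialized = false;
}

/*
 * One "load" attempt. Returns false at the point where real code would
 * trip an `initialized' assertion, so the failure can be observed.
 */
static bool sketch_load(SketchState *s)
{
    if (!s->load_lock.initialized) {
        return false;
    }
    s->load_lock.locked = true;
    s->loads_done++;
    s->load_lock.locked = false;
    return true;
}
```

If the owning object is destroyed after the first load and never
reinitialized, a second sketch_load() fails. If the variables live in
an object that stays initialized across loads, repeated calls succeed,
which is the effect of moving them to the longer-lived state.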
Thanks,
Maciej