Peter Xu <pet...@redhat.com> writes:

> This is a follow up of below patch from Avihai as a replacement:
>
> https://lore.kernel.org/qemu-devel/20241020130108.27148-3-avih...@nvidia.com/
>
> This is v2 of the series, and it became a more generic rework on how we do
> migration object refcounts, so I skipped a changelog because most of this
> is new things.
>
> To put it simple, now I introduced another pointer to migration object, and
> here's a simple explanation for both after all change applied (copy-paste
> from one of the patch):
>
> /*
>  * We have two pointers for the global migration objects.  Both of them are
>  * initialized early during QEMU starts, but they have different lifecycles.

The very next person that needs to access one of those in migration code
will get this wrong.

>  *
>  * - current_migration
>  *
>  *   This variable reflects the whole lifecycle of the migration object
>  *   (which each QEMU can only have one).  It is valid until the migration
>  *   object is destroyed.
>  *
>  *   This is the object that internal migration so far use.  For example,
>  *   internal helper migrate_get_current() references it.
>  *
>  *   When all migration code can always pass over a MigrationState* around,
>  *   this variable can logically be dropped.  But we're not yet there.

Won't dropping it just bring us back to the situation before this patch?
I'd say we need the opposite, to stop using migrate_get_current()
everywhere in the migration code and instead expose the
current_migration via an internal header.

>  *
>  * - global_migration

Both are global, we can't prefix one of them with global_.

>  *
>  *   This is valid only until the migration object is still valid to the
>  *   outside-migration world (until migration_shutdown()).
>  *
>  *   This should normally be always set, cleared or accessed by the main
>  *   thread only, rather than the migration thread.
>  *
>  *   All the exported functions (in include/migration) should reference the
>  *   exported migration object only to avoid race conditions, as
>  *   current_migration can be freed concurrently by migration thread when
>  *   the migration thread holds the last refcount.
>  */

Have you thought of locking the migration object instead?

Having two global pointers to the same object with one having slightly
different lifecycle than the other will certainly lead to misuse.

I worry about mixing the usage of both globals due to some migration
code that needs to call these exported functions or external code
reaching into migration/ through some indirect path. Check global and
try to use current sort of scenarios (and vice-versa).

Similarly, what about when a lingering reference still exists, but
global_migration is already clear? Any migration code looking at
global_migration would do the wrong thing.

>
> It allows all misc.h exported helpers to be used for the whole VM
> lifecycle, so as to never crash QEMU with freed migration objects.

Isn't there a race with the last reference being dropped at
migration_shutdown() and global_migration still being set?

>
> Thanks,
>
> Peter Xu (4):
>   migration: Unexport dirty_bitmap_mig_init() in misc.h
>   migration: Reset current_migration properly
>   migration: Add global_migration
>   migration: Make all helpers in misc.h safe to use without migration
>
>  include/migration/misc.h | 29 ++++++++----
>  migration/migration.h    |  4 ++
>  migration/migration.c    | 99 +++++++++++++++++++++++++++++++++++-----
>  3 files changed, 113 insertions(+), 19 deletions(-)

Reply via email to