On Wed, Oct 23, 2024 at 04:32:01PM -0300, Fabiano Rosas wrote: > Peter Xu <pet...@redhat.com> writes: > > > This is a follow up of below patch from Avihai as a replacement: > > > > https://lore.kernel.org/qemu-devel/20241020130108.27148-3-avih...@nvidia.com/ > > > > This is v2 of the series, and it became a more generic rework on how we do > > migration object refcounts, so I skipped a changelog because most of this > > is new things. > > > > To put it simple, now I introduced another pointer to migration object, and > > here's a simple explanation for both after all change applied (copy-paste > > from one of the patch): > > > > /* > > * We have two pointers for the global migration objects. Both of them are > > * initialized early during QEMU starts, but they have different lifecycles. > > The very next person that needs to access one of those in migration code > will get this wrong.
Migration code should never access global_migration except during init / shutdown (or a renamed version of it), that's the plan.. so no one should use it within migration/. > > > * > > * - current_migration > > * > > * This variable reflects the whole lifecycle of the migration object > > * (which each QEMU can only have one). It is valid until the migration > > * object is destroyed. > > * > > * This is the object that internal migration so far use. For example, > > * internal helper migrate_get_current() references it. > > * > > * When all migration code can always pass over a MigrationState* around, > > * this variable can logically be dropped. But we're not yet there. > > Won't dropping it just bring us back to the situation before this patch? > I'd say we need the opposite, to stop using migrate_get_current() > everywhere in the migration code and instead expose the > current_migration via an internal header. I meant dropping all the global access to current_migration within migration/ dir. Consider all functions has the 1st parameter as MigrationState*, for example. Then we don't need that for internal use, while global_migration is still needed for external use, but only for external use. > > > * > > * - global_migration > > Both are global, we can't prefix one of them with global_. I can rename it to migration_export, but the question is whether the name is the issue, or you'd think having two global variables pointing to migration object is the issue / concern. So I think fundaementally we indeed only need one global var there, if we can cleanup the migration/ everywhere to always take the pointer from the caller, then migration thread will take 1 refcount and that guarantees it's available for migration/. > > > * > > * This is valid only until the migration object is still valid to the > > * outside-migration world (until migration_shutdown()). > > * > > * This should normally be always set, cleared or accessed by the main > > * thread only, rather than the migration thread. > > * > > * All the exported functions (in include/migration) should reference the > > * exported migration object only to avoid race conditions, as > > * current_migration can be freed concurrently by migration thread when > > * the migration thread holds the last refcount. > > */ > > Have you thought of locking the migration object instead? Yes, logically we could use RCU if we want, then take BQL for example only if update. but I worry that's an overkill: we'll need too many rcu read lock all over the places.. > > Having two global pointers to the same object with one having slightly > different lifecycle than the other will certainly lead to misuse. That's indeed tricky, it's just that this is so far the easiest change I can think of. > > I worry about mixing the usage of both globals due to some migration > code that needs to call these exported functions or external code > reaching into migration/ through some indirect path. Check global and > try to use current sort of scenarios (and vice-versa). Yeh I get your concern. I actually tried to observe such usages (for now, especially when migration/ uses the misc.h exported functions) and they're all safe. I should have mentioned that. For the other way round, I don't yet expect that to happen: the plan is anything outside must call a function in include/migration/* and that must only use global_migration. > > Similarly, what about when a lingering reference still exists, but > global_migration is already clear? Any migration code looking at > global_migration would do the wrong thing. That's how it works: migration thread will take one refcount and that'll happen when migration is running but when VM shutdowns itself. I checked that all the migration code should be fine when using the exported functions even if they should reference current_migration. So logically if with such design, indeed internal migration/ code shouldn't reference global_migration at all just to be always safe. Any idea in your mind that can make this easier? I'm definitely open to suggestions. > > > > > It allows all misc.h exported helpers to be used for the whole VM > > lifecycle, so as to never crash QEMU with freed migration objects. > > Isn't there a race with the last reference being dropped at > migration_shutdown() and global_migration still being set? It needs to be protected by BQL in this case, so anyone using exported functions need to take BQL first. > > > > > Thanks, > > > > Peter Xu (4): > > migration: Unexport dirty_bitmap_mig_init() in misc.h > > migration: Reset current_migration properly > > migration: Add global_migration > > migration: Make all helpers in misc.h safe to use without migration > > > > include/migration/misc.h | 29 ++++++++---- > > migration/migration.h | 4 ++ > > migration/migration.c | 99 +++++++++++++++++++++++++++++++++++----- > > 3 files changed, 113 insertions(+), 19 deletions(-) > -- Peter Xu