On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote: > From: Peter Xu <pet...@redhat.com> > > When the migration frameworks fetches the exact pending sizes, it means > this check: > > remaining_size < s->threshold_size > > Must have been done already, actually at migration_iteration_run(): > > if (must_precopy <= s->threshold_size) { > qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy); > > That should be after one round of ram_state_pending_estimate(). It makes > the 2nd check meaningless and can be dropped. > > To say it in another way, when reaching ->state_pending_exact(), we > unconditionally sync dirty bits for precopy. > > Then we can drop migrate_get_current() there too. > > Signed-off-by: Peter Xu <pet...@redhat.com>
Hi Peter, could you have a look at this issue: https://gitlab.com/qemu-project/qemu/-/issues/1565 which I reopened. Previous thread here: https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/ I'm seeing migration failures with s390x TCG again, which look the same to me as those a while back. > --- > migration/ram.c | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/migration/ram.c b/migration/ram.c > index c0cdcccb75..d5b7cd5ac2 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -3213,21 +3213,20 @@ static void ram_state_pending_estimate(void *opaque, > uint64_t *must_precopy, > static void ram_state_pending_exact(void *opaque, uint64_t *must_precopy, > uint64_t *can_postcopy) > { > - MigrationState *s = migrate_get_current(); > RAMState **temp = opaque; > RAMState *rs = *temp; > + uint64_t remaining_size; > > - uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE; > - > - if (!migration_in_postcopy() && remaining_size < s->threshold_size) { > + if (!migration_in_postcopy()) { > bql_lock(); > WITH_RCU_READ_LOCK_GUARD() { > migration_bitmap_sync_precopy(rs, false); > } > bql_unlock(); > - remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE; > } > > + remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE; > + > if (migrate_postcopy_ram()) { > /* We can do postcopy, and all the data is postcopiable */ > *can_postcopy += remaining_size; This basically reverts 28ef5339c3 ("migration: fix ram_state_pending_exact()"), which originally made the issue disappear. Any thoughts on the matter appreciated. Thanks, Nina