On Wed, 2024-01-17 at 15:58 +0800, pet...@redhat.com wrote:
> From: Peter Xu <pet...@redhat.com>
> 
> When the migration frameworks fetches the exact pending sizes, it means
> this check:
> 
>   remaining_size < s->threshold_size
> 
> Must have been done already, actually at migration_iteration_run():
> 
>     if (must_precopy <= s->threshold_size) {
>         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
> 
> That should be after one round of ram_state_pending_estimate().  It makes
> the 2nd check meaningless and can be dropped.
> 
> To say it in another way, when reaching ->state_pending_exact(), we
> unconditionally sync dirty bits for precopy.
> 
> Then we can drop migrate_get_current() there too.
> 
> Signed-off-by: Peter Xu <pet...@redhat.com>

Hi Peter,

could you have a look at this issue:
https://gitlab.com/qemu-project/qemu/-/issues/1565

which I reopened. Previous thread here:

https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-...@linux.ibm.com/

I'm seeing migration failures with s390x TCG again, which look the same to me
as those a while back.

> ---
>  migration/ram.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index c0cdcccb75..d5b7cd5ac2 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3213,21 +3213,20 @@ static void ram_state_pending_estimate(void *opaque, 
> uint64_t *must_precopy,
>  static void ram_state_pending_exact(void *opaque, uint64_t *must_precopy,
>                                      uint64_t *can_postcopy)
>  {
> -    MigrationState *s = migrate_get_current();
>      RAMState **temp = opaque;
>      RAMState *rs = *temp;
> +    uint64_t remaining_size;
>  
> -    uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> -
> -    if (!migration_in_postcopy() && remaining_size < s->threshold_size) {
> +    if (!migration_in_postcopy()) {
>          bql_lock();
>          WITH_RCU_READ_LOCK_GUARD() {
>              migration_bitmap_sync_precopy(rs, false);
>          }
>          bql_unlock();
> -        remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
>      }
>  
> +    remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> +
>      if (migrate_postcopy_ram()) {
>          /* We can do postcopy, and all the data is postcopiable */
>          *can_postcopy += remaining_size;

This basically reverts 28ef5339c3 ("migration: fix ram_state_pending_exact()"), 
which originally
made the issue disappear.

Any thoughts on the matter appreciated.

Thanks,
Nina

Reply via email to