On 31.12.21 13:01, Pankaj Gupta wrote:
> From: Pankaj Gupta <pankaj.gupta.li...@gmail.com>
> 
> Enable live migration support for the virtio-pmem device.
> Tested this with live migration on the same host.
> 
> Need suggestions on the below points to support virtio-pmem live
> migration between two separate host systems:

I assume emulated NVDIMMs would have the exact same issue, right?

There are two cases to consider, I think:

1) Backing storage is migrated manually to the destination (i.e., a file
that is copied/moved/transmitted during migration)

2) Backing storage is located on a shared network storage (i.e., a file
that is not copied during migration)

IIRC you're concerned about 2).

> 
> - There is still a possibility of stale page cache pages at the
>   destination host, which we currently cannot invalidate as done in 1]
>   for write-back mode, because the virtio-pmem memory backend file is
>   mmaped into the guest address space, and invalidating the
>   corresponding page cache pages would also fault all other userspace
>   process mappings of the same file. Or do we make it strict that no
>   other process may mmap this backing file?

I'd assume that a simple fsync on the src once migration is about to
switch over (e.g., in a pre_save/post_save handler) should be enough to
trigger writeback to the backing storage, at which point the dst can
take over. So handling the src is easy.
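
Roughly like this (just a sketch; the VirtIOPMEM type and memdev_fd
field are made up here, the real hook would live wherever virtio-pmem
can get at the backing file fd):

    #include <errno.h>
    #include <unistd.h>

    /* Made-up stand-in for the virtio-pmem device state; the real code
     * would obtain the fd from the memory-backend-file. */
    typedef struct {
        int memdev_fd;
    } VirtIOPMEM;

    /* pre_save: flush guest-visible modifications to the backing file,
     * so that the dst reads up-to-date content after switchover. */
    static int virtio_pmem_pre_save(void *opaque)
    {
        VirtIOPMEM *pmem = opaque;

        if (fsync(pmem->memdev_fd) < 0) {
            return -errno;
        }
        return 0;
    }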

So the issue is that the dst might still have stale pagecache content
because it already accessed parts of that file previously, correct?

> 
>   -- In commit 1] we first fsync and then invalidate all the pages from
>      the destination page cache. fsync would sync the stale dirty page
>      cache pages. Is this the right thing to do, as we might end up with
>      a data discrepancy?

It would be weird if

a) The src used/modified the file and fsync'ed the modifications back to
   backing storage
b) The dst has stale dirty pagecache pages that would result in a
   modification of backing storage during fsync()

I mean, that would be fundamentally broken, because the fsync() would
corrupt the file. So I assume that in a sane environment, the dst could
only have stale *clean* pagecache pages, and we'd have to get rid of
those to re-read everything from the file.

IIRC, an existing mmap of the file on the dst should not really be
problematic *as long as* we didn't actually access file content that way
and faulted in the pages. So *maybe*, if we do the POSIX_FADV_DONTNEED
on the dst before accessing file content via the mmap, there shouldn't
be an issue. Unless the mmap itself is already problematic.
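
As a sketch (note that on Linux, POSIX_FADV_DONTNEED only reliably
drops clean pages; it may kick off writeback for dirty ones but doesn't
wait for it, which is fine here if the dst only has stale clean pages):

    #include <fcntl.h>

    /* Drop cached pages for the whole file on the dst so that later
     * faults through the mmap re-read content from backing storage.
     * len == 0 means "to the end of the file". */
    static int drop_stale_pagecache(int fd)
    {
        return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    }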

I think we can assume that, once QEMU starts on the dst and wants to
mmap the file, it's not mapped into any other process yet. vhost-user
will only mmap *after* being told by QEMU about the mmap region and its
location in GPA space.

So if the existing QEMU mmap is not problematic, it should be easy:
just do the POSIX_FADV_DONTNEED on the destination when initializing
virtio-pmem. If we have to POSIX_FADV_DONTNEED *before* performing the
mmap, we might need a way to tell QEMU to do so before mapping the
file. That could be a parameter for memory-backend-file like
"flush=on", or we could do it implicitly when we're told to expect an
incoming migration.
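
For the record, usage could then look something like this ("flush=on"
is purely hypothetical, nothing like it exists today):

    # "flush=on" below is a hypothetical memory-backend-file option
    qemu-system-x86_64 ... \
        -object memory-backend-file,id=mem0,share=on,mem-path=/path/to/backing,size=4G,flush=on \
        -device virtio-pmem-pci,memdev=mem0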

-- 
Thanks,

David / dhildenb

