On 31.12.21 13:01, Pankaj Gupta wrote:
> From: Pankaj Gupta <pankaj.gupta.li...@gmail.com>
>
> Enable live migration support for the virtio-pmem device.
> Tested this with live migration on the same host.
>
> Need suggestions on the points below to support virtio-pmem live
> migration between two separate host systems:

I assume emulated NVDIMMs would have exactly the same issue, right?

There are two cases to consider, I think:

1) Backing storage is migrated manually to the destination (i.e., a file
   that is copied/moved/transmitted during migration)
2) Backing storage is located on shared network storage (i.e., a file
   that is not copied during migration)

IIRC you're concerned about 2).

> - There is still the possibility of a stale page cache page at the
>   destination host, which we currently cannot invalidate as done in [1]
>   for write-back mode, because the virtio-pmem memory backend file is
>   mmaped into the guest address space, and invalidating the
>   corresponding page cache pages would also fault all the other
>   userspace process mappings on the same file. Or do we make it strict
>   that no other process may mmap this backing file?

I'd have assumed that a simple fsync on the src once migration is about
to switch over (e.g., in a pre_save/post_save handler) should be enough
to trigger writeback to the backing storage, at which point the dst can
take over. So handling the src is easy; a rough sketch follows below.
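A minimal sketch of that src-side flush, assuming a hypothetical helper
(pmem_flush_src() and its parameters are illustrative, not existing QEMU
code; mmap_base/size describe the guest-visible mapping and memfd is the
backing file descriptor of the memory-backend-file):

  #include <errno.h>
  #include <stddef.h>
  #include <sys/mman.h>
  #include <unistd.h>

  /*
   * Called on the src when migration is about to switch over: write
   * back all dirty pagecache pages of the guest-visible mapping and
   * make sure they reach the backing storage, so the dst can take
   * over the file.
   */
  static int pmem_flush_src(void *mmap_base, size_t size, int memfd)
  {
      /* Flush dirty pages of the mapping back to the file... */
      if (msync(mmap_base, size, MS_SYNC) < 0) {
          return -errno;
      }
      /* ...and make the file data durable on the backing device. */
      if (fsync(memfd) < 0) {
          return -errno;
      }
      return 0;
  }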
So is the issue that the dst might still have stale pagecache
information, because it already accessed some of that file previously,
correct?

> -- In commit [1] we first fsync and then invalidate all the pages from
>    the destination page cache. fsync would sync the stale dirty page
>    cache pages. Is this the right thing to do, as we might end up with
>    a data discrepancy?

It would be weird if

a) The src used/modified the file and fsync'ed the modifications back
   to backing storage
b) The dst has stale dirty pagecache pages that would result in a
   modification of backing storage during fsync()

I mean, that would be fundamentally broken, because the fsync() would
corrupt the file. So I assume that in a sane environment, the dst could
only have stale *clean* pagecache pages, and we'd have to get rid of
these to re-read everything from the file.

IIRC, an existing mmap of the file on the dst should not really be
problematic *as long as* we didn't actually access file content that
way and thereby fault in the pages. So *maybe*, if we do the
POSIX_FADV_DONTNEED on the dst before accessing file content via the
mmap, there shouldn't be an issue. Unless the mmap itself is already
problematic.

I think we can assume that once QEMU starts on the dst and wants to
mmap the file, it's not mapped into any other process yet. vhost-user
will only mmap *after* being told by QEMU about the mmap region and its
location in GPA.

So if the existing QEMU mmap is not problematic, it should be easy:
just do the POSIX_FADV_DONTNEED on the destination when initializing
virtio-pmem. If we have to POSIX_FADV_DONTNEED *before* performing the
mmap, we might need a way to tell QEMU to do so before the mmap. This
could be a parameter for memory-backend-file like "flush=on", or it
could be done implicitly when we're told to expect an incoming
migration. See the sketch of the dst-side invalidation at the end of
this mail.

-- 
Thanks,

David / dhildenb
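A minimal sketch of the dst-side invalidation, assuming a hypothetical
helper (pmem_invalidate_dst() is illustrative, not existing QEMU code;
it would run when initializing virtio-pmem on the dst, before any file
content is faulted in via the mmap):

  #include <fcntl.h>

  /*
   * Drop the pagecache pages of the backing file on the dst, so stale
   * (clean) pages are re-read from the shared backing storage. This is
   * only advisory: the kernel may keep dirty or already-mapped pages,
   * which is why it has to happen before any file content is accessed
   * via the mmap.
   */
  static int pmem_invalidate_dst(int memfd)
  {
      /*
       * len == 0 means "until the end of the file". Note that
       * posix_fadvise() returns the error number directly instead of
       * setting errno.
       */
      int ret = posix_fadvise(memfd, 0, 0, POSIX_FADV_DONTNEED);

      return -ret;
  }

If we went with the explicit parameter, the command line could look
like this ("flush=on" being the proposed, not yet existing, property;
path and size are made up):

  -object memory-backend-file,id=pmem0,share=on,mem-path=/path/backing.img,size=4G,flush=on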