Hello,

When live migrating to a destination host with pmem, there is a very long downtime during which the guest is paused. In some cases this can be as long as 5 minutes, compared to less than one second in the good case.
Profiling suggests very high activity in this code path:

    ffffffffa2956de6 clean_cache_range+0x26 ([kernel.kallsyms])
    ffffffffa2359b0f dax_writeback_mapping_range+0x1ef ([kernel.kallsyms])
    ffffffffc0c6336d ext4_dax_writepages+0x7d ([kernel.kallsyms])
    ffffffffa2242dac do_writepages+0xbc ([kernel.kallsyms])
    ffffffffa2235ea6 filemap_fdatawrite_wbc+0x66 ([kernel.kallsyms])
    ffffffffa223a896 __filemap_fdatawrite_range+0x46 ([kernel.kallsyms])
    ffffffffa223af73 file_write_and_wait_range+0x43 ([kernel.kallsyms])
    ffffffffc0c57ecb ext4_sync_file+0xfb ([kernel.kallsyms])
    ffffffffa228a331 __do_sys_msync+0x1c1 ([kernel.kallsyms])
    ffffffffa2997fe6 do_syscall_64+0x56 ([kernel.kallsyms])
    ffffffffa2a00126 entry_SYSCALL_64_after_hwframe+0x6e ([kernel.kallsyms])
    11ec5f msync+0x4f (/usr/lib/x86_64-linux-gnu/libc.so.6)
    675ada qemu_ram_msync+0x8a (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
    6873c7 xbzrle_load_cleanup+0x37 (inlined)
    6873c7 ram_load_cleanup+0x37 (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
    4ff375 qemu_loadvm_state_cleanup+0x55 (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
    500f0b qemu_loadvm_state+0x15b (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
    4ecf85 process_incoming_migration_co+0x95 (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
    8b6412 qemu_coroutine_self+0x2 (/usr/local/akamai/qemu/bin/qemu-system-x86_64)
    ffffffffffffffff [unknown] ([unknown])

I was able to resolve the performance issue by removing the call to qemu_ram_block_writeback in ram_load_cleanup; with that call removed, performance returns to normal. (A rough sketch of the change is included after my signature.)

It looks like this code path was originally added to ensure that guest memory is synchronized when the persistent memory region is backed by an NVDIMM device. Does it serve any purpose when pmem is instead backed by standard DRAM?

I'm also curious about the intended use of this code path in the NVDIMM case, since it seems to run into a few issues. On its own, it appears insufficient to restore the VM state if the host crashes after a live migration: the region being synced is only the guest memory, so the driver state on the host side is not saved. Furthermore, once the migration completes, the guest can re-dirty the pages; if the host crashes after that point, the guest memory will still be in an inconsistent state unless the crash is exceptionally well timed.

Does anyone have any insight into why this sync operation was introduced?

Thank you,
Ben Chaney
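
P.S. For reference, the change I tested is roughly the following, against ram_load_cleanup in migration/ram.c. The surrounding context is paraphrased from memory and may differ between QEMU versions; the removed loop is the writeback of every RAM block on the destination that produces the msync in the stack trace above:

    static int ram_load_cleanup(void *opaque)
    {
        RAMBlock *rb;

    -    RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
    -        qemu_ram_block_writeback(rb);
    -    }
    -
        xbzrle_load_cleanup();
        ...
    }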
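
If the writeback is only meant to cover blocks that are truly backed by persistent memory, a narrower alternative would be to make the flush conditional rather than removing it outright. This is just a sketch, and it assumes ramblock_is_pmem() returns true only for memory backends created with pmem=on:

        RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
            /* Only flush blocks flagged as persistent memory (pmem=on) */
            if (ramblock_is_pmem(rb)) {
                qemu_ram_block_writeback(rb);
            }
        }

Even with that, though, the crash-consistency questions above would still apply.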