On Tue, May 13, 2025 at 03:48:06PM +0000, Chaney, Ben wrote:
> On 5/12/25, 2:50 PM, "Peter Xu" <pet...@redhat.com> wrote:
> >
> > What you said makes sense to me, but I'm neither pmem user nor
> > expert.  Let's wait to see whether others would like to chime in.
> >
> > What's the first bad commit of the regression?  Is it since v10.0 release?
>
> Hi Peter,
>      We are still on an old branch (7.2). The issue began when we enabled
> pmem, not as the result of a code change.
OK.  Then I think it's not strictly a regression, as it may have been like
that forever.

I do see that qemu_ram_msync() has this anyway:

#ifdef CONFIG_LIBPMEM
    /* The lack of support for pmem should not block the sync */
    if (ramblock_is_pmem(block)) {
        void *addr = ramblock_ptr(block, start);
        pmem_persist(addr, length);
        return;
    }
#endif

Does it mean that you're using pmem but without libpmem compiled in?  From
your stack dump, it looks like msync() is triggered, and I would expect
that won't happen if the ramblock in question is pmem.

Is your case using DRAM as the backing storage (in the form of DAX) for
the ext4 file, while exposing it as pmem to the guest?

I'd expect that, at least when the above check passes, pmem_persist()
would be faster, though I don't know by how much.

It still looks reasonable for QEMU to always sync here if it's pmem,
because QEMU still sees this ramblock as persistent storage, and after
migration QEMU wants to make sure everything is persisted.

That said, I wonder whether David was right in the other email that we
still have some regression, and that at least migration should skip the
sync for !pmem, that is:

diff --git a/migration/ram.c b/migration/ram.c
index d26dbd37c4..a93da18842 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3672,7 +3672,9 @@ static int ram_load_cleanup(void *opaque)
     RAMBlock *rb;
 
     RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
-        qemu_ram_block_writeback(rb);
+        if (ramblock_is_pmem(rb)) {
+            qemu_ram_block_writeback(rb);
+        }
     }
 
     xbzrle_load_cleanup();

But if you're using a real pmem ramblock, it shouldn't affect your use case.

Thanks,

-- 
Peter Xu
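
[For reference, a minimal sketch of how a DAX-backed host file is usually
exposed to the guest as pmem; the machine type, paths, sizes, and IDs here
are placeholders, not the reporter's actual setup:]

  qemu-system-x86_64 \
      -machine pc,nvdimm=on \
      -m 4G,slots=2,maxmem=8G \
      -object memory-backend-file,id=mem1,share=on,mem-path=/mnt/dax/guest.img,size=2G,pmem=on \
      -device nvdimm,memdev=mem1,id=nvdimm1

With pmem=on on the memory backend, the resulting ramblock is flagged as
pmem, so ramblock_is_pmem() returns true and, when libpmem is compiled in,
qemu_ram_msync() takes the pmem_persist() path quoted above instead of
falling back to msync().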