On Friday, April 11, 2025 14:24 CEST, Fabiano Rosas <faro...@suse.de> wrote:
> > If bitmap 0 implies zero page, we could call `ram_handle_zero`
> > in `read_ramblock_mapped_ram` for the clear bits.
> > Or do you fear this might be unnecessarily expensive for migration?
>
> Yes, unfortunately the performance difference is noticeable. But we could
> have a slightly different algorithm for savevm. At this point it might
> be easier to just duplicate read_ramblock_mapped_ram(), check for savevm
> in there and see what the resulting code looks like.

I tried to get some numbers for a "bad case" scenario: restoring a clean,
fully booted, idle Debian VM with 4 GB of RAM, where zero pages are ~90%
of the total. I'm using an NVMe SSD to store the snapshot and I repeated
the restore 10 times with and without zeroing (`ram_handle_zero`). With
zeroing, the restore takes on average 25% longer. (This is neither a broad
nor a deep investigation.) A rough sketch of the clear-bit zeroing I have
in mind is at the end of this mail.

So I see your point on performance, but I'm not fully comfortable with the
difference in zero-page handling between mapped-ram on and off: in the
former case zero pages are skipped, while in the latter they are
explicitly zeroed. Enabling mapped-ram therefore has the implicit side
effect of also skipping zero pages. I think skipping zero pages is an
optimization that is not really bound to mapped-ram; it might be better to
keep it separate from mapped-ram and enable it when the destination RAM is
known to be already zeroed (ideally also with mapped-ram off).

> By the way, what's your overall goal with enabling the feature? Do you
> intend to enable further capabilities for snapshot? Specifically
> multifd. I believe the zero page skip is responsible for most of the
> performance gains for mapped-ram without direct-io and multifd. The
> benefit of bounded stream size doesn't apply to snapshots because
> they're not live.

My overall goal is a hot-loadvm feature that currently lives in a
downstream fork and has a long way to go before reaching a mergeable
state :) In a nutshell, I use dirty page tracking to load from the
snapshot only the pages that have been dirtied between two loadvm calls;
mapped-ram is required to seek to and read only those dirtied pages (a
simplified sketch of that loop is also at the end of this mail). As for
the other capabilities, I still have to understand whether they would help
in my use case.

> It would be interesting to gather some numbers for the perf difference
> between mapped-ram=on vs off.

Repeating the same experiment without mapped-ram, I get +48% in restore
time compared to mapped-ram without zeroing, and therefore roughly +18%
with respect to mapped-ram with zeroing. (It should be noted that
mapped-ram without zeroing leaves the restored VM in an inconsistent
state.) At the moment I don't have numbers for savevm.

Thanks!

Best,
Marco
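
For reference, the clear-bit zeroing I have in mind is along the lines of
the sketch below. It's untested, the name zero_ramblock_clear_bits is made
up, and it simply reuses the helpers read_ramblock_mapped_ram() already
relies on (find_next_zero_bit(), find_next_bit(),
host_from_ram_block_offset(), ram_handle_zero()), assuming they keep their
current signatures. It would only be called on the snapshot-restore path,
so plain migration keeps today's behaviour:

static void zero_ramblock_clear_bits(RAMBlock *block, long num_pages,
                                     unsigned long *bitmap)
{
    unsigned long clear_idx, set_idx;

    for (clear_idx = find_next_zero_bit(bitmap, num_pages, 0);
         clear_idx < num_pages;
         clear_idx = find_next_zero_bit(bitmap, num_pages, set_idx)) {
        ram_addr_t offset = clear_idx << TARGET_PAGE_BITS;
        void *host = host_from_ram_block_offset(block, offset);

        /* end of this run of clear (i.e. zero-page) bits */
        set_idx = find_next_bit(bitmap, num_pages, clear_idx + 1);

        /* ram_handle_zero() only memsets if the range isn't already zero */
        ram_handle_zero(host, (set_idx - clear_idx) * TARGET_PAGE_SIZE);
    }
}

It could sit behind the savevm check you mentioned, either inside
read_ramblock_mapped_ram() or in a duplicated copy of it.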
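
As for the hot-loadvm loop, it conceptually boils down to something like
the sketch below. It is heavily simplified: reload_dirty_pages() is a
made-up name, dirty_bitmap stands for whatever the dirty tracking
produces, error handling is omitted, and I'm assuming the same
qemu_get_buffer_at()/pages_offset machinery that read_ramblock_mapped_ram()
already uses to read pages at fixed file offsets:

static void reload_dirty_pages(QEMUFile *f, RAMBlock *block,
                               unsigned long *dirty_bitmap, long num_pages)
{
    unsigned long page;

    for (page = find_first_bit(dirty_bitmap, num_pages);
         page < num_pages;
         page = find_next_bit(dirty_bitmap, num_pages, page + 1)) {
        ram_addr_t offset = page << TARGET_PAGE_BITS;
        void *host = host_from_ram_block_offset(block, offset);

        /*
         * mapped-ram keeps every page at a fixed offset in the snapshot
         * file, so we can seek straight to the pages that were dirtied.
         */
        qemu_get_buffer_at(f, host, TARGET_PAGE_SIZE,
                           block->pages_offset + offset);
    }
}

This is why mapped-ram is a hard requirement for me: the pages need to be
addressable by offset in the snapshot file.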