On Friday, April 11, 2025 14:24 CEST, Fabiano Rosas <faro...@suse.de> wrote:

> > If bitmap 0 implies zero page, we could call `ram_handle_zero`
> > in `read_ramblock_mapped_ram` for the clear bits.
> > Or do you fear this might be unnecessary expensive for migration?
> 
> Yes, unfortunately the performance difference is noticeable. But we could
> have a slightly different algorithm for savevm. At this point it might
> be easier to just duplicate read_ramblock_mapped_ram(), check for savevm
> in there and see what the resulting code looks like.
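
To make concrete what I meant by calling `ram_handle_zero` for the
clear bits, here is a rough, untested sketch of the loop. It uses the
bitmap helpers from qemu/bitops.h; how exactly it would slot into
read_ramblock_mapped_ram() (and be gated on savevm/loadvm) is an
assumption on my part:

    /* Sketch: zero every run of pages whose bit is clear in the
     * mapped-ram file bitmap. Assumes 'bitmap' and 'num_pages' as
     * already set up in read_ramblock_mapped_ram(). */
    unsigned long clear = find_next_zero_bit(bitmap, num_pages, 0);

    while (clear < num_pages) {
        unsigned long set = find_next_bit(bitmap, num_pages, clear);

        /* Zero the whole clear run [clear, set) with a single call. */
        ram_handle_zero(block->host + (clear << TARGET_PAGE_BITS),
                        (set - clear) << TARGET_PAGE_BITS);
        clear = find_next_zero_bit(bitmap, num_pages, set);
    }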

I tried to get some numbers in a "bad case" scenario: restoring a
clean, fully booted, idle Debian VM with 4 GB of RAM, where ~90% of the
pages are zero pages. The snapshot is stored on an NVMe SSD, and I
repeated the restore 10 times with and without zeroing
(`ram_handle_zero`). With zeroing, the restore takes on average 25%
longer. (This is neither a broad nor a deep investigation.)

So, I see your point on performance, but I'm not fully comfortable
with the difference in zero-page handling between mapped-ram on and
mapped-ram off: in the former case zero pages are skipped, while in
the latter they are explicitly zeroed. Enabling mapped-ram thus
implicitly also skips zero pages. To me this is an optimization not
really bound to mapped-ram, and it might be better to expose it as a
separate feature, enabled when the destination RAM is known to be
already zeroed (ideally also for mapped-ram off).
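
As a strawman, such a knob could look like the following on the load
side (the capability name and query helper are invented here, purely
for illustration):

    /* Hypothetical: a capability letting the user assert that
     * destination RAM starts out zeroed, so explicit zeroing can be
     * skipped regardless of mapped-ram. */
    if (!migrate_dest_ram_zeroed()) {
        ram_handle_zero(host, TARGET_PAGE_SIZE);
    }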

> By the way, what's your overall goal with enabling the feature? Do you
> intend to enable further capabilities for snapshot? Specifically
> multifd. I believe the zero page skip is responsible for most of the
> performance gains for mapped-ram without direct-io and multifd. The
> benefit of bounded stream size doesn't apply to snapshots because
> they're not live.

My overall goal is a hot-loadvm feature that currently lives in a
downstream fork and has a long way to go before reaching a mergeable
state :)
In a nutshell, I'm using dirty page tracking to load from the snapshot
only the pages that were dirtied between two loadvm operations;
mapped-ram is required to seek to and read only the dirtied pages
(see the sketch below).
As for the other capabilities, I still have to figure out whether they
might help in my use case.
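
The core of the idea is roughly this (a sketch with hypothetical
names, not the code from the fork): since mapped-ram gives each page a
fixed offset in the snapshot file, only the dirtied pages need to be
read back:

    /* Sketch: reload only the pages dirtied since the last loadvm.
     * 'dirty' is a hypothetical per-block bitmap collected via dirty
     * page tracking; 'file_offset' is the block's fixed offset in the
     * mapped-ram snapshot. Error handling of pread() omitted. */
    unsigned long page = find_next_bit(dirty, num_pages, 0);

    while (page < num_pages) {
        pread(fd, block->host + (page << TARGET_PAGE_BITS),
              TARGET_PAGE_SIZE,
              file_offset + (page << TARGET_PAGE_BITS));
        page = find_next_bit(dirty, num_pages, page + 1);
    }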

> It would be interesting to gather some numbers for the perf difference
> between mapped-ram=on vs off.

Repeating the same experiment as above without mapped-ram, the restore
takes 48% longer than with mapped-ram (without zeroing), and therefore
about 18% longer than mapped-ram with zeroing.
(It should be noted that mapped-ram without zeroing leaves the
restored VM in an inconsistent state.)
At the moment I don't have numbers for savevm.

Thanks!

Best,
Marco

