On 25.11.25 at 18:21, Aaron Lauterer wrote:
> On  2025-11-25  16:20, Thomas Lamprecht wrote:
>> On 25.11.25 at 15:20, Fabian Grünbichler wrote:
>>> On November 25, 2025 3:08 pm, Thomas Lamprecht wrote:
>>>> Just to be sure: The stats from memory.current or memory.stat inside the
>>>> /sys/fs/cgroup/qemu.slice/${vmid}.scope/ directory are definitely not
>>>> enough for our use cases?
>>>
>>> well, if we go for RSS they might be, for PSS they are not, since that
>>> doesn't exist there?
>>
>> Would need to take a closer look to tell for sure, but from a quick check
>> it indeed seems to not be there.
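
FWIW, a rough sketch of what those two files expose, with a hypothetical
VMID 100 and cgroup v2 mounted at /sys/fs/cgroup (plain std Rust just for
illustration, not actual PVE code):

use std::fs;

fn main() -> std::io::Result<()> {
    // Hypothetical VMID 100; adjust the scope path for a real VM.
    let scope = "/sys/fs/cgroup/qemu.slice/100.scope";

    // memory.current: total memory charged to the cgroup, in bytes.
    let current = fs::read_to_string(format!("{scope}/memory.current"))?;
    println!("memory.current: {} bytes", current.trim());

    // memory.stat: per-type breakdown (anon, file, kernel, ...), but no
    // PSS-style proportional figure for pages shared with other cgroups.
    let stat = fs::read_to_string(format!("{scope}/memory.stat"))?;
    for line in stat.lines().take(5) {
        println!("memory.stat: {line}");
    }
    Ok(())
}
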
>>
>>> having the live view and the metrics use different semantics seems kinda
>>> confusing tbh..
>>
>> more than jumping between metrics over time silently? ;-) The live view can
>> easily be annotated with a different label or the like if the source is a
>> different one; not so easy for metrics.
>>
>> The more I think about this, the more I'm in favor of just deprecating this
>> again completely; this page table walking can even cause some latency spikes
>> in the target process, IMO just not worth it. If the kernel can give us this
>> for free, or at least much cheaper, in the future, then great, but until then
>> it's not really an option. If we keep it, we can make it opt-in. The best
>> granularity here probably would be through the guest config, but for starters
>> a cluster-wide datacenter option could already be enough for the setups that
>> are fine with this performance trade-off in general.
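
As a side note on what the expensive path looks like: a per-process PSS value
would presumably come from something like /proc/<pid>/smaps_rollup, and every
read of that file has the kernel walk the process's mappings. A rough
standalone sketch with a placeholder pid, not the actual implementation:

use std::fs;

// Returns the "Pss:" total from /proc/<pid>/smaps_rollup in kB, if present.
// Each read of that file makes the kernel walk the process's VMAs, which is
// the latency cost mentioned above.
fn pss_kb(pid: u32) -> std::io::Result<Option<u64>> {
    let rollup = fs::read_to_string(format!("/proc/{pid}/smaps_rollup"))?;
    Ok(rollup
        .lines()
        .find(|l| l.starts_with("Pss:")) // e.g. "Pss:   123456 kB"
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|v| v.parse().ok()))
}

fn main() {
    // 12345 is a placeholder, not a real QEMU pid.
    match pss_kb(12345) {
        Ok(Some(kb)) => println!("PSS: {kb} kB"),
        Ok(None) => eprintln!("no Pss: line found"),
        Err(err) => eprintln!("read failed: {err}"),
    }
}
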
> 
> 
> If I may add my 2 cents here: how much do we lose by switching completely to 
> fetching RSS (or the cgroupv2 equivalent) for the metrics and live view?
> AFAIU the resulting memory accounting will be a bit higher, as shared 
> libraries will be fully accounted to each cgroup and not proportionally 
> as with PSS.

You account shared memory more than once; if a user checks each VM and sums
them up, the total can even come out to more than the installed memory.
IME this confuses people more than effects that only show up over time, but no
hard feelings here as long as it's clear that the value now ignores whether any
memory is shared between multiple processes/VMs.
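
To make that concrete with made-up numbers: say 10 VM processes each map
4 GiB of private guest RAM plus the same 200 MiB of shared pages. RSS reports
4 GiB + 200 MiB for every single VM, so summing them over-counts the shared
part nine times; PSS would charge each VM 4 GiB + 20 MiB, and the sum matches
what the host actually uses.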

> I am not sure if we want to introduce additional config options (global per 
> DC, or per guest) to change the behavior, as that is probably even more 
> confusing for not that much gain.

I have no problem with ripping this out; that was just a proposal for a cheap
way to keep this behavior available for those that really want it.

> And it's not like PSS doesn't come with its own set of weirdness. E.g., if we 
> run 10 VMs, and stop all but one, the last will see an increase in memory 
> consumption as it is the sole user of shared libraries.

Yes, that's the basic underlying principle and simply how accounting for shared
memory reflects reality. At any point in time the PSS value is correct though,
while the RSS value is wrong at every point in time.
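
With the same made-up numbers as above: while all 10 VMs run, the 200 MiB of
shared pages show up as 20 MiB PSS in each of them; stop nine VMs and the
remaining one suddenly reports the full 200 MiB without having touched a single
additional page. The sum over all VMs stays correct the whole time, it just
gets redistributed.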

With KSM you already have similar effects: from the POV of a VM the memory
usage can stay the same, but due to changes in what that memory contains the
KSM sharing rate goes down, and thus memory usage on the host goes up even if
all VMs keep exactly the same amount of memory in use. Once you start to share
things, the usage stats stop being trivial.
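
Rough example: 10 VMs with 4 GiB each, where KSM merges 1 GiB of identical
content across all of them, so the host only needs to back ~31 GiB instead of
40 GiB. If the guests then overwrite that memory with unique data, every VM
still uses 4 GiB from its own POV, but the merged pages get broken out again
and host usage climbs back towards 40 GiB without any guest "using more".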

