On November 25, 2025 7:17 pm, Thomas Lamprecht wrote:
> On 25.11.25 at 18:21, Aaron Lauterer wrote:
>> On 2025-11-25 16:20, Thomas Lamprecht wrote:
>>> On 25.11.25 at 15:20, Fabian Grünbichler wrote:
>>>> On November 25, 2025 3:08 pm, Thomas Lamprecht wrote:
>>>>> Just to be sure: the stats from memory.current or memory.stat inside the
>>>>> /sys/fs/cgroup/qemu.slice/${vmid}.scope/ directory are definitely not
>>>>> enough for our use cases?
>>>>
>>>> Well, if we go for RSS they might be; for PSS they are not, since that
>>>> doesn't exist there?
>>>
>>> I would need to take a closer look to tell for sure, but from a quick
>>> check it indeed seems not to be there.
>>>
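
For reference, a minimal sketch (not PVE code; the VMID is a placeholder) of
what the cgroup v2 side exposes for a VM scope. memory.current and the
per-type counters in memory.stat are there, but nothing PSS-like:

    use std::fs;

    fn main() -> std::io::Result<()> {
        let vmid = 100; // hypothetical VMID
        let base = format!("/sys/fs/cgroup/qemu.slice/{}.scope", vmid);

        // memory.current: total memory charged to the cgroup, in bytes.
        let current = fs::read_to_string(format!("{}/memory.current", base))?;
        println!("memory.current: {} bytes", current.trim());

        // memory.stat: per-type breakdown (anon, file, kernel, ...),
        // none of which accounts for sharing the way PSS does.
        for line in fs::read_to_string(format!("{}/memory.stat", base))?.lines() {
            println!("{}", line);
        }
        Ok(())
    }
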
>>>> Having the live view and the metrics use different semantics seems kinda
>>>> confusing, tbh.
>>>
>>> More than silently jumping between metrics over time? ;-) The live view
>>> can easily be annotated with a different label or the like if the source
>>> is a different one; that is not so easy for metrics.
>>>
>>> The more I think about this, the more I'm in favor of just deprecating
>>> this again completely; this page table walking can even cause latency
>>> spikes in the target process, so IMO it's just not worth it. If the
>>> kernel can give us this for free, or at least much cheaper, in the
>>> future, then great, but until then it's not really an option. If we keep
>>> it, we can make it opt-in. The best granularity here would probably be
>>> through the guest config, but for starters a cluster-wide datacenter
>>> option could already be enough for the setups that are fine with this
>>> performance trade-off in general.
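
As far as I know, PSS is currently only available via
/proc/<pid>/smaps_rollup, and every read makes the kernel walk the page
tables of the target process, which is exactly the cost in question. A
minimal sketch (the PID is a placeholder for the VM's QEMU process):

    use std::fs;

    // Extract the "Pss:" line from /proc/<pid>/smaps_rollup; the value is
    // reported in kB. Reading this file triggers the page table walk.
    fn pss_kib(pid: u32) -> std::io::Result<Option<u64>> {
        let data = fs::read_to_string(format!("/proc/{}/smaps_rollup", pid))?;
        Ok(data
            .lines()
            .find(|l| l.starts_with("Pss:"))
            .and_then(|l| l.split_whitespace().nth(1))
            .and_then(|v| v.parse().ok()))
    }

    fn main() -> std::io::Result<()> {
        let pid = 1234; // hypothetical QEMU PID
        println!("PSS: {:?} kB", pss_kib(pid)?);
        Ok(())
    }
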
>> 
>> 
>> If I may add my 2 cents here: how much do we lose by switching completely
>> to fetching RSS (or the cgroup v2 equivalent) for both the metrics and the
>> live view? AFAIU the resulting memory accounting will be a bit higher, as
>> shared libraries will be fully accounted to each cgroup instead of
>> proportionally, as with PSS.
> 
> Shared memory then gets accounted more than once: if a user checks each VM
> and sums the values up, the total can even come out to more than the
> installed memory. IME this confuses people more than over-time effects do,
> but no hard feelings here, as long as it's clear that it now ignores
> whether any memory is shared between multiple processes/VMs.
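
To make the double counting concrete, a toy calculation with made-up
numbers: four VMs with 1024 MiB private memory each, plus one 100 MiB
region shared between all of them:

    // Toy numbers only, to show how the per-VM sums relate to real usage.
    fn main() {
        let vms = 4u64;
        let private = 1024u64; // MiB private per VM
        let shared = 100u64;   // MiB, one physical copy shared by all VMs

        // RSS charges the full shared region to every VM:
        let rss_sum = vms * (private + shared);       // 4496 MiB
        // PSS splits the shared region proportionally:
        let pss_sum = vms * (private + shared / vms); // 4196 MiB
        // Actual physical usage:
        let real = vms * private + shared;            // 4196 MiB

        println!("RSS sum: {} MiB (overcounts by {} MiB)", rss_sum, rss_sum - real);
        println!("PSS sum: {} MiB (matches real usage of {} MiB)", pss_sum, real);
    }
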
> 
>> I am not sure if we want to introduce additional config options (global
>> per DC, or per guest) to change the behavior, as that is probably even
>> more confusing for not that much gain.
> 
> I have no problem with ripping this out; that was just a proposal for a
> cheap way to keep this behavior available for those that really want it.
> 
>> And it's not like PSS doesn't come with its own set of weirdness. E.g.,
>> if we run 10 VMs and stop all but one, the last one will see an increase
>> in memory consumption, as it is now the sole user of the shared libraries.
> 
> Yes, that's the basic underlying principle and how accounting reality
> works. The PSS value is correct at any point in time, though, while the
> RSS value is wrong at any point in time.
> 
> With KSM you already have similar effects: from the POV of a VM the memory
> usage can stay the same, but due to changes in what is stored in that
> memory the KSM sharing rate goes down, and thus memory usage on the host
> goes up, even if all VMs kept exactly the same amount of memory in use.
> Once you start sharing things, the usage stats always stop being trivial.
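
For completeness, the host-wide KSM counters live under
/sys/kernel/mm/ksm/; a small sketch of reading them. None of this is
visible in a per-process RSS value:

    use std::fs;

    fn read_u64(name: &str) -> std::io::Result<u64> {
        let s = fs::read_to_string(format!("/sys/kernel/mm/ksm/{}", name))?;
        Ok(s.trim().parse().unwrap_or(0))
    }

    fn main() -> std::io::Result<()> {
        // pages_shared: deduplicated pages currently in use by KSM.
        // pages_sharing: how many additional sites share them, i.e. roughly
        // how many pages KSM is saving right now (per the kernel docs).
        let shared = read_u64("pages_shared")?;
        let sharing = read_u64("pages_sharing")?;
        println!("KSM: {} shared pages, ~{} pages saved", shared, sharing);
        Ok(())
    }
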

Yes. Without KSM, the difference between RSS and PSS for VMs shouldn't be
too big (after all, VMs usually have a lot more virtual RAM than loaded
executables/libraries ;)). With KSM the difference can become massive, but
since we cannot really query PSS anymore in that case, that point is moot.

So yeah, I will send a version that just goes back to RSS, and we should
document that the host memory usage doesn't take KSM into account, to
reduce the confusion.
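
Roughly what the RSS side looks like, assuming the QEMU PID is known (a
sketch, not the actual patch): VmRSS from /proc/<pid>/status is cheap to
read, no page table walk, but as discussed it ignores sharing entirely.

    use std::fs;

    // Extract the "VmRSS:" line from /proc/<pid>/status; the value is in kB.
    fn vm_rss_kib(pid: u32) -> std::io::Result<Option<u64>> {
        let status = fs::read_to_string(format!("/proc/{}/status", pid))?;
        Ok(status
            .lines()
            .find(|l| l.starts_with("VmRSS:"))
            .and_then(|l| l.split_whitespace().nth(1))
            .and_then(|v| v.parse().ok()))
    }

    fn main() -> std::io::Result<()> {
        let pid = 1234; // hypothetical QEMU PID
        println!("VmRSS: {:?} kB", vm_rss_kib(pid)?);
        Ok(())
    }
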

