On Tue, Feb 04, 2025 at 11:23:41AM +0300, Daniil Tatianin wrote:
>
> On 1/23/25 7:31 PM, Peter Xu wrote:
> > On Thu, Jan 23, 2025 at 04:19:40PM +0300, Daniil Tatianin wrote:
> > > Currently, passing mem-lock=on to QEMU causes memory usage to grow by
> > > huge amounts:
> > >
> > > no memlock:
> > >     $ ./qemu-system-x86_64 -overcommit mem-lock=off
> > >     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >     45652
> > >
> > >     $ ./qemu-system-x86_64 -overcommit mem-lock=off -enable-kvm
> > >     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >     39756
> > >
> > > memlock:
> > >     $ ./qemu-system-x86_64 -overcommit mem-lock=on
> > >     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >     1309876
> > >
> > >     $ ./qemu-system-x86_64 -overcommit mem-lock=on -enable-kvm
> > >     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >     259956
> > >
> > > This is caused by the fact that mlockall(2) automatically
> > > write-faults every existing and future anonymous mappings in the
> > > process right away.
> > >
> > > One of the reasons to enable mem-lock is to protect a QEMU process'
> > > pages from being compacted and migrated by kcompactd (which does so
> > > by messing with a live process page tables causing thousands of TLB
> > > flush IPIs per second) basically stealing all guest time while it's
> > > active.
> > >
> > > mem-lock=on helps against this (given compact_unevictable_allowed is 0),
> > > but the memory overhead it introduces is an undesirable side effect,
> > > which we can completely avoid by passing MCL_ONFAULT to mlockall, which
> > > is what this series allows to do with a new option for mem-lock called
> > > on-fault.
> > >
> > > memlock-onfault:
> > >     $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault
> > >     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >     54004
> > >
> > >     $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault -enable-kvm
> > >     $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >     47772
> > >
> > > You may notice the memory usage is still slightly higher, in this case
> > > by a few megabytes over the mem-lock=off case. I was able to trace this
> > > down to a bug in the linux kernel with MCL_ONFAULT not being honored for
> > > the early process heap (with brk(2) etc.) so it is still write-faulted in
> > > this case, but it's still way less than it was with just the mem-lock=on.
> > >
> > > Changes since v1:
> > > - Don't make a separate mem-lock-onfault, add an on-fault option to
> > >   mem-lock instead
> > >
> > > Changes since v2:
> > > - Move overcommit option parsing out of line
> > > - Make enable_mlock an enum instead
> > >
> > > Changes since v3:
> > > - Rebase to latest master due to the recent sysemu -> system renames
> > >
> > > Daniil Tatianin (4):
> > >   os: add an ability to lock memory on_fault
> > >   system/vl: extract overcommit option parsing into a helper
> > >   system: introduce a new MlockState enum
> > >   overcommit: introduce mem-lock=on-fault
> > >
> > >  hw/virtio/virtio-mem.c    |  2 +-
> > >  include/system/os-posix.h |  2 +-
> > >  include/system/os-win32.h |  3 ++-
> > >  include/system/system.h   | 12 ++++++++-
> > >  migration/postcopy-ram.c  |  4 +--
> > >  os-posix.c                | 10 ++++++--
> > >  qemu-options.hx           | 14 +++++++----
> > >  system/globals.c          | 12 ++++++++-
> > >  system/vl.c               | 52 +++++++++++++++++++++++++++++++--------
> > >  9 files changed, 87 insertions(+), 24 deletions(-)
> > Considering it's very mem relevant change and looks pretty benign.. I can
> > pick this if nobody disagrees (or beats me to it, which I'd appreciate).
> >
> > I'll also provide at least one week for people to stop me.
>
> I think it's been almost two weeks, so should be good now :)

Don't worry, this is on track.  I'll send it maybe in a few days.

Thanks,

-- 
Peter Xu
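
For readers following the thread: the difference between mem-lock=on and mem-lock=on-fault described in the quoted cover letter comes down to the flags passed to mlockall(2). The following is a minimal sketch of that idea, not the code from the series. The helper names mlock_flags() and lock_all_memory() are hypothetical, and the sketch assumes a Linux system whose libc headers may or may not define MCL_ONFAULT.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not taken from the series): build the flags for
 * mlockall(2). MCL_CURRENT | MCL_FUTURE locks all current and future
 * mappings; adding MCL_ONFAULT asks the kernel to lock pages only once
 * they are faulted in, instead of populating them right away.
 * Returns -1 if on-fault locking is requested but the headers do not
 * define MCL_ONFAULT.
 */
int mlock_flags(bool on_fault)
{
    int flags = MCL_CURRENT | MCL_FUTURE;

    if (on_fault) {
#ifdef MCL_ONFAULT
        flags |= MCL_ONFAULT;
#else
        return -1;
#endif
    }
    return flags;
}

/*
 * Hypothetical helper: lock the whole address space of the calling
 * process. Returns 0 on success or a negative errno value on failure.
 */
int lock_all_memory(bool on_fault)
{
    int flags = mlock_flags(on_fault);

    if (flags < 0) {
        fprintf(stderr, "mlockall: MCL_ONFAULT is not available\n");
        return -ENOTSUP;
    }
    if (mlockall(flags) < 0) {
        int err = errno;

        fprintf(stderr, "mlockall: %s\n", strerror(err));
        return -err;
    }
    return 0;
}
```

Calling lock_all_memory(true) early during startup would roughly correspond to the on-fault behaviour discussed above, while lock_all_memory(false) corresponds to plain mem-lock=on. Either call can fail, for example with ENOMEM or EPERM, when RLIMIT_MEMLOCK is too low and the process lacks CAP_IPC_LOCK, so the return value should be checked.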