Currently, passing mem-lock=on to QEMU causes memory usage to grow by huge amounts:

no memlock:
    $ ./qemu-system-x86_64 -overcommit mem-lock=off
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    45652

    $ ./qemu-system-x86_64 -overcommit mem-lock=off -enable-kvm
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    39756

memlock:
    $ ./qemu-system-x86_64 -overcommit mem-lock=on
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    1309876

    $ ./qemu-system-x86_64 -overcommit mem-lock=on -enable-kvm
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    259956

This happens because mlockall(2) immediately write-faults every existing
and future anonymous mapping in the process.

One of the reasons to enable mem-lock is to protect a QEMU process's
pages from being compacted and migrated by kcompactd. kcompactd does
this by modifying the page tables of a live process, which causes
thousands of TLB flush IPIs per second and basically steals all guest
time while it is active.

mem-lock=on helps against this (given compact_unevictable_allowed is 0),
but the memory overhead it introduces is an undesirable side effect. We
can avoid that overhead completely by passing MCL_ONFAULT to mlockall,
which this series makes possible with a new value for mem-lock called
on-fault.

memlock-onfault:
    $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    54004

    $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault -enable-kvm
    $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
    47772

You may notice the memory usage is still slightly higher, in this case
by a few megabytes over the mem-lock=off case. I was able to trace this
down to a bug in the Linux kernel where MCL_ONFAULT is not honored for
the early process heap (allocated with brk(2) etc.), so that part is
still write-faulted. It is still far less than with plain mem-lock=on.

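For reference, the difference between the two locking modes comes down
to the flags passed to mlockall(2). Below is a minimal sketch of that
idea, not the code from this series: the names lock_mode, lock_flags and
lock_memory are hypothetical and chosen only for illustration. It
assumes a POSIX host and treats MCL_ONFAULT as optional, since not every
platform defines it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical names for illustration; not the identifiers used in QEMU. */
enum lock_mode { LOCK_OFF, LOCK_ON, LOCK_ON_FAULT };

/*
 * Map a locking mode to mlockall(2) flags.
 * Returns 0 for LOCK_OFF and -1 if the mode is unsupported on this host.
 */
static int lock_flags(enum lock_mode mode)
{
    switch (mode) {
    case LOCK_OFF:
        return 0;
    case LOCK_ON:
        /* Lock and pre-fault all current and future mappings. */
        return MCL_CURRENT | MCL_FUTURE;
    case LOCK_ON_FAULT:
#ifdef MCL_ONFAULT
        /* Lock pages only once they are actually faulted in. */
        return MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT;
#else
        return -1;
#endif
    }
    return -1;
}

/* Apply the requested mode; returns 0 on success or a negative errno. */
static int lock_memory(enum lock_mode mode)
{
    int flags = lock_flags(mode);

    if (flags < 0) {
        fprintf(stderr, "on-fault locking is not supported on this host\n");
        return -ENOTSUP;
    }
    if (flags == 0) {
        return 0; /* nothing to lock */
    }
    if (mlockall(flags) < 0) {
        int err = errno;
        fprintf(stderr, "mlockall: %s\n", strerror(err));
        return -err;
    }
    return 0;
}
```

Calling lock_memory(LOCK_ON_FAULT) early in process startup would
correspond to what mem-lock=on-fault asks for. Note that mlockall can
fail if RLIMIT_MEMLOCK is too low for the process.
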
Changes since v1:
- Don't make a separate mem-lock-onfault, add an on-fault option to
  mem-lock instead

Changes since v2:
- Move overcommit option parsing out of line
- Make enable_mlock an enum instead

Changes since v3:
- Rebase to latest master due to the recent sysemu -> system renames

Changes since v4:
- Fix compile errors under FreeBSD and MacOS

Daniil Tatianin (4):
  os: add an ability to lock memory on_fault
  system/vl: extract overcommit option parsing into a helper
  system: introduce a new MlockState enum
  overcommit: introduce mem-lock=on-fault

 hw/virtio/virtio-mem.c    |  2 +-
 include/system/os-posix.h |  2 +-
 include/system/os-win32.h |  3 ++-
 include/system/system.h   | 12 ++++++++-
 meson.build               |  6 +++++
 migration/postcopy-ram.c  |  4 +--
 os-posix.c                | 14 +++++++++--
 qemu-options.hx           | 14 +++++++----
 system/globals.c          | 12 ++++++++-
 system/vl.c               | 52 +++++++++++++++++++++++++++++++--------
 10 files changed, 97 insertions(+), 24 deletions(-)

--
2.34.1