On 2/4/25 5:47 PM, Peter Xu wrote:

On Tue, Feb 04, 2025 at 11:23:41AM +0300, Daniil Tatianin wrote:
On 1/23/25 7:31 PM, Peter Xu wrote:
On Thu, Jan 23, 2025 at 04:19:40PM +0300, Daniil Tatianin wrote:
Currently, passing mem-lock=on to QEMU causes memory usage to grow by
huge amounts:

no memlock:
      $ ./qemu-system-x86_64 -overcommit mem-lock=off
      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
      45652

      $ ./qemu-system-x86_64 -overcommit mem-lock=off -enable-kvm
      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
      39756

memlock:
      $ ./qemu-system-x86_64 -overcommit mem-lock=on
      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
      1309876

      $ ./qemu-system-x86_64 -overcommit mem-lock=on -enable-kvm
      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
      259956

This happens because mlockall(2) write-faults every existing and future
anonymous mapping in the process right away.

One of the reasons to enable mem-lock is to protect a QEMU process's
pages from being compacted and migrated by kcompactd. kcompactd does this
by modifying a live process's page tables, which causes thousands of TLB
flush IPIs per second and basically steals all guest time while it is
active.

mem-lock=on helps against this (given compact_unevictable_allowed is 0),
but the memory overhead it introduces is an undesirable side effect. We
can avoid that overhead completely by passing MCL_ONFAULT to mlockall(2),
which this series makes possible through a new mem-lock value called
on-fault.
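
For reference, a minimal sketch of the difference at the mlockall(2) level.
This is not the code from the series: mlock_flags() and lock_memory() are
hypothetical names, and silently falling back to eager locking when the
headers lack MCL_ONFAULT is an assumption of the sketch, not documented
QEMU behavior.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: build the flag set passed to mlockall(2).
 * MCL_CURRENT | MCL_FUTURE covers existing and future mappings;
 * MCL_ONFAULT defers locking of each page until it is first faulted in.
 */
static int mlock_flags(bool on_fault)
{
    int flags = MCL_CURRENT | MCL_FUTURE;

#ifdef MCL_ONFAULT
    if (on_fault) {
        flags |= MCL_ONFAULT;
    }
#else
    /* Assumption: without MCL_ONFAULT we fall back to eager locking. */
    (void)on_fault;
#endif
    return flags;
}

/* Hypothetical helper: returns 0 on success or a negative errno value. */
static int lock_memory(bool on_fault)
{
    if (mlockall(mlock_flags(on_fault)) < 0) {
        return -errno;
    }
    return 0;
}
```

Calling lock_memory(false) corresponds to the eager behavior measured
above, while lock_memory(true) locks pages only as they are touched. Either
call can fail (for example with ENOMEM or EPERM) if RLIMIT_MEMLOCK is too
low for the process.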

memlock-onfault:
      $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault
      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
      54004

      $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault -enable-kvm
      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
      47772

You may notice the memory usage is still slightly higher, in this case
by a few megabytes over the mem-lock=off case. I was able to trace this
down to a bug in the Linux kernel: MCL_ONFAULT is not honored for the
early process heap (set up with brk(2) etc.), so that heap is still
write-faulted. Even so, the usage is far lower than with plain
mem-lock=on.

Changes since v1:
      - Don't make a separate mem-lock-onfault, add an on-fault option to
        mem-lock instead

Changes since v2:
      - Move overcommit option parsing out of line
      - Make enable_mlock an enum instead
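
A rough sketch of what a three-state setting and its parsing could look
like. Only the MlockState name comes from the patch titles below; the
enumerator names and the parse_mem_lock(), should_mlock() and
is_mlock_on_fault() helpers are illustrative assumptions, not the code in
the series.

```c
#include <stdbool.h>
#include <string.h>

/* Enumerator names are assumed for illustration. */
typedef enum {
    MLOCK_OFF = 0,
    MLOCK_ON,
    MLOCK_ON_FAULT,
} MlockState;

/*
 * Hypothetical parser for the mem-lock value.
 * Returns false and leaves *out untouched for an unknown value.
 */
static bool parse_mem_lock(const char *value, MlockState *out)
{
    if (strcmp(value, "off") == 0) {
        *out = MLOCK_OFF;
    } else if (strcmp(value, "on") == 0) {
        *out = MLOCK_ON;
    } else if (strcmp(value, "on-fault") == 0) {
        *out = MLOCK_ON_FAULT;
    } else {
        return false;
    }
    return true;
}

/* Any state other than "off" requests locking. */
static bool should_mlock(MlockState state)
{
    return state != MLOCK_OFF;
}

/* Only "on-fault" requests lazy, per-fault locking. */
static bool is_mlock_on_fault(MlockState state)
{
    return state == MLOCK_ON_FAULT;
}
```

An enum keeps "lock at all?" and "lock lazily?" in a single value, so
callers that used to test a boolean can ask either question without a
second flag.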

Changes since v3:
      - Rebase to latest master due to the recent sysemu -> system renames

Daniil Tatianin (4):
    os: add an ability to lock memory on_fault
    system/vl: extract overcommit option parsing into a helper
    system: introduce a new MlockState enum
    overcommit: introduce mem-lock=on-fault

   hw/virtio/virtio-mem.c    |  2 +-
   include/system/os-posix.h |  2 +-
   include/system/os-win32.h |  3 ++-
   include/system/system.h   | 12 ++++++++-
   migration/postcopy-ram.c  |  4 +--
   os-posix.c                | 10 ++++++--
   qemu-options.hx           | 14 +++++++----
   system/globals.c          | 12 ++++++++-
   system/vl.c               | 52 +++++++++++++++++++++++++++++++--------
   9 files changed, 87 insertions(+), 24 deletions(-)

Considering it's a very mem-relevant change and looks pretty benign, I can
pick this if nobody disagrees (or beats me to it, which I'd appreciate).

I'll also provide at least one week for people to stop me.

I think it's been almost two weeks, so should be good now :)

Don't worry, this is on track.  I'll send it maybe in a few days.

Thanks,

Amazing, thank you!

