On Tue, Feb 04, 2025 at 11:23:41AM +0300, Daniil Tatianin wrote:
> 
> On 1/23/25 7:31 PM, Peter Xu wrote:
> > On Thu, Jan 23, 2025 at 04:19:40PM +0300, Daniil Tatianin wrote:
> > > Currently, passing mem-lock=on to QEMU causes memory usage to grow by
> > > huge amounts:
> > > 
> > > no memlock:
> > >      $ ./qemu-system-x86_64 -overcommit mem-lock=off
> > >      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >      45652
> > > 
> > >      $ ./qemu-system-x86_64 -overcommit mem-lock=off -enable-kvm
> > >      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >      39756
> > > 
> > > memlock:
> > >      $ ./qemu-system-x86_64 -overcommit mem-lock=on
> > >      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >      1309876
> > > 
> > >      $ ./qemu-system-x86_64 -overcommit mem-lock=on -enable-kvm
> > >      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >      259956
> > > 
> > > This is caused by the fact that mlockall(2) automatically
> > > write-faults all existing and future anonymous mappings in the
> > > process right away.
> > > 
> > > One of the reasons to enable mem-lock is to protect a QEMU process'
> > > pages from being compacted and migrated by kcompactd (which does so
> > > by messing with a live process's page tables, causing thousands of
> > > TLB flush IPIs per second), which basically steals all guest time
> > > while it's active.
> > > 
> > > mem-lock=on helps against this (given compact_unevictable_allowed is 0),
> > > but the memory overhead it introduces is an undesirable side effect,
> > > which we can completely avoid by passing MCL_ONFAULT to mlockall. This
> > > series makes that possible with a new value for the mem-lock option
> > > called on-fault.
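
For readers unfamiliar with the flag, here is a minimal sketch of the difference between the two locking modes. It is not taken from the series: the helper names mlock_flags() and lock_all_memory() are hypothetical, and it assumes a Linux libc that exposes MCL_ONFAULT in <sys/mman.h> (kernel 4.4 or later).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: build the flag set for mlockall(2).
 * Without MCL_ONFAULT every current and future mapping is faulted in
 * immediately; with it, pages are locked only once they are first touched.
 * Returns -1 if on-fault locking was requested but the libc headers do not
 * provide MCL_ONFAULT.
 */
static int mlock_flags(bool on_fault)
{
    int flags = MCL_CURRENT | MCL_FUTURE;

    if (on_fault) {
#ifdef MCL_ONFAULT
        flags |= MCL_ONFAULT;
#else
        return -1;
#endif
    }
    return flags;
}

/*
 * Hypothetical helper: lock the whole address space.
 * Returns 0 on success and -1 on failure. mlockall() can fail with
 * ENOMEM or EPERM when RLIMIT_MEMLOCK is too low and the process lacks
 * CAP_IPC_LOCK.
 */
static int lock_all_memory(bool on_fault)
{
    int flags = mlock_flags(on_fault);

    if (flags < 0) {
        fprintf(stderr, "on-fault locking is not supported by this libc\n");
        return -1;
    }
    if (mlockall(flags) < 0) {
        fprintf(stderr, "mlockall: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

Calling lock_all_memory(true) early in process startup would roughly correspond to mem-lock=on-fault, and lock_all_memory(false) to mem-lock=on; how the series actually wires this into QEMU is not shown here.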
> > > 
> > > memlock-onfault:
> > >      $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault
> > >      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >      54004
> > > 
> > >      $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault -enable-kvm
> > >      $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> > >      47772
> > > 
> > > You may notice the memory usage is still slightly higher, in this case
> > > by a few megabytes over the mem-lock=off case. I was able to trace this
> > > down to a bug in the Linux kernel with MCL_ONFAULT not being honored for
> > > the early process heap (with brk(2) etc.), so it is still write-faulted
> > > in this case, but it's still way less than it was with plain mem-lock=on.
> > > 
> > > Changes since v1:
> > >      - Don't make a separate mem-lock-onfault, add an on-fault option to
> > >        mem-lock instead
> > > 
> > > Changes since v2:
> > >      - Move overcommit option parsing out of line
> > >      - Make enable_mlock an enum instead
> > > 
> > > Changes since v3:
> > >      - Rebase to latest master due to the recent sysemu -> system renames
> > > 
> > > Daniil Tatianin (4):
> > >    os: add an ability to lock memory on_fault
> > >    system/vl: extract overcommit option parsing into a helper
> > >    system: introduce a new MlockState enum
> > >    overcommit: introduce mem-lock=on-fault
> > > 
> > >   hw/virtio/virtio-mem.c    |  2 +-
> > >   include/system/os-posix.h |  2 +-
> > >   include/system/os-win32.h |  3 ++-
> > >   include/system/system.h   | 12 ++++++++-
> > >   migration/postcopy-ram.c  |  4 +--
> > >   os-posix.c                | 10 ++++++--
> > >   qemu-options.hx           | 14 +++++++----
> > >   system/globals.c          | 12 ++++++++-
> > >   system/vl.c               | 52 +++++++++++++++++++++++++++++++--------
> > >   9 files changed, 87 insertions(+), 24 deletions(-)
> > Considering it's a very mem-relevant change and looks pretty benign, I can
> > pick this if nobody disagrees (or beats me to it, which I'd appreciate).
> > 
> > I'll also provide at least one week for people to stop me.
> 
> I think it's been almost two weeks, so should be good now :)

Don't worry, this is on track.  I'll send it maybe in a few days.

Thanks,

-- 
Peter Xu