Re: [PATCH V2 01/13] machine: alloc-anon option

Peter Xu Mon, 07 Oct 2024 12:06:27 -0700

On Mon, Oct 07, 2024 at 06:23:26PM +0200, David Hildenbrand wrote:
> > Yes I thought it could be unconditionally.  We can discuss downside below,
> > I think we can still use a new flag otherwise, but the idea would be the
> > same, where I want the flag to be explicit in the callers not implicitly
> > with the object type check, which I think can be hackish.
> 
> I agree that the caller should specify it.
> 
> But I don't think using shared memory where shared memory is not warranted
> is a reasonable approach.
> 
> I'm quite surprise you're considering such changes with unclear impacts on
> other OSes besides Linux (Freedbsd? Windows that doeasn';t even support
> shared memory?) just to make one corner-case QEMU use case happy.
> 
> But I'm sure there are valid reasons why you had that idea, so I'm happy to
> learn why using shared memory unconditionally here is better than providing
> a clean alternative path with the feature enabled and memfd actually being
> supported on the setup (e.g., newer Linux kernel).


I was thinking whether cpr-transfer can be enabled by default, so whenever
people want to use it, no code reset needed.  It's also easier for Libvirt
to not need to add yet another machine flags if possible.

Currently this parameter is the only one left that needs to be manually
enabled on src.  It means if we can get rid of it then any QEMU based VM on
Linux can do a cpr-transfer any time as long as QEMU supports it.

Without it, this new parameter will need to be manually enabled otherwise
another system code reboot / live migration needs to happen first without
CPR, just to enable this flag.

But yeah I don't think I think it all through, so I left my pure question.
I think it looks still like an option, the other option if we still want to
enable it by default is, keep the option, then only enable it on new
machines that is based on Linux.

OS dependency is definitely an issue.  AFAICT CPR is only available for
Linux anyway, but I'm happy to be corrected..  IOW, those chunk of new code
(if only unconditionally done..) would need proper #ifdef, so that
non-Linux OSes work like before.

> 
> > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > I think RAM_SHARED can actually be that flag already - I mean, in 
> > > > > > all paths
> > > > > > that we may create anon mem (but not memory-backend-* objects), is 
> > > > > > it
> > > > > > always safe we always switch to RAM_SHARED from anon?
> > > > > 
> > > > > Do you mean only setting the flag (-> anonymous shmem) or switching 
> > > > > also to
> > > > > memfd, which is a bigger change?
> > > > 
> > > > Switching to memfd.  I thought anon shmem (mmap(MAP_SHARED)) is mostly 
> > > > the
> > > > same internally, if we create memfd then mmap(MAP_SHARED) on top of it, 
> > > > no?
> > > 
> > > Memfd is Linux specific, keep that in mind. Apart from that there 
> > > shouldn't
> > > be much difference between anon shmem and memfd (there are memory commit
> > > differences, though).
> > 
> > Could you elaborate the memory commit difference and what does that imply
> > to QEMU's usage?
> 
> Note how memfd code passed VM_NORESERVE to shmem_file_setup() and
> shmem_zero_setup() effectively doesn't (unless MAP_NORESERVE was specified
> IIRC).
> 
> Not sure if the change makes a big impact in QEMU's usage, it's just one of
> these differences between memfd and shared anonymous memory. (responding to
> your "mostly the same").

So yeah, I hoped the memory commit won't be a problem, because I think they
should be corner case MRs, and should be small.

Vram can take up to 16MB, that's the max I'm aware of, but indeed I don't
know all the use cases to be sure.  I think it means some tens of MBs can
be accounted later during fault rather than upfront to fail QEMU from boot.
Ideally mgmt apps should leave enough space for these ones, but if we worry
on that we can stick with the current option (but create a new flag besides
RAM_SHARED).

> 
> > 
> > > 
> > > Of course, there is a difference between anon memory and shmem, for 
> > > example
> > > regarding what viritofsd faced (e.g., KSM) recently.
> > 
> > The four paths shouldn't be KSM target, AFAICT.
> 
> Do you have a good overview of what is deduplicated in practice and why
> these don't apply? For example, I thought these functions are also used for
> hosting the BIOS, and that might just be deduplciated between VMs?

I was thinking KSM was for merging OS/App pages (rather than BIOSs, which
are normally very, very small)?  Though I could be wrong.

> 
> Anyhow, there are obviously other differences with shmem vs. anonymous (THP
> handling, page fault performance, userfaultfd compatibility on older
> kernels) at least on Linux, but I have absolutely no clue how that would
> differ on other host OSes.
> 
> None of them are major
> 
> This is probably going to result in a bigger discussion, for which I don't
> have any time. So my opinion on it is above.
> 
> Anyhow, this sounds like one of the suggestions I wouldn't suggest Steve to
> actually implement.

We don't need to make it a bigger discussion.  If there's concern with it,
we can stick with a new flag.

The next question is whether if with a new flag we should enable it by
default sometimes (e.g. on new machine types on Linux).  But when with a
new option, that can be discussed later.

Thanks,

-- 
Peter Xu

Re: [PATCH V2 01/13] machine: alloc-anon option

Reply via email to