This "anon" memory cannot be "shared" with other processes, but
virtio-kernel etc. can just use it.

To "share" the memory with other processes, we'd need memfd/file.

Ah OK, thanks David.  Is this the planned long term solution for
vhost-kernel?

I think the basic idea was that the memory backend defines how the "non-private" memory is backed, just like for any other non-CC VM.

The "private" memory always comes from guest_memfd.

So for the time being, using anon+guest_memfd corresponds to "just a simple VM".

Long-term I expect that we use guest_memfd for shared+private and use in-place conversion. Access to "private" memory through the mmap() mapping will result in a SIGBUS.
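As a rough analogy for that SIGBUS behaviour (my own sketch, not
guest_memfd itself: it just reuses the fact that touching a mapped page
the kernel refuses to back raises SIGBUS, here by mapping past a
memfd's end):

/* Rough analogy only: touching a mapped page the kernel won't back
 * raises SIGBUS.  Here the unbacked page is simply past the memfd's
 * end; with in-place conversion it would be a page currently private. */
#define _GNU_SOURCE
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static void on_sigbus(int sig)
{
    static const char msg[] = "SIGBUS on access to unbacked page\n";
    (void)sig;
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    _exit(0);
}

int main(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    int fd = memfd_create("demo", 0);

    ftruncate(fd, psz);                    /* only the first page is backed */
    char *p = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);     /* ...but two pages are mapped   */

    signal(SIGBUS, on_sigbus);
    p[0] = 1;          /* backed by the file: fine            */
    p[psz] = 1;        /* beyond EOF: the kernel sends SIGBUS */
    return 1;          /* not reached */
}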

> > I wonder what happens if vhost tries to DMA to a region that is private
> > with this setup.
> >
> > AFAIU, it'll try to DMA to the fake address of ramblock->host that is
> > provided by the memory backend (either anon, shmem, file, etc.).  The
> > ideal case IIUC is that it should crash QEMU, because it's trying to
> > access an illegal page which is private.  But with this model it won't
> > crash; it will silently populate some page in the non-gmemfd backend.
> >
> > Is that expected?

Yes, it's all just a big mmap() which will populate memory on access -- independent of using anon/file/memfd.
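For illustration (my own snippet, nothing QEMU-specific): mincore()
makes that populate-on-access behaviour visible; pages of the mapping
only become resident once something actually touches them.

/* Small demo: pages of a mapping only become resident when touched.
 * The same applies whether the mapping is anon, file- or memfd-backed. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    unsigned char vec[4];
    char *p = mmap(NULL, 4 * psz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    mincore(p, 4 * psz, vec);
    printf("resident before touch: %d %d %d %d\n",
           vec[0] & 1, vec[1] & 1, vec[2] & 1, vec[3] & 1);

    p[0] = 1;            /* touching page 0 populates it ...         */
    p[2 * psz] = 1;      /* ... and page 2; pages 1 and 3 stay empty */

    mincore(p, 4 * psz, vec);
    printf("resident after touch:  %d %d %d %d\n",
           vec[0] & 1, vec[1] & 1, vec[2] & 1, vec[3] & 1);
    return 0;
}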

Similar to virtio-mem, long-term we'd want a mechanism to check/enforce that some memory in there will not be populated on access from QEMU (well, and vhost-user processes ...).
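One very crude way to picture the kind of enforcement meant here (again
just my own illustration, not the mechanism under discussion): if the
"must not be touched" range is mapped PROT_NONE, a stray access faults
immediately instead of silently allocating a page in the wrong backend.

/* Crude illustration of "fail loudly instead of populating silently":
 * forbid access to the range that is supposed to be private. */
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    char *ram = mmap(NULL, 4 * psz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* Pretend pages 1 and 2 were converted to private: any access from
     * this process should fault rather than allocate a page here. */
    mprotect(ram + psz, 2 * psz, PROT_NONE);

    ram[0] = 1;      /* shared page: populates on access as usual */
    ram[psz] = 1;    /* "private" page: faults (SIGSEGV) instead  */
    return 0;        /* not reached */
}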

In memory_get_xlat_addr() we perform such checks, but it's only used for the IOMMU path. vhost-kernel likely has no such checks, just like vhost-user etc. does not.




When gmemfd=on is specified with those, IIUC it'll allocate both the memory
(ramblock->host) and the gmemfd, but without using ->host.  Meanwhile, AFAIU
the ramblock->host will start to conflict with the gmemfd in the future, when
it might become mappable (having a valid ->host).

These will require a new guest_memfd memory backend (I recall that was
discussed a couple of times).

Do you know if anyone is working on this one?

So far my understanding is that Google, who are doing the shared+private guest_memfd kernel part, won't be working on QEMU patches. I recently raised with our management that this would be a good project for RH to focus on.

I am not aware of real implementations of the guest_memfd backend (yet).




I have a local fix for this (and actually more than what's below, but starting
from it).  I'm not sure whether I overlooked something, but from reading the
cover letter it only uses the memfd backend, which makes perfect sense to me
so far.

Does the anon+guest_memfd combination not work, or are you speculating about
the usability (which I hopefully addressed above)?

IIUC, with the above solution and with how QEMU handles memory conversions
right now, at least hugetlb pages will suffer from double allocation, as
kvm_convert_memory() won't free hugetlb pages even if they are converted to
private.

Yes, that's why I'm invested in teaching guest_memfd in-place conversion alongside huge page support (which fortunately Google engineers are doing great work on).
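To put a rough number on that double allocation (back-of-the-envelope
arithmetic only, the sizes are made up for illustration): if an 8 GiB
hugetlb-backed guest converts most of its memory to private, the backend
keeps holding roughly 8 GiB of hugetlb pages while guest_memfd allocates
roughly another 8 GiB for the private pages, so the host pays about twice
the guest size. In-place conversion avoids exactly that, because shared
and private are then served from the same guest_memfd pages.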


It also sounds doable (and also preferable) that for each VM we always stick
with pages in the gmemfd page cache, no matter whether they're shared or
private.  For private, we could zap all pgtables and SIGBUS any faults
afterwards.  I thought that was always the plan, but I may have missed the
latest information.

Yes, with the guest_memfd backend (shared+private) that's the plan: SIGBUS on invalid access.


--
Cheers,

David / dhildenb

