This "anon" memory cannot be "shared" with other processes, but
virtio-kernel etc. can just use it.

To "share" the memory with other processes, we'd need memfd/file.

Ah OK, thanks David.  Is this the planned long term solution for
vhost-kernel?

I think the basic idea was that the memory backend defines how the "non-private" memory is backed, just like for any other non-CC VM.

The "private" memory always comes from guest_memfd.

So for the time being, using anon+guest_memfd corresponds to "just a simple VM".

Long-term I expect that we use guest_memfd for shared+private and use in-place conversion. Access to "private" memory through the mmap() mapping will result in a SIGBUS.
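As a rough analogy for that SIGBUS behaviour (my own sketch, not
guest_memfd itself: it just reuses the fact that touching a mapped page
the kernel refuses to back raises SIGBUS, here by mapping past a
memfd's end):

/* Rough analogy only: touching a mapped page the kernel won't back
 * raises SIGBUS.  Here the unbacked page is simply past the memfd's
 * end; with in-place conversion it would be a page currently private. */
#define _GNU_SOURCE
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static void on_sigbus(int sig)
{
    static const char msg[] = "SIGBUS on access to unbacked page\n";
    (void)sig;
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    _exit(0);
}

int main(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    int fd = memfd_create("demo", 0);

    ftruncate(fd, psz);                    /* only the first page is backed */
    char *p = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);     /* ...but two pages are mapped   */

    signal(SIGBUS, on_sigbus);
    p[0] = 1;          /* backed by the file: fine            */
    p[psz] = 1;        /* beyond EOF: the kernel sends SIGBUS */
    return 1;          /* not reached */
}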

> > I wonder what happens if vhost tries to DMA to a region that is private
> > with this setup.
> >
> > AFAIU, it'll try to DMA to the fake address of ramblock->host that is
> > provided by the memory backend (either anon, shmem, file, etc.).  The
> > ideal case IIUC is that it should crash QEMU, because it's trying to
> > access an illegal page which is private.  But with this model it won't
> > crash; it will silently populate some page in the non-gmemfd backend.
> >
> > Is that expected?

Yes, it's all just a big mmap() which will populate memory on access -- independent of using anon/file/memfd.
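For illustration (my own snippet, nothing QEMU-specific): mincore()
makes that populate-on-access behaviour visible; pages of the mapping
only become resident once something actually touches them.

/* Small demo: pages of a mapping only become resident when touched.
 * The same applies whether the mapping is anon, file- or memfd-backed. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    unsigned char vec[4];
    char *p = mmap(NULL, 4 * psz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    mincore(p, 4 * psz, vec);
    printf("resident before touch: %d %d %d %d\n",
           vec[0] & 1, vec[1] & 1, vec[2] & 1, vec[3] & 1);

    p[0] = 1;            /* touching page 0 populates it ...         */
    p[2 * psz] = 1;      /* ... and page 2; pages 1 and 3 stay empty */

    mincore(p, 4 * psz, vec);
    printf("resident after touch:  %d %d %d %d\n",
           vec[0] & 1, vec[1] & 1, vec[2] & 1, vec[3] & 1);
    return 0;
}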

Similar to virtio-mem, long-term we'd want a mechanism to check/enforce that some memory in there will not be populated on access from QEMU (well, and vhost-user processes ...).
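One very crude way to picture the kind of enforcement meant here (again
just my own illustration, not the mechanism under discussion): if the
"must not be touched" range is mapped PROT_NONE, a stray access faults
immediately instead of silently allocating a page in the wrong backend.

/* Crude illustration of "fail loudly instead of populating silently":
 * forbid access to the range that is supposed to be private. */
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    char *ram = mmap(NULL, 4 * psz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* Pretend pages 1 and 2 were converted to private: any access from
     * this process should fault rather than allocate a page here. */
    mprotect(ram + psz, 2 * psz, PROT_NONE);

    ram[0] = 1;      /* shared page: populates on access as usual */
    ram[psz] = 1;    /* "private" page: faults (SIGSEGV) instead  */
    return 0;        /* not reached */
}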

In memory_get_xlat_addr() we perform such checks, but it's only used for the IOMMU path. vhost-kernel likely has no such checks, just like vhost-user etc. does not.




When gmemfd=on is specified with those, IIUC it'll allocate both the memory
(ramblock->host) and the gmemfd, but without using ->host.  Meanwhile, AFAIU
the ramblock->host will start to conflict with the gmemfd in the future, when
it might become mappable (having a valid ->host).

These will require a new guest_memfd memory backend (I recall that was
discussed a couple of times).

Do you know if anyone is working on this one?

So far my understanding is that Google, who are doing the shared+private guest_memfd kernel part, won't be working on QEMU patches. I recently raised with our management that this would be a good project for RH to focus on.

I am not aware of real implementations of the guest_memfd backend (yet).




I have a local fix for this (and actually more than what's below, but starting
from it).  I'm not sure whether I overlooked something, but from reading the
cover letter it only uses the memfd backend, which makes perfect sense to me
so far.

Does the anon+guest_memfd combination not work, or are you speculating about
the usability (which I hopefully addressed above)?

IIUC, with the above solution and with how QEMU handles memory conversions
right now, at least hugetlb pages will suffer from double allocation, as
kvm_convert_memory() won't free hugetlb pages even if they are converted to
private.

Yes, that's why I'm invested in teaching guest_memfd in-place conversion alongside huge page support (which fortunately Google engineers are doing great work on).
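To put a rough number on that double allocation (back-of-the-envelope
arithmetic only, the sizes are made up for illustration): if an 8 GiB
hugetlb-backed guest converts most of its memory to private, the backend
keeps holding roughly 8 GiB of hugetlb pages while guest_memfd allocates
roughly another 8 GiB for the private pages, so the host pays about twice
the guest size. In-place conversion avoids exactly that, because shared
and private are then served from the same guest_memfd pages.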


It also sounds doable (and also preferable) that for each VM we always stick
with pages in the gmemfd page cache, no matter whether they're shared or
private.  For private, we could zap all pgtables and SIGBUS any faults
afterwards.  I thought that was always the plan, but I may have missed the
latest information.

Yes, with the guest_memfd backend (shared+private) that's the plan: SIGBUS on invalid access.


--
Cheers,

David / dhildenb

