Re: [RFC PATCH v2 09/37] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2

Sean Christopherson Thu, 12 Mar 2026 08:45:01 -0700

On Thu, Mar 12, 2026, Fuad Tabba wrote:
> Hi Ackerley,
> 
> Before getting into the UAPI semantics, thank you for all the heavy
> lifting you've done here. Figuring out how to make it all work across
> the different platforms is not easy :)
> 
> <snip>
> 
> > The policy definitions below provide more details:


Please drop "CONTENT_POLICY" from the KVM documentation.  From KVM's perspective,
these are not "policy", they are purely properties of the underlying memory.
Userspace will likely use the attributes to implement policy of some kind, but
KVM straight up doesn't care.

> > ``KVM_SET_MEMORY_ATTRIBUTES2_CONTENT_POLICY_ZERO`` (default)

The default behavior absolutely cannot be something that's not supported on
every conversion type.

> >
> >   On a private to shared conversion, the host will read zeros from the
> >   converted memory on the next fault after successful return of the
> >   KVM_SET_MEMORY_ATTRIBUTES2 ioctl.
> >
> >   This is not supported (-EOPNOTSUPP) for a shared to private
> >   conversion. While some CoCo implementations do zero memory contents
> >   such that the guest reads zeros after conversion, the guest is not
> >   expected to trust host-provided zeroing, hence as a UAPI policy, KVM
> >   does not make any such guarantees.
> 
> The rationale for not supporting this in the UAPI isn't quite right
> and I think that the prohibition should be removed. It's true that the
> guest is not expected to trust host-provided zeroing. However, if the
> VMM invokes this ioctl with the ZERO policy, the zeroing is performed
> by the hypervisor, not by the (untrusted) host.

What entity zeros the data doesn't matter as far as KVM's ABI is concerned.  That's
a motivating factor for providing ZERO, e.g. it allows userspace to elide additional
zeroing when it _knows_ the memory holds zeros, but that's orthogonal to KVM's
contract with userspace.

> Although pKVM handles fresh, zeroed memory provisioning via donation
> rather than attribute conversion, stating that the UAPI cannot make
> guarantees due to trust boundaries is incorrect. The hypervisor is

We should avoid using "hypervisor", because (a) it means different things to
different people and (b) even when there's consensus on what "hypervisor" means,
whether or not the hypervisor is trusted varies per implementation.

> precisely the entity the guest trusts to enforce
> this.
> 
> The UAPI should define the semantics for a shared-to-private ZERO
> conversion, even if current architectures return -EOPNOTSUPP because
> they handle fresh memory provisioning via other mechanisms (like
> pKVM's donation path).
> 
> How about something like the following:
> 
> On a shared to private conversion, the hypervisor will zero the memory

Again, say _nothing_ about "the hypervisor".  _How_ or when anything happens is
completely irrelevant, the only thing that matters here is _what_ happens.

> contents before mapping it into the guest's private address space,
> preventing the untrusted host from injecting arbitrary data into the
> guest. If an architecture handles zeroed-provisioning via mechanisms
> other than attribute conversion, it may return -EOPNOTSUPP.

No.  I am 100% against bleeding vendor specific information into KVM's ABI for
this.  What the vendor code does is irrelevant, the _only_ thing that matters
here is KVM's contract with userspace.

That doesn't mean pKVM guests can't rely on memory being zeroed, but that is a
contract between pKVM and its guests, not between KVM and host userspace.

> >   For testing purposes, the KVM_X86_SW_PROTECTED_VM testing vehicle
> >   will support this policy and ensure zeroing for conversions in both
> >   directions.
> >
> > ``KVM_SET_MEMORY_ATTRIBUTES2_CONTENT_POLICY_PRESERVE``
> >
> >   On private/shared conversions in both directions, memory contents
> >   will be preserved and readable. As a concrete example, if the host
> >   writes ``0xbeef`` to memory and converts the memory to shared, the
> >   guest will also read ``0xbeef``, after any necessary hardware or
> >   software provided decryption. After a reverse shared to private
> >   conversion, the host will also read ``0xbeef``.
> 
> I think that this example is backwards. If the host writes to memory,
> that memory is already shared, isn't it? Converting it to shared is
> redundant. More importantly, if memory undergoes a shared-to-private
> conversion, the host must lose access entirely.

Ya, it's messed up.

> Maybe a clearer example would reflect actual payload injection and
> bounce buffer sharing:
> - Shared-to-Private (Payload Injection): The host writes a payload
> (e.g., 0xbeef) to shared memory and converts it to private. The guest
> reads 0xbeef in its private address space. The host loses access.
> - Private-to-Shared (Bounce Buffer): The guest writes 0xbeef to
> private memory and converts it to shared. The host reads 0xbeef.
> 
> >   pKVM (ARM) is the first user of this policy. Since pKVM does not
> >   protect memory with encryption, a content policy to preserve memory
> >   will not involve any decryption. The guest will be able to
> >   read what the host wrote with full content preservation.
> 
> This is correct, but to be precise, I think it should explicitly
> mention Stage-2 page tables as the protection mechanism, maybe:

pKVM shouldn't be mentioned in here at all.

---
By default, KVM makes no guarantees about the in-memory values after memory is
converted to/from shared/private.  Optionally, userspace may instruct KVM to
ensure the contents of memory are zeroed or preserved, e.g. to enable in-place
sharing of data, or as an optimization to avoid having to re-zero memory when
the trusted entity guarantees the memory will be zeroed after conversion.

The behaviors supported by a given KVM instance can be queried via <cap>.  If
the requested behavior is unsupported, KVM will return -EOPNOTSUPP and
reject the conversion request.  Note!  The "ZERO" request is only supported for
private to shared conversion!

``KVM_SET_MEMORY_ATTRIBUTES2_ZERO``

  On conversion, KVM guarantees all entities that have "allowed" access to the
  memory will read zeros.  E.g. on private to shared conversion, both trusted
  and untrusted code will read zeros.

  Zeroing is currently only supported for private-to-shared conversions, as KVM
  in general is untrusted and thus cannot guarantee the guest (or any trusted
  entity) will read zeros after conversion.  Note, some CoCo implementations do
  zero memory contents such that the guest reads zeros after conversion, and
  the guest may choose to rely on that behavior.  But that's a contract between
  the trusted CoCo entity and the guest, not between KVM and the guest.

``KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE``

  On conversion, KVM guarantees memory contents will be preserved with respect
  to the last written unencrypted value.  As a concrete example, if the host
  writes ``0xbeef`` to shared memory and converts the memory to private, the
  guest will also read ``0xbeef``, even if the in-memory data is encrypted as
  part of the conversion.  And vice versa, if the guest writes ``0xbeef`` to
  private memory and then converts the memory to shared, the host (and guest)
  will read ``0xbeef`` (if the memory is accessible).
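
To make the contract above concrete, here is a rough userspace-only model of
the accept/reject rules and the resulting memory contents.  This is a sketch,
not the actual UAPI: every name in it (conv_dir, content_req, SUPPORT_*,
check_conversion(), convert()) is hypothetical, and the "supported" bitmask
merely stands in for whatever <cap> ends up reporting.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration only, NOT the real UAPI. */
enum conv_dir { PRIVATE_TO_SHARED, SHARED_TO_PRIVATE };
enum content_req { CONTENT_UNSPECIFIED, CONTENT_ZERO, CONTENT_PRESERVE };

/* Stand-in for the behaviors a given KVM instance reports via <cap>. */
#define SUPPORT_ZERO     (1u << 0)
#define SUPPORT_PRESERVE (1u << 1)

/* Return 0 if the request is accepted, -EOPNOTSUPP if it is rejected. */
static int check_conversion(enum conv_dir dir, enum content_req req,
			    unsigned int supported)
{
	switch (req) {
	case CONTENT_UNSPECIFIED:
		/* Default: always accepted, no guarantee about contents. */
		return 0;
	case CONTENT_ZERO:
		/* ZERO is only defined for private to shared conversions. */
		if (dir != PRIVATE_TO_SHARED || !(supported & SUPPORT_ZERO))
			return -EOPNOTSUPP;
		return 0;
	case CONTENT_PRESERVE:
		return (supported & SUPPORT_PRESERVE) ? 0 : -EOPNOTSUPP;
	}
	return -EINVAL;
}

/*
 * Model a conversion on a plain buffer.  A rejected request leaves the
 * buffer untouched, i.e. the conversion does not happen at all.
 */
static int convert(uint8_t *mem, size_t len, enum conv_dir dir,
		   enum content_req req, unsigned int supported)
{
	int ret = check_conversion(dir, req, supported);

	if (ret)
		return ret;

	if (req == CONTENT_ZERO)
		memset(mem, 0, len);

	/*
	 * PRESERVE: contents are left as last written.  UNSPECIFIED: the
	 * model happens to leave them alone, but callers must not rely on it.
	 */
	return 0;
}
```

E.g. with both behaviors reported as supported, convert() with
SHARED_TO_PRIVATE + CONTENT_ZERO fails with -EOPNOTSUPP and leaves the buffer
alone, whereas PRIVATE_TO_SHARED + CONTENT_ZERO succeeds and the buffer reads
back as zeros.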
