Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

Andy Lutomirski Sun, 24 Apr 2022 10:01:35 -0700

On Fri, Apr 22, 2022, at 3:56 AM, Chao Peng wrote:
> On Tue, Apr 05, 2022 at 06:03:21PM +0000, Sean Christopherson wrote:
>> On Tue, Apr 05, 2022, Quentin Perret wrote:
>> > On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote:
>     Only when the register succeeds, the fd is
>     converted into a private fd, before that, the fd is just a normal (shared)
>     one. During this conversion, the previous data is preserved so you can put
>     some initial data in guest pages (whether the architecture allows this is
>     architecture-specific and out of the scope of this patch).

I think this can be made to work, but it will be awkward.  On TDX, for example, 
what exactly are the semantics supposed to be?  An error code if the memory 
isn't all zero?  An error code if it has ever been written?

Fundamentally, I think this is because your proposed lifecycle for these 
memfiles results in a lightweight API but is awkward for the intended use 
cases.  You're proposing, roughly:

1. Create a memfile. 

Now it's in a shared state with an unknown virt technology.  It can be read and 
written.  Let's call this state BRAND_NEW.

2. Bind to a VM.

Now it's in a bound state.  For TDX, for example, let's call the new state 
BOUND_TDX.  In this state, the TDX rules are followed (private memory can't be 
converted, etc).

The problem here is that the BRAND_NEW state allows things that are nonsensical 
in TDX, and the binding step needs to invent some kind of semantics for what 
happens when binding a nonempty memfile.
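Here is a minimal C sketch of the lifecycle just described. It shows where the awkwardness sits. All names (`struct memfile`, `memfile_write`, `memfile_bind_tdx`) are hypothetical and are not part of the patch series. The bind-time rule ("fail if ever written") is one of the two options asked about above, picked arbitrarily. Also assumed: once the memfile is bound to TDX, the host can no longer write it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical states for the create-then-bind lifecycle. */
enum memfile_state {
	MEMFILE_BRAND_NEW,	/* shared, virt technology not yet known */
	MEMFILE_BOUND_TDX,	/* bound to a TDX guest; TDX rules apply */
};

/* Hypothetical memfile model, not a real kernel structure. */
struct memfile {
	enum memfile_state state;
	bool ever_written;	/* set by any write while BRAND_NEW */
};

void memfile_init(struct memfile *f)
{
	f->state = MEMFILE_BRAND_NEW;
	f->ever_written = false;
}

/* In BRAND_NEW the memfile behaves like ordinary shared memory. */
int memfile_write(struct memfile *f)
{
	if (f->state != MEMFILE_BRAND_NEW)
		return -EPERM;	/* assumed: bound TDX memory is not host-writable */
	f->ever_written = true;
	return 0;
}

/*
 * Binding has to invent *some* rule for pre-existing contents.  This
 * sketch arbitrarily rejects a memfile that was ever written; "reject
 * unless all-zero" would be an equally arbitrary alternative.
 */
int memfile_bind_tdx(struct memfile *f)
{
	if (f->state != MEMFILE_BRAND_NEW)
		return -EBUSY;
	if (f->ever_written)
		return -EINVAL;
	f->state = MEMFILE_BOUND_TDX;
	return 0;
}
```

Whichever rule is chosen, `memfile_bind_tdx` must carry policy that exists only because BRAND_NEW permitted writes in the first place.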


So I would propose a somewhat different order:

1. Create a memfile.  It's in the UNBOUND state and no operations whatsoever 
are allowed except binding or closing.

2. Bind the memfile to a VM (or at least to a VM technology).  Now it's in the 
initial state appropriate for that VM.

For TDX, this completely bypasses the cases where the data is prepopulated and 
TDX can't handle it cleanly.  For SEV, it bypasses a situation in which data 
might be written to the memory before we find out whether that data will be 
unreclaimable or unmovable.
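The alternative ordering can be sketched the same way. This is a minimal illustration only, and every identifier in it (`enum vm_tech`, `memfile_create`, `memfile_access`, `memfile_bind`) is hypothetical. An UNBOUND memfile rejects every data access, so binding never needs rules about pre-existing contents.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical VM technologies a memfile can be bound to. */
enum vm_tech { VM_TECH_TDX, VM_TECH_SEV };

/* Hypothetical states: only bind (or close) is legal while UNBOUND. */
enum memfile_state {
	MEMFILE_UNBOUND,
	MEMFILE_BOUND_TDX,
	MEMFILE_BOUND_SEV,
};

struct memfile {
	enum memfile_state state;
};

void memfile_create(struct memfile *f)
{
	f->state = MEMFILE_UNBOUND;
}

/* Any data access before binding is rejected outright. */
int memfile_access(const struct memfile *f)
{
	if (f->state == MEMFILE_UNBOUND)
		return -EPERM;
	/* Per-technology rules would be enforced here once bound. */
	return 0;
}

/*
 * Binding never has to reason about existing contents, because an
 * UNBOUND memfile cannot have been read or written.
 */
int memfile_bind(struct memfile *f, enum vm_tech tech)
{
	if (f->state != MEMFILE_UNBOUND)
		return -EBUSY;
	f->state = (tech == VM_TECH_TDX) ? MEMFILE_BOUND_TDX
					 : MEMFILE_BOUND_SEV;
	return 0;
}
```

The design choice here is to push all content semantics into the bound state, where the VM technology is already known.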


----------------------------------------------

Now I have a question, since I don't think anyone has really answered it: how 
does this all work with SEV- or pKVM-like technologies in which private and 
shared pages share the same address space?  It sounds like you're proposing to 
have a big memfile that contains private and shared pages and to use that same 
memfile as pages are converted back and forth.  IO and even real physical DMA 
could be done on that memfile.  Am I understanding correctly?
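Below is a minimal C sketch of the model described in the question, for SEV- or pKVM-style guests. One memfile backs the whole address space, and each page is flagged private or shared. Conversions flip that flag in place, and host I/O or DMA is permitted only on shared pages. This is an assumed reading, not something the patch series confirms. The page count and all names (`MEMFILE_PAGES`, `memfile_convert`, `memfile_host_io`) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MEMFILE_PAGES 8	/* hypothetical, tiny size for illustration */

/*
 * Hypothetical model: one memfile backs the guest's whole address
 * space; private_page[i] == false means page i is shared.
 */
struct memfile {
	bool private_page[MEMFILE_PAGES];
};

/* A guest-requested conversion flips the page's state in place. */
int memfile_convert(struct memfile *f, size_t pgoff, bool to_private)
{
	if (pgoff >= MEMFILE_PAGES)
		return -EINVAL;
	f->private_page[pgoff] = to_private;
	return 0;
}

/* Host I/O or device DMA on the same memfile: shared pages only. */
int memfile_host_io(const struct memfile *f, size_t pgoff)
{
	if (pgoff >= MEMFILE_PAGES)
		return -EINVAL;
	return f->private_page[pgoff] ? -EPERM : 0;
}
```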

If so, I think this makes sense, but I'm wondering if the actual memslot setup 
should be different.  For TDX, private memory lives in a logically separate 
memslot space.  For SEV and pKVM, it doesn't.  I assume the API can reflect 
this straightforwardly.

And the corresponding TDX question: is the intent still that shared pages 
aren't allowed at all in a TDX memfile?  If so, that would be the most direct 
mapping to what the hardware actually does.

--Andy