On Fri, 2022-12-16 at 08:32 -0500, Stefan Berger wrote:
> On 12/16/22 07:54, Daniel P. Berrangé wrote:
> > On Fri, Dec 16, 2022 at 07:28:59AM -0500, Stefan Berger wrote:
[...]
> > > Nevertheless it needs documentation and has to handle migration
> > > scenarios either via a blocker or it has to handle them all
> > > correctly. Since it's supposed to be a TPM running remote you
> > > had asked for TLS support iirc.
> > 
> > If the mssim implementation doesn't provide TLS itself, then I don't
> > consider that a blocker on the QEMU side, merely a nice-to-have.
> > 
> > With swtpm the control channel is being used to load and store
> > state during the migration dance. This makes the use of an external
> > process largely transparent to the user, since QEMU handles all the
> > state save/load as part of its migration data stream.
> > 
> > With mssim there is no state save/load co-ordination with QEMU.
> > Instead whomever/whatever is managing the mssim instance, is
> > responsible for ensuring it is running with the correct state at
> > the time QEMU does a vmstate load. If doing a live migration this
> > co-ordination is trivial if you just use the same mssim instance
> > for both src/dst to connect to.
> > 
> > If doing save/store to disk, the user needs to be able to save the
> > mssim state and load it again later. If doing snapshots and
> > reverting to old
> 
> There is no way for storing and loading the *volatile state* of the
> mssim device.

Well, yes there is, it saves internal TPM state to an NVChip file:

https://github.com/microsoft/ms-tpm-20-ref/blob/main/TPMCmd/Platform/src/NVMem.c

However, if I were running this as a service, I'd condition saving and
restoring state on a connection protocol, which would mean QEMU
wouldn't have to worry about it.  The simplest approach, of course, is
just to keep the service running even when the VM is suspended so the
state is kept internally.
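
As a rough illustration of the bookkeeping this leaves to whoever manages
the simulator, here is a minimal sketch that keeps one copy of the NVChip
file per VM snapshot. The function names, the store layout and the
snapshot tags are hypothetical, not part of QEMU or ms-tpm-20-ref. It
assumes the simulator is stopped while the file is copied, and it only
captures whatever the simulator has written to NVChip.

```python
import shutil
from pathlib import Path


def save_tpm_state(nvchip: Path, store: Path, tag: str) -> Path:
    """Copy the simulator's NV file to store/<tag>/ (hypothetical layout)."""
    dest_dir = store / tag
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / nvchip.name
    # The simulator is assumed to be stopped, so the file is not changing.
    shutil.copy2(nvchip, dest)
    return dest


def restore_tpm_state(nvchip: Path, store: Path, tag: str) -> None:
    """Put the copy saved under `tag` back before restarting the simulator."""
    src = store / tag / nvchip.name
    if not src.is_file():
        raise FileNotFoundError(f"no saved TPM state for snapshot {tag!r}")
    shutil.copy2(src, nvchip)
```

The tag would be whatever name the management layer gives the matching
QEMU snapshot, so that reverting the VM and restoring the TPM file always
use the same key.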

> > snapshots, then again whomever manages mssim needs to be keeping
> > saved TPM state corresponding to each QEMU snapshot saved, and
> > picking the right one when restoring to old snapshots.
> 
> This doesn't work.

I already told you I tested this and it does work.  I'll actually add
the migration state check to the power on/off path because I need that
for testing S3 anyway.

> Either way, if it's possible it can be documented and shown how this
> works.

I could do a blog post, but I really don't think you want this in
official documentation because that creates support expectations.
> 
> > QEMU exposes enough functionality to enable a mgmt app / admin user to
> > achieve all of this.
> 
> How do you store the volatile state of this device, like the current
> state of the PCRs, loaded sessions etc? It doesn't support this.

That's not the only way of doing migration.  This precise problem
exists for VFIO and PCI pass through devices as well: external state is
stored in the card and that state must be matched in some way for the
card to work on resume.  Pretty much any external device coupled to the
VM has this problem.  As I keep saying, you're thinking about this in
the wrong way: it's not a system directly slaved to QEMU; it's an
independent daemon which must be managed separately.  The design is for
it to function like a passthrough.

> > This is not as seamlessly integrated as swtpm is, but it is still
> > technically possible to do the right thing with migration from
> > QEMU's POV. Whether or not the app/person managing mssim instance
> > actually does the right thing in practice is not a concern of QEMU.
> > I don't see a need for a migration blocker here.
> 
> I do see it because the *volatile state* cannot be extracted from
> this device. The state of the PCRs is going to be lost.

Installing a migration blocker would prevent me from exercising the S3
paths, which I want to test.

James

