On Tue, Jun 10, 2025 at 02:20:03PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 31/5/25 02:23, Xu Yilun wrote:
> > On Fri, May 30, 2025 at 12:29:30PM +1000, Alexey Kardashevskiy wrote:
> > > 
> > > 
> > > On 30/5/25 00:41, Xu Yilun wrote:
> > > > > > > > 
> > > > > > > > FLR to a bound device is absolutely fine, just break the CC state.
> > > > > > > > Sometimes it is exactly what host need to stop CC immediately.
> > > > > > > > The problem is in VFIO's pre-FLR handling so we need to patch VFIO,
> > > > > > > > not PCI core.
> > > > > > > 
> > > > > > > What is a problem here exactly?
> > > > > > > FLR by the host which equals to any other PCI error? The guest 
> > > > > > > may or may not be able to handle it, afaik it does not handle any 
> > > > > > > errors now, QEMU just stops the guest.
> > > > > > 
> > > > > > It is about TDX Connect.
> > > > > > 
> > > > > > According to the dmabuf patchset, the dmabuf needs to be revoked 
> > > > > > before
> > > > > > FLR. That means KVM unmaps MMIOs when the device is in LOCKED/RUN 
> > > > > > state.
> > > > > > That is forbidden by TDX Module and will crash KVM.
> > > > > 
> > > > > 
> > > > > FLR is something you tell the device to do, how/why would TDX know 
> > > > > about it?
> > > > 
> > > > I'm talking about FLR in VFIO driver. The VFIO driver would zap bar
> > > > before FLR. The zapping would trigger KVM unmap MMIOs. See
> > > > vfio_pci_zap_bars() for legacy case, and see [1] for dmabuf case.
> > > 
> > > oh I did not know that we do this zapping, thanks for the pointer.
> > > > [1] 
> > > > https://lore.kernel.org/kvm/20250307052248.405803-4-vivek.kasire...@intel.com/
> > > > 
> > > > A pure FLR without zapping bar is absolutely OK.
> > > > 
> > > > > Or it check the TDI state on every map/unmap (unlikely)?
> > > > 
> > > > Yeah, TDX Module would check TDI state on every unmapping.
> > > 
> > > _every_? Reading the state from DOE mailbox is not cheap enough (imho) to 
> > > do on every unmap.
> > 
> > Sorry for confusing. TDX firmware just checks if STOP TDI firmware call
> > is executed, will not check the real device state via DOE. That means
> > even if device has physically exited to UNLOCKED, TDX host should still
> > call STOP TDI fwcall first, then MMIO unmap.
> > 
> > > 
> > > > > 
> > > > > > So the safer way is
> > > > > > to unbind the TDI first, then revoke MMIOs, then do FLR.
> > > > > > 
> > > > > > I'm not sure when p2p dma is involved AMD will have the same issue.
> > > > > 
> > > > > On AMD, the host can "revoke" at any time, at worst it'll see RMP 
> > > > > events from IOMMU. Thanks,
> > > > 
> > > > Is the RMP event firstly detected by host or guest? If by host,
> > > 
> > > Host.
> > > 
> > > > host could fool the guest by just suppressing the event. The guest
> > > > would think the DMA write succeeded when it did not, which may cause
> > > > a security issue.
> > > 
> > > An RMP event on the host is an indication that RMP check has failed and 
> > > DMA to the guest did not complete so the guest won't see new data. Same 
> > > as other PCI errors really. RMP acts like a firewall, things behind it do 
> > > not need to know if something was dropped. Thanks,
> > 
> > Not really, guest thought the data is changed but it actually doesn't.
> > I.e. data integrity is broken.
> 
> I am not following, sorry. Integrity is broken when something untrusted (== 
> other than the SNP guest and the trusted device) manages to write to the 
> guest encrypted memory successfully.

Integrity is also broken when the guest believes the content at some
address was written to A but it actually stays B.

> If nothing is written - the guest can easily see this and do... nothing?

The guest may not learn of this from the RMP event or IOMMU fault alone,
since a malicious host could suppress these events. Yes, the guest may
later read the address and see the trick, but this cannot be ensured.
There is no general contract saying SW must read back the address to
confirm that a DMA write succeeded.

And DMA to MMIO is a worse case than DMA to memory. SW cannot even
read back the content, since MMIO registers may be write-only.

So you need the ASID fence to let the guest easily see the DMA Silent Drop.
Intel & ARM also have their own ways.

The purpose here is to reach a consensus that a benign VMM should avoid
triggering these DMA Silent Drop protections, by "unbind TDI first,
then invalidate MMIO".
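
To make the ordering concrete, here is a toy model in plain C. It is only
a sketch of the rule discussed in this thread and not kernel or firmware
code: every name in it (model_dev, model_tdi_unbind(), model_mmio_revoke()
and so on) is hypothetical. It assumes the firmware tracks only whether
the unbind/STOP TDI call was made, and treats any MMIO unmap while the TDI
is still bound as a violation.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel or firmware. */
enum model_tdi_state { MODEL_TDI_UNBOUND, MODEL_TDI_BOUND };

struct model_dev {
	enum model_tdi_state fw_state;	/* firmware bookkeeping, not device state */
	bool mmio_mapped;		/* MMIO still mapped into the guest */
	bool flr_done;			/* FLR has been issued */
	bool cc_violation;		/* an unmap was attempted while bound */
};

static void model_dev_init(struct model_dev *d)
{
	d->fw_state = MODEL_TDI_BOUND;
	d->mmio_mapped = true;
	d->flr_done = false;
	d->cc_violation = false;
}

/* Stand-in for the "STOP TDI" style firmware call. */
static void model_tdi_unbind(struct model_dev *d)
{
	d->fw_state = MODEL_TDI_UNBOUND;
}

/* Stand-in for zapping BARs / revoking the dmabuf. */
static bool model_mmio_revoke(struct model_dev *d)
{
	if (d->fw_state == MODEL_TDI_BOUND) {
		/* Assumed firmware rule: unmap while bound is rejected. */
		d->cc_violation = true;
		return false;
	}
	d->mmio_mapped = false;
	return true;
}

/* FLR itself is always allowed in this model. */
static void model_flr(struct model_dev *d)
{
	d->flr_done = true;
}

/* Proposed order: unbind TDI, then revoke MMIO, then FLR. */
static bool model_reset_unbind_first(struct model_dev *d)
{
	model_tdi_unbind(d);
	if (!model_mmio_revoke(d))
		return false;
	model_flr(d);
	return true;
}

/* Order that zaps BARs before FLR without unbinding first. */
static bool model_reset_zap_first(struct model_dev *d)
{
	if (!model_mmio_revoke(d))
		return false;
	model_flr(d);
	return true;
}
```

In this model, model_reset_unbind_first() completes with MMIO revoked and
no violation recorded, while model_reset_zap_first() on a still-bound
device fails at the revoke step and records the violation before FLR is
ever issued.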

Thanks,
Yilun

> Devices have bugs or spurious interrupts happen, the guest driver should be 
> able to cope with that.
> > Also please help check if the following relates to this issue:
> > 
> > SEV-TIO Firmware Interface SPEC, Section 2.11
> > 
> > If a bound TDI sends a request to the root complex, and the IOMMU detects a
> > fault caused by host configuration, the root complex fences the ASID from
> > all further I/O to or from that guest. A host fault is either a host page
> > table fault or an RMP check violation. ASID fencing means that the IOMMU
> > blocks all further I/O from the root complex to the guest that the TDI was
> > bound, and the root complex blocks all MMIO accesses by the guest. When a
> > guest writes to MMIO, the write is silently dropped. When a guest reads from
> > MMIO, the guest reads 1s.
> 
> Right, this is about not letting bad data through, i.e. integrity. Thanks,
> 
> > 
> > Thanks,
> > Yilun
> > 
> > > 
> > > > 
> > > > Thanks,
> > > > Yilun
> > > 
> > > -- 
> > > Alexey
> > > 
> 
> -- 
> Alexey
> 
