On Wed, 19 Feb 2025 18:58:58 +0100
Eric Auger <eric.au...@redhat.com> wrote:

> Since kernel commit:
> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
> in D3hot power state")
> any attempt to do an mmap access to a BAR when the device is in D3hot
> state will generate a fault.
> 
> On system_powerdown, if the VFIO device is translated by an IOMMU,
> the device is moved to D3hot state and then the vIOMMU gets disabled
> by the guest. As a result of this latter operation, the address space is
> swapped from translated to untranslated. When re-enabling the aliased
> regions, the RAM regions are dma-mapped again and this causes DMA_MAP
> faults when attempting the operation on BARs.
> 
> To avoid doing the remap on those BARs, we compute whether the
> device is in D3hot state and if so, skip the DMA MAP.

Thinking on this some more, QEMU PCI code already manages the device
BARs appearing in the address space based on the memory enable bit in
the command register.  Should we do the same for PM state?

IOW, the device going into low power state should remove the BARs from
the AddressSpace and waking the device should re-add them.  The BAR DMA
mapping should then always be consistent, whereas here nothing would
remap the BARs when the device is woken.
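
To make the idea concrete, here is a rough standalone sketch of the
visibility rule I have in mind. It is not QEMU code: struct toy_pci_dev
and the toy_* helpers are hypothetical names used only for illustration,
and the two register masks are assumed to match the usual pci_regs.h
values. BARs are visible only when memory enable is set and the PMCSR
PowerState field reads D0, and the same update path runs whichever
register changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the standard pci_regs.h definitions. */
#define PCI_COMMAND_MEMORY      0x0002  /* memory space enable (command reg) */
#define PCI_PM_CTRL_STATE_MASK  0x0003  /* PowerState in PMCSR: 0=D0, 3=D3hot */

/* Hypothetical toy model of a device, not a QEMU type. */
struct toy_pci_dev {
    uint16_t command;      /* PCI command register */
    uint16_t pmcsr;        /* PM control/status register */
    bool     bars_mapped;  /* are the BARs currently in the address space? */
};

/* BARs should be visible only with memory enable set and the device in D0. */
static bool toy_bars_should_be_mapped(const struct toy_pci_dev *d)
{
    bool mem_en = (d->command & PCI_COMMAND_MEMORY) != 0;
    bool in_d0  = (d->pmcsr & PCI_PM_CTRL_STATE_MASK) == 0;

    return mem_en && in_d0;
}

/*
 * Call after any write to the command register or PMCSR.
 * Returns true if the mapping state changed, i.e. the point where a real
 * implementation would add or remove the BAR regions.
 */
static bool toy_update_mappings(struct toy_pci_dev *d)
{
    bool want = toy_bars_should_be_mapped(d);

    if (want == d->bars_mapped) {
        return false;
    }
    d->bars_mapped = want;
    return true;
}
```

With a single update path like this, entering D3hot removes the BARs
and returning to D0 re-adds them, so the wake-up remap comes for free.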

I imagine we'd need an interface to register the PM capability with the
core QEMU PCI code, where address space updates are performed relative
to both memory enable and power status.  There might be a way to
implement this just for vfio-pci devices by toggling the enable state
of the BAR mmaps relative to PM state, but doing it at the PCI core
level seems like it'd provide behavior more true to physical hardware.
Thanks,

Alex

