Hi Alex,
On 2/19/25 10:19 PM, Alex Williamson wrote:
> On Wed, 19 Feb 2025 11:58:44 -0700
> Alex Williamson <alex.william...@redhat.com> wrote:
>
>> On Wed, 19 Feb 2025 18:58:58 +0100
>> Eric Auger <eric.au...@redhat.com> wrote:
>>
>>> Since kernel commit:
>>> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
>>> in D3hot power state")
>>> any attempt to do an mmap access to a BAR when the device is in d3hot
>>> state will generate a fault.
>>>
>>> On system_powerdown, if the VFIO device is translated by an IOMMU,
>>> the device is moved to D3hot state and then the vIOMMU gets disabled
>>> by the guest. As a result of this later operation, the address space is
>>> swapped from translated to untranslated. When re-enabling the aliased
>>> regions, the RAM regions are dma-mapped again and this causes DMA_MAP
>>> faults when attempting the operation on BARs.
>>>
>>> To avoid doing the remap on those BARs, we compute whether the
>>> device is in D3hot state and if so, skip the DMA MAP.
>>
>> Thinking on this some more, QEMU PCI code already manages the device
>> BARs appearing in the address space based on the memory enable bit in
>> the command register. Should we do the same for PM state?
>>
>> IOW, the device going into low power state should remove the BARs from
>> the AddressSpace and waking the device should re-add them. The BAR DMA
>> mapping should then always be consistent, whereas here nothing would
>> remap the BARs when the device is woken.
>>
>> I imagine we'd need an interface to register the PM capability with the
>> core QEMU PCI code, where address space updates are performed relative
>> to both memory enable and power status. There might be a way to
>> implement this just for vfio-pci devices by toggling the enable state
>> of the BAR mmaps relative to PM state, but doing it at the PCI core
>> level seems like it'd provide behavior more true to physical hardware.
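
To make sure I read the proposal correctly, here is a minimal sketch of
the visibility rule as I understand it: a memory BAR stays in the
address space only while the memory enable bit is set and the PM state
is D0. This is not taken from your branch. The helper name
bar_should_be_mapped() and the SKETCH_* macros are made up for
illustration; the bit values mirror PCI_COMMAND_MEMORY and
PCI_PM_CTRL_STATE_MASK from pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; values match the standard PCI register layout. */
#define SKETCH_PCI_COMMAND_MEMORY      0x0002u /* command reg: memory space enable */
#define SKETCH_PCI_PM_CTRL_STATE_MASK  0x0003u /* PMCSR bits 1:0: power state */
#define SKETCH_PCI_PM_STATE_D0         0x0000u
#define SKETCH_PCI_PM_STATE_D3HOT      0x0003u

/*
 * Hypothetical helper: decide whether a memory BAR should be visible
 * in the address space, given the current command register and PMCSR
 * values. Any state other than D0 hides the BAR.
 */
static bool bar_should_be_mapped(uint16_t cmd, uint16_t pmcsr)
{
    bool mem_enabled = (cmd & SKETCH_PCI_COMMAND_MEMORY) != 0;
    bool in_d0 = (pmcsr & SKETCH_PCI_PM_CTRL_STATE_MASK) == SKETCH_PCI_PM_STATE_D0;

    return mem_enabled && in_d0;
}
```

With such a rule, a guest write moving the device to D3hot would drop
the BARs even when memory enable is still set, and the write back to D0
would re-add them, so the later address space swap would find nothing
to DMA map.
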
> I took a stab at this approach here, it doesn't obviously break
> anything in my configs, but I haven't yet tried to reproduce this exact
> scenario.
>
> https://gitlab.com/alex.williamson/qemu/-/tree/pci-pm-power-state

So if I understand correctly the BAR regions will disappear upon the
config cmd write in vfio_sub_page_bar_update_mapping(). Is that correct?

I will give it a try on my setup...

>
> There's another pm_cap on the PCIExpressDevice that needs to be
> consolidated as well, once I do some research to figure out why a
> non-express capability is tracked only by express devices and what
> they're doing with it. Thanks,

I am not sure I get this last point though.

Thanks

Eric

>
> Alex
>