Hi Alex,

On 2/20/25 11:31 AM, Eric Auger wrote:
>
> Hi Alex,
>
> On 2/19/25 10:19 PM, Alex Williamson wrote:
>> On Wed, 19 Feb 2025 11:58:44 -0700
>> Alex Williamson <alex.william...@redhat.com> wrote:
>>
>>> On Wed, 19 Feb 2025 18:58:58 +0100
>>> Eric Auger <eric.au...@redhat.com> wrote:
>>>
>>>> Since kernel commit:
>>>> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
>>>> in D3hot power state")
>>>> any attempt to do an mmap access to a BAR when the device is in d3hot
>>>> state will generate a fault.
>>>>
>>>> On system_powerdown, if the VFIO device is translated by an IOMMU,
>>>> the device is moved to D3hot state and then the vIOMMU gets disabled
>>>> by the guest. As a result of this later operation, the address space is
>>>> swapped from translated to untranslated. When re-enabling the aliased
>>>> regions, the RAM regions are dma-mapped again and this causes DMA_MAP
>>>> faults when attempting the operation on BARs.
>>>>
>>>> To avoid doing the remap on those BARs, we compute whether the
>>>> device is in D3hot state and if so, skip the DMA MAP.
>>> Thinking on this some more, QEMU PCI code already manages the device
>>> BARs appearing in the address space based on the memory enable bit in
>>> the command register. Should we do the same for PM state?
>>>
>>> IOW, the device going into low power state should remove the BARs from
>>> the AddressSpace and waking the device should re-add them. The BAR DMA
>>> mapping should then always be consistent, whereas here nothing would
>>> remap the BARs when the device is woken.
>>>
>>> I imagine we'd need an interface to register the PM capability with the
>>> core QEMU PCI code, where address space updates are performed relative
>>> to both memory enable and power status.
>>> There might be a way to
>>> implement this just for vfio-pci devices by toggling the enable state
>>> of the BAR mmaps relative to PM state, but doing it at the PCI core
>>> level seems like it'd provide behavior more true to physical hardware.
>> I took a stab at this approach here, it doesn't obviously break
>> anything in my configs, but I haven't yet tried to reproduce this exact
>> scenario.
>>
>> https://gitlab.com/alex.williamson/qemu/-/tree/pci-pm-power-state
It does not totally fix the issue. I now get:

qemu-system-x86_64: warning: vfio_container_dma_map(0x55cc25705680, 0x380000000000, 0x1000000, 0x7f8762000000) = -14 (Bad address)
0000:41:00.0: PCI peer-to-peer transactions on BARs are not supported.

Eric
>
> So if I understand correctly the BAR regions will disappear upon the
> config cmd write in vfio_sub_page_bar_update_mapping(). Is that correct?
> I will give it a try on my setup...
>>
>> There's another pm_cap on the PCIExpressDevice that needs to be
>> consolidated as well, once I do some research to figure out why a
>> non-express capability is tracked only by express devices and what
>> they're doing with it. Thanks,
> I am not sure I get this last point though.
>
> Thanks
>
> Eric
>>
>> Alex
>>
>
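
For reference, the rule discussed in this thread (a memory BAR is present in the address space only when the memory enable bit is set and the device is not in a low power state) can be sketched as below. This is only an illustration, not code from the branch: bar_mem_visible() is a hypothetical helper, the constant names are local to the sketch, and treating every non-D0 state as "low power" is an assumption. The bit positions are the standard ones: Memory Space Enable is bit 1 of the command register, and PowerState is bits 1:0 of PMCSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI bit layouts; the names are local to this sketch. */
#define CMD_MEMORY_ENABLE  0x0002u  /* command register, bit 1 */
#define PMCSR_STATE_MASK   0x0003u  /* PMCSR PowerState field, bits 1:0 */
#define PMCSR_STATE_D0     0x0000u
#define PMCSR_STATE_D3HOT  0x0003u

/*
 * Hypothetical helper (not a QEMU function): decide whether memory BARs
 * should currently be mapped, given the raw command register and PMCSR
 * values. Assumption: any state other than D0 hides the BARs.
 */
static bool bar_mem_visible(uint16_t command, uint16_t pmcsr)
{
    bool mem_enabled = (command & CMD_MEMORY_ENABLE) != 0;
    bool in_d0 = (pmcsr & PMCSR_STATE_MASK) == PMCSR_STATE_D0;

    return mem_enabled && in_d0;
}
```

With such a check evaluated on writes to both the command register and PMCSR, a device moved to D3hot would have its BARs removed before the vIOMMU is disabled, and they would be re-added when the device returns to D0.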