Hi Alex,

On 2/19/25 10:19 PM, Alex Williamson wrote:
> On Wed, 19 Feb 2025 11:58:44 -0700
> Alex Williamson <alex.william...@redhat.com> wrote:
>
>> On Wed, 19 Feb 2025 18:58:58 +0100
>> Eric Auger <eric.au...@redhat.com> wrote:
>>
>>> Since kernel commit:
>>> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
>>> in D3hot power state")
>>> any attempt to do an mmap access to a BAR when the device is in D3hot
>>> state will generate a fault.
>>>
>>> On system_powerdown, if the VFIO device is translated by an IOMMU,
>>> the device is moved to D3hot state and then the vIOMMU gets disabled
>>> by the guest. As a result of this latter operation, the address space is
>>> swapped from translated to untranslated. When re-enabling the aliased
>>> regions, the RAM regions are dma-mapped again and this causes DMA_MAP
>>> faults when attempting the operation on BARs.
>>>
>>> To avoid doing the remap on those BARs, we compute whether the
>>> device is in D3hot state and if so, skip the DMA MAP.
>> Thinking on this some more, QEMU PCI code already manages the device
>> BARs appearing in the address space based on the memory enable bit in
>> the command register.  Should we do the same for PM state?
>>
>> IOW, the device going into low power state should remove the BARs from
>> the AddressSpace and waking the device should re-add them.  The BAR DMA
>> mapping should then always be consistent, whereas here nothing would
>> remap the BARs when the device is woken.
>>
>> I imagine we'd need an interface to register the PM capability with the
>> core QEMU PCI code, where address space updates are performed relative
>> to both memory enable and power status.  There might be a way to
>> implement this just for vfio-pci devices by toggling the enable state
>> of the BAR mmaps relative to PM state, but doing it at the PCI core
>> level seems like it'd provide behavior more true to physical hardware.
> I took a stab at this approach here; it doesn't obviously break
> anything in my configs, but I haven't yet tried to reproduce this exact
> scenario.
>
> https://gitlab.com/alex.williamson/qemu/-/tree/pci-pm-power-state

So, if I understand correctly, the BAR regions will disappear upon the
config command write in vfio_sub_page_bar_update_mapping(). Is that correct?
I will give it a try on my setup.
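
To make sure we are talking about the same rule, here is a minimal sketch
of the condition as I read the proposal: BARs are visible in the address
space only when memory decode is enabled and the device is in D0. The
helper name bars_should_be_mapped() is hypothetical and is not taken from
the branch above; the two register constants carry the values used in the
Linux pci_regs.h header, redefined here so the snippet stands alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI register bits, redefined locally for a standalone sketch. */
#define PCI_COMMAND_MEMORY      0x0002  /* Command register: memory space enable */
#define PCI_PM_CTRL_STATE_MASK  0x0003  /* PMCSR: power state, 0 = D0 ... 3 = D3hot */

/*
 * Hypothetical helper (an assumption, not the actual QEMU code): decide
 * whether memory BARs should be present in the address space, given the
 * current command register and PMCSR values.
 */
static bool bars_should_be_mapped(uint16_t command, uint16_t pmcsr)
{
    bool mem_enabled = (command & PCI_COMMAND_MEMORY) != 0;
    bool in_d0 = (pmcsr & PCI_PM_CTRL_STATE_MASK) == 0;

    /* BARs are visible only with memory decode on and the device in D0. */
    return mem_enabled && in_d0;
}
```

With such a rule, a write that moves the device to D3hot would drop the
BARs before the vIOMMU is disabled, so the address space switch would have
no BAR left to DMA map, and the wake-up write would add them back.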
>
> There's another pm_cap on the PCIExpressDevice that needs to be
> consolidated as well, once I do some research to figure out why a
> non-express capability is tracked only by express devices and what
> they're doing with it.  Thanks,
I am not sure I get this last point, though.

Thanks

Eric
>
> Alex
>

