Hi Alex,

On 2/20/25 11:31 AM, Eric Auger wrote:
> 
> Hi Alex,
> 
> On 2/19/25 10:19 PM, Alex Williamson wrote:
>> On Wed, 19 Feb 2025 11:58:44 -0700
>> Alex Williamson <alex.william...@redhat.com> wrote:
>>
>>> On Wed, 19 Feb 2025 18:58:58 +0100
>>> Eric Auger <eric.au...@redhat.com> wrote:
>>>
>>>> Since kernel commit:
>>>> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
>>>> in D3hot power state")
>>>> any attempt to do an mmap access to a BAR when the device is in D3hot
>>>> state will generate a fault.
>>>>
>>>> On system_powerdown, if the VFIO device is translated by an IOMMU,
>>>> the device is moved to D3hot state and then the vIOMMU gets disabled
>>>> by the guest. As a result of this latter operation, the address space is
>>>> swapped from translated to untranslated. When re-enabling the aliased
>>>> regions, the RAM regions are dma-mapped again and this causes DMA_MAP
>>>> faults when attempting the operation on BARs.
>>>>
>>>> To avoid doing the remap on those BARs, we compute whether the
>>>> device is in D3hot state and if so, skip the DMA MAP.
>>> Thinking on this some more, QEMU PCI code already manages the device
>>> BARs appearing in the address space based on the memory enable bit in
>>> the command register.  Should we do the same for PM state?
>>>
>>> IOW, the device going into low power state should remove the BARs from
>>> the AddressSpace and waking the device should re-add them.  The BAR DMA
>>> mapping should then always be consistent, whereas here nothing would
>>> remap the BARs when the device is woken.
>>>
>>> I imagine we'd need an interface to register the PM capability with the
>>> core QEMU PCI code, where address space updates are performed relative
>>> to both memory enable and power status.  There might be a way to
>>> implement this just for vfio-pci devices by toggling the enable state
>>> of the BAR mmaps relative to PM state, but doing it at the PCI core
>>> level seems like it'd provide behavior more true to physical hardware.
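To make the proposed gating concrete, here is a minimal sketch of the visibility rule described above: memory BARs are exposed only when the memory enable bit is set in the command register and, if the device has a PM capability, the PMCSR PowerState field reads D0. The register bit values follow the PCI specification. The helper names (pm_state_is_d0, mem_bars_visible) and the PCI_PM_STATE_* macros are hypothetical and are not taken from QEMU or from the branch mentioned below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit definitions from the PCI specification. */
#define PCI_COMMAND_MEMORY      0x0002  /* command register: memory space enable */
#define PCI_PM_CTRL_STATE_MASK  0x0003  /* PMCSR: PowerState field, bits 1:0 */

/* Hypothetical names for the PowerState encodings (D0 = 0, D3hot = 3). */
#define PCI_PM_STATE_D0         0x0000
#define PCI_PM_STATE_D3HOT      0x0003

/* Hypothetical helper: true when PMCSR reports the D0 power state. */
static bool pm_state_is_d0(uint16_t pmcsr)
{
    return (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_PM_STATE_D0;
}

/*
 * Hypothetical helper: decide whether memory BARs should be present in
 * the address space. A device without a PM capability is treated as
 * always being in D0, so only the memory enable bit matters for it.
 */
static bool mem_bars_visible(uint16_t command, bool has_pm_cap, uint16_t pmcsr)
{
    if (!(command & PCI_COMMAND_MEMORY)) {
        return false;   /* memory decoding disabled by the guest */
    }
    if (has_pm_cap && !pm_state_is_d0(pmcsr)) {
        return false;   /* low power state: BARs must not be accessed */
    }
    return true;
}
```

With a rule of this shape, a write to either the command register or PMCSR would re-evaluate the same predicate, so putting the device in D3hot removes the BARs and waking it re-adds them.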
>> I took a stab at this approach here, it doesn't obviously break
>> anything in my configs, but I haven't yet tried to reproduce this exact
>> scenario.
>>
>> https://gitlab.com/alex.williamson/qemu/-/tree/pci-pm-power-state

It does not totally fix the issue. I now get:

qemu-system-x86_64: warning: vfio_container_dma_map(0x55cc25705680,
0x380000000000, 0x1000000, 0x7f8762000000) = -14 (Bad address)
0000:41:00.0: PCI peer-to-peer transactions on BARs are not supported.


Eric

> 
> So if I understand correctly the BAR regions will disappear upon the
> config cmd write in vfio_sub_page_bar_update_mapping(). Is that correct?
> I will give it a try on my setup...
>>
>> There's another pm_cap on the PCIExpressDevice that needs to be
>> consolidated as well, once I do some research to figure out why a
>> non-express capability is tracked only by express devices and what
>> they're doing with it.  Thanks,
> I am not sure I get this last point though.
> 
> Thanks
> 
> Eric
>>
>> Alex
>>
> 

