On 04.10.2022 14:59, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote:
>> On 04.10.2022 14:17, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
>>>> On 04.10.2022 11:27, Roger Pau Monné wrote:
>>>>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
>>>>>> On 30.09.2022 16:17, Roger Pau Monne wrote:
>>>>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
>>>>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
>>>>>>> devices used by EFI.
>>>>>>>
>>>>>>> The current parsing of the EFI memory map was translating
>>>>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
>>>>>>> x86.  This is an issue because device MMIO regions (BARs) should not
>>>>>>> be positioned on reserved regions.  Any BARs positioned on non-hole
>>>>>>> areas of the memory map will cause is_memory_hole() to return false,
>>>>>>> which would then cause memory decoding to be disabled for such device.
>>>>>>> This leads to EFI firmware malfunctions when using runtime services.
>>>>>>>
>>>>>>> The system under which this was observed has:
>>>>>>>
>>>>>>> EFI memory map:
>>>>>>> [...]
>>>>>>>  00000fd000000-00000fe7fffff type=11 attr=800000000000100d
>>>>>>> [...]
>>>>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
>>>>>>>
>>>>>>> The device behind this BAR is:
>>>>>>>
>>>>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
>>>>>>> Controller (rev 09)
>>>>>>>         Subsystem: Super Micro Computer Inc Device 091c
>>>>>>>         Flags: fast devsel
>>>>>>>         Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well
>>>>>>>
>>>>>>> For the record, the symptom observed in that machine was a hard freeze
>>>>>>> when attempting to set an EFI variable (XEN_EFI_set_variable).
>>>>>>>
>>>>>>> Fix by not adding regions with type EfiMemoryMappedIO or
>>>>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
>>>>>>> be positioned there.
>>>>>>>
>>>>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably 
>>>>>>> positioned')
>>>>>>> Signed-off-by: Roger Pau Monné <roger....@citrix.com>
>>>>>>
>>>>>> In the best case this is moving us from one way of being wrong to 
>>>>>> another:
>>>>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
>>>>>> legitimately covered by a EfiMemoryMappedIO region in the first place,
>>>>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED
>>>>>> may not be used for BARs, this memory type also may not be), whereas with
>>>>>> your change we would no longer report non-BAR MMIO space (chipset 
>>>>>> specific
>>>>>> ranges for example) as reserved. In fact I think the example you provide
>>>>>> is at least partly due to bogus firmware behavior: The BAR is put in 
>>>>>> space
>>>>>> normally used for firmware specific memory (MMIO) ranges. I think 
>>>>>> firmware
>>>>>> should either assign the BAR differently or exclude the range from the
>>>>>> memory map.
>>>>>
>>>>> Hm, I'm not sure the example is bogus, how would firmware request a BAR
>>>>> to be mapped for run time services to access it otherwise if it's not
>>>>> using EfiMemoryMappedIO?
>>>>>
>>>>> Not adding the BAR to the memory map in any way would mean the OS is
>>>>> free to not map it for runtime services to access.
>>>>
>>>> My view is that BARs should not be marked for runtime services use. Doing
>>>> so requires awareness of the driver inside the OS, which I don't think
>>>> one can expect. If firmware needs to make use of a device in a system, it
>>>> ought to properly hide it from the OS. Note how the potential sharing of
>>>> an RTC requires special provisions in the spec, mandating driver awareness.
>>>>
>>>> Having a BAR expressed in the memory map also contradicts the ability of
>>>> an OS to relocate all BARs of all devices, if necessary.
>>>
>>> I've failed to figure out if there's a way in UEFI to report a device
>>> is in use by the firmware.  I've already looked before sending the
>>> patch (see also the post commit notes about for example not passing
>>> through the device to any guest for obvious reason).
>>>
>>> I've got no idea if Linux has any checks to avoid trying to move BARs
>>> residing in EfiMemoryMappedIO ranges, we have now observed this
>>> behavior in two systems already.
>>>
>>> Maybe we could do a special check for PCI devices and allow them
>>> having BARs in EfiMemoryMappedIO, together with printing a warning
>>> message.
>>
>> Right, that's one of the possible quirk workarounds I was thinking of.
>> At the risk of stating the obvious - the same would presumably apply to
>> E820_RESERVED on non-EFI systems then.
> 
> One option would be to strictly limit to EfiMemoryMappedIO, by taking
> the EFI memory map into account also if present.
> 
> Another maybe simpler option is to allow BARs to be placed in
> E820_RESERVED regions, and translate EfiMemoryMappedIO into
> E820_RESERVED like we have been doing.
> 
> I will attempt the later if you are OK with the approach.

I might be okay with the approach, but first of all I continue to be
worried of us violating specifications if we make this the default
behavior.

Jan

Reply via email to