+CC ARM maintainers

On Wed, Nov 1, 2023 at 12:40 AM Joe L <jlo...@gmail.com> wrote:
>
> Hello,
>
> During experimentation on an AARCH64 platform with a PCIe endpoint that 
> supports option ROM, it was found that the ordering of transactions between 
> ECAM (Cfg-Write) and MMIO (Mem-Read) is not preserved by the CHI HN-I node. 
> The observed sequence is as follows:
> 1. EFI issues a Cfg-Write to endpoint Memory Space Enable bit to enable 
> memory access
> 2. EFI issues a Mem-Read to the endpoint Option ROM memory space and receives 
> back 0xFFFF (unsupported request)
> If the memory ordering between the two accesses is explicitly preserved via 
> memory barrier (DMB instruction), 2. will return back valid data instead of 
> UR. Analyzing the transactions with a protocol analyzer, we found that the 
> Mem-Read was being issued before the completion for the Cfg-Write is received.
>
> On this system, the HN-I node is configured to separate the ECAM and MMIO 
> into different SAM regions. Both regions are assigned Decice-nGnRnE 
> attributes. According to this snippet from Arm "DEN0114 PCIe AMBA Integration 
> Guide", the ordering of even Device-nGnRnE memory regions cannot be 
> guaranteed if they belong to separate PCIe address spaces
>
> 4.8.2
>
> Multiple PCIe address spaces mapped as Device-nGnRnE or Device-nGnRE Arm 
> memory model does not give any ordering guarantees between accesses to 
> different Device-nGnRnE or Device-nGnRE peripherals. Additionally, there is 
> no restriction on mapping various PCIe address spaces of the same PCIe 
> function as different Device-nGnRnE or Device-nGnRE peripherals. 
> Consequently, software cannot assume that program order will be maintained 
> between accesses to two different PCIe address spaces, even though both 
> spaces are mapped as Device-nGnRnE or Device-nGnRE. Therefore, for maximum 
> software portability, ordering requirements between accesses to different 
> PCIe address spaces must be handled explicitly in software using appropriate 
> ordering instructions."
>
> We requested a comment from an Arm representative and received a similar 
> response, confirming that a memory barrier is needed to ensure ordering 
> between accesses to ECAM and MMIO regions (or between any two ranges that are 
> assigned to a separate SAM address region)
>
> When they are to two different order regions, the read will not wait for the 
> write to complete, and can return data before the write does anything. The 
> HN-I only preserves ordering between reads and writes to the same Order 
> Region (which implies the same Address Region). Likewise, the HN-I will only 
> preserve ordering between multiple reads and between multiple writes within 
> the same Order Region, and it accomplishes this by issuing the reads with the 
> same ARID and the writes with the same AWID (i.e. it relies on the downstream 
> device to follow AXI ordering rules). Issuing a CHI request with 
> REQ.Order=EndpointOrder only guarantees ordering to the same “endpoint 
> address range,” which the HN-I defines as an Order Region (within an Address 
> Region).
>
> Our CMN TRM showcases an example where ECAM and MMIO are two different 
> regions in the HN-I SAM. The implication is that we would expect a DSB 
> between the ECAM write and MMIO read. I'm asking our Open Source Software 
> group to confirm that standard PCIe software is generally expected to be 
> aware of the need for a DSB--but my impression from talking to some of our 
> hardware engineers is that that is indeed the expectation.
>
>
> I am requesting that EDK2 consumes or produces a change to the current 
> PciExpressLib that will ensure ordering on Arm architectures after Cfg-Writes 
> which may or may not have side effects. For example, in 
> MdePkg/Library/BasePciExpressLib/PciExpressLib.c,
>
> UINT8
> EFIAPI
> PciExpressWrite8 (
>   IN      UINTN  Address,
>   IN      UINT8  Value
>   )
> {
>   ASSERT_INVALID_PCI_ADDRESS (Address);
>   if (Address >= PcdPciExpressBaseSize ()) {
>     return (UINT8)-1;
>   }
>
>   return MmioWrite8 ((UINTN)GetPciExpressBaseAddress () + Address, Value);
> }
>
> should become
>
>
>
> UINT8
> EFIAPI
> PciExpressWrite8 (
>   IN      UINTN  Address,
>   IN      UINT8  Value
>   )
> {
>   ASSERT_INVALID_PCI_ADDRESS (Address);
>   if (Address >= PcdPciExpressBaseSize ()) {
>     return (UINT8)-1;
>   }
>
>   UINT8 ReturnValue = MmioWrite8 ((UINTN)GetPciExpressBaseAddress () + 
> Address, Value);
>   #if defined (MDE_CPU_AARCH64)
>      MemoryFence (); // DMB sy or DSB
>   #endif
>
>   return ReturnValue;
> }
>
> Please let me know your thoughts and if this is the correct implementation 
> change needed to enforce memory ordering between separate address regions. I 
> would also like to know the preferred memory barrier instruction in this case 
> (DMB or DSB).

Hrmmm, I think the current implementation of MmioRead/Write only
offers TSO instead of "full serialization". For instance:

//
//  Reads an 8-bit MMIO register.
//
//  Reads the 8-bit MMIO register specified by Address. The 8-bit read value is
//  returned. This function must guarantee that all MMIO read and write
//  operations are serialized.
//
//  @param  Address The MMIO register to read.
//
//  @return The value read.
//
ASM_PFX(MmioRead8Internal):
  AARCH64_BTI(c)
  ldrb    w0, [x0]
  dmb     ld
  ret

//
//  Writes an 8-bit MMIO register.
//
//  Writes the 8-bit MMIO register specified by Address with the value specified
//  by Value and returns Value. This function must guarantee that all MMIO read
//  and write operations are serialized.
//
//  @param  Address The MMIO register to write.
//  @param  Value   The value to write to the MMIO register.
//
ASM_PFX(MmioWrite8Internal):
  AARCH64_BTI(c)
  dmb     st
  strb    w1, [x0]
  ret


So prior reads are ordered against reads/stores (through dmb ld after
the load). Prior writes are ordered against writes only (through dmb
st before the store). So one can easily devise a problem here where

  dmb     st /* orders against prior writes */
  strb    w1, [x0] /* write */
[...]

  ldrb    w0, [x0] /* load */
  dmb     ld /* orders this load against later reads/stores, but not
against prior writes (!!) */

where the CPU can (AFAIK) freely reorder the write after the load in
the ARM64 case.

Do we want dmb st after the store instead?

-- 
Pedro


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#110462): https://edk2.groups.io/g/devel/message/110462
Mute This Topic: https://groups.io/mt/102310377/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Reply via email to