On Fri, Jul 10, 2020 at 4:45 PM Christoph Hellwig <h...@infradead.org> wrote:
>
> On Fri, Jul 10, 2020 at 03:23:25PM +1000, Oliver O'Halloran wrote:
> > This is largely prep work for supporting VFs in the 32bit MMIO window.
> > This is an unfortunate necessity due to how the Linux BAR allocator
> > handles BARs marked as non-prefetchable. The distinction
> > between prefetchable and non-prefetchable BARs was made largely irrelevant
> > with the introduction of PCIe, but the BAR allocator is overly
> > conservative. It will always place non-pref bars in the non-prefetchable
> > window, which is 32bit only. This results in us being unable to use VFs
> > from NVMe drives and a few different RAID cards.
>
> How about fixing that in the core PCI code?

I've been kicking around the idea but I've never managed to convince
myself that ignoring the non-prefetchable bit is a safe thing to do in
generic code. Since Gen3 at least the PCIe Base spec has provided some
guidance about when you can put non-prefetchable BARs in the prefetchable
window and as of the Gen5 spec it lists these conditions:

> 1) The entire path from the host to the adapter is over PCI Express.
> 2) No conventional PCI or PCI-X devices do peer-to-peer reads to the range
>    mapped by the BAR.
> 3) The PCI Express Host Bridge does no byte merging. (This is believed to be
>    true on most platforms.)
> 4) Any locations with read side-effects are never the target of Memory Reads
>    with the TH bit Set.
> 5) The range mapped by the BAR is never the target of a speculative Memory
>    Read, either Host initiated or peer-to-peer.

1) Is easy enough to verify.

2) Is probably true, but who knows.

3) I know this is true for the platforms I'm looking at since the HW
designers assure me there is no merging happening at the host-bridge
level. Merging of MMIO ops does seem like an insane thing to do so it's
probably true in general too, but there's no real way to tell.

4) Is also *probably* true since the TH bit is only set when it's
explicitly enabled via the TLP Processing Hints extended capability in
config space. I guess it's possible firmware might enable that without
Linux realising, but in that case Linux is probably not doing BAR
allocation.

5) I have no idea about, but it seems difficult to make any kind of
general statement about.

I might just be being paranoid.

Oliver
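
To make the problem for generic code concrete, the five conditions quoted above can be read as a checklist in which some answers are simply unknown. The following is only a rough model of that reasoning, not kernel code and not anything from the patch series: `Hop`, `BarFacts`, and `may_use_prefetchable_window` are hypothetical names invented for this sketch, and `None` stands for "there is no way to tell".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    """One device on the path from host bridge to adapter (hypothetical model)."""
    name: str
    is_pcie: bool


@dataclass
class BarFacts:
    """What is known about one non-prefetchable BAR. None means 'cannot tell'."""
    path: List[Hop]                                      # input to condition 1
    no_legacy_p2p_reads: Optional[bool] = None           # condition 2
    host_bridge_no_byte_merging: Optional[bool] = None   # condition 3
    no_th_reads_with_side_effects: Optional[bool] = None  # condition 4
    no_speculative_reads: Optional[bool] = None          # condition 5


def path_is_all_pcie(path: List[Hop]) -> bool:
    """Condition 1: every hop from host to adapter is PCI Express."""
    return len(path) > 0 and all(hop.is_pcie for hop in path)


def may_use_prefetchable_window(facts: BarFacts) -> Optional[bool]:
    """Return True or False when the answer is known, None when it is not."""
    conditions = [
        path_is_all_pcie(facts.path),
        facts.no_legacy_p2p_reads,
        facts.host_bridge_no_byte_merging,
        facts.no_th_reads_with_side_effects,
        facts.no_speculative_reads,
    ]
    if any(c is False for c in conditions):
        return False   # one violated condition is enough to rule it out
    if any(c is None for c in conditions):
        return None    # nothing violated, but not everything is verifiable
    return True


pcie_path = [Hop("root port", True), Hop("NVMe drive", True)]

# Generic code can check the path, but knows nothing about conditions 2-5.
generic = BarFacts(path=pcie_path)

# A platform whose HW designers vouch for every condition (an assumption).
vouched = BarFacts(pcie_path, True, True, True, True)

# A conventional PCI bridge in the path violates condition 1 outright.
legacy = BarFacts([Hop("PCI bridge", False), Hop("RAID card", True)],
                  True, True, True, True)
```

In this model the generic case comes back as `None` rather than `True`, which is the reason for hesitating to do this in the core allocator: platform code can fill in answers that generic code cannot.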