On 2017/5/7 2:07, Alexander Duyck wrote: > On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong <dingtianh...@huawei.com> wrote: >> >> >> On 2017/5/5 22:04, Alexander Duyck wrote: >>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom <lee...@chelsio.com> wrote: >>>> | From: Alexander Duyck <alexander.du...@gmail.com> >>>> | Sent: Wednesday, May 3, 2017 9:02 AM >>>> | ... >>>> | It sounds like we are more or less in agreement. My only concern is >>>> | really what we default this to. On x86 I would say we could probably >>>> | default this to disabled for existing platforms since my understanding >>>> | is that relaxed ordering doesn't provide much benefit on what is out >>>> | there right now when performing DMA through the root complex. As far >>>> | as peer-to-peer I would say we should probably look at enabling the >>>> | ability to have Relaxed Ordering enabled for some channels but not >>>> | others. In those cases the hardware needs to be smart enough to allow >>>> | for you to indicate you want it disabled by default for most of your >>>> | DMA channels, and then enabled for the select channels that are >>>> | handling the peer-to-peer traffic. >>>> >>>> Yes, I think that we are mostly in agreement. I had just wanted to make >>>> sure that whatever scheme was developed would allow for simultaneously >>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed >>>> Ordering for others within the same system. I.e. not simply >>>> enabling/disabling/etc. based solely on System Platform Architecture. >>>> >>>> By the way, I've started our QA folks off looking at what things look >>>> like >>>> in Linux Virtual Machines under different Hypervisors to see what >>>> information they may provide to the VM in the way of what Root Complex Port >>>> is being used, etc. So far they've got Windows HyperV done and there >>>> there's no PCIe Fabric exposed in any way: just the attached device. I'll >>>> have to see what pci_find_pcie_root_port() returns in that environment. >>>> Maybe NULL? >>> >>> I believe NULL is one of the options. It all depends on what qemu is >>> emulating. Most likely you won't find a pcie root port on KVM because >>> the default is to emulate an older system that only supports PCI. >>> >>>> With your reservations (which I also share), I think that it probably >>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed >>>> Ordering With TLPs Directed At This End Point" predicate, with the default >>>> being "No" for any architecture which doesn't implement the predicate. And >>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return >>>> False for that as well. I can't see any reason to pass in the Source End >>>> Node but I may be missing something. >>>> >>>> At this point, this is pretty far outside my level of expertise. I'm >>>> happy to give it a go, but I'd be even happier if someone with a lot more >>>> experience in the PCIe Infrastructure were to want to carry the ball >>>> forward. I'm not super familiar with the Linux Kernel "Rules Of >>>> Engagement", so let me know what my next step should be. Thanks. >>>> >>>> Casey >>> >>> For now we can probably keep this on the linux-pci mailing list. Going >>> that route is the most straight forward for now since step one is >>> probably just making sure we are setting the relaxed ordering bit in >>> the setups that make sense. I would say we could probably keep it >>> simple. We just need to enable relaxed ordering by default for SPARC >>> architectures, on most others we can probably default it to off. >>> >> >> Casey, Alexander: >> >> Thanks for the wonderful discussion, it is more clearly that what to do next, >> I agree that enable relaxed ordering by default only for SPARC and ARM64 >> is more safe for all the other platform, as no one want to break anything. >> >>> I believe this all had started as Ding Tianhong was hoping to enable >>> this for the ARM architecture. That is the only one I can think of >>> where it might be difficult to figure out which way to default as we >>> were attempting to follow the same code that was enabled for SPARC and >>> that is what started this tug-of-war about how this should be done. >>> What we might do is take care of this in two phases. The first one >>> enables the infrastructure generically but leaves it defaulted to off >>> for everyone but SPARC. Then we can go through and start enabling it >>> for other platforms such as some of those on ARM in the platforms that >>> Ding Tianhong was working with. >>> >> >> According the suggestion, I could only think of this code: >> >> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev) >> DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8, >> quirk_tw686x_class); >> >> +static void quirk_relaxedordering_disable(struct pci_dev *dev) >> +{ >> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI && >> + dev->vendor != PCI_VENDOR_ID_SUN) >> + dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING; >> +} >> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, >> PCI_CLASS_NOT_DEFINED, 8, >> + quirk_relaxedordering_disable); >> + >> /* >> * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same >> * values for the Attribute as were supplied in the header of the >> >> >> What do you think of it? >> >> Thanks >> Ding >> > > This is a bit simplistic but it is a start. > > The other bit I was getting at is that we need to update the core PCIe > code so that when we configure devices and the root complex reports no > support for relaxed ordering it should be clearing the relaxed > ordering bits in the PCIe configuration registers on the upstream > facing devices.
How about this: rename the PCI_DEV_FLAGS_NO_RELAXED_ORDERIN to PCI_DEV_FLAGS_RELAXED_ORDERIN, only enable it when pcie root configure if it support the RO mode, otherwise we will not set it to indicate that the pcie dev did not support RO mode. > > The last bit we need in all this is a way to allow for setups where > peer-to-peer wants to perform relaxed ordering but for writes to the > host we have to not use relaxed ordering. For that we need to enable a > special case and that isn't handled right now in any of the solutions > we have coded up so far. > Sorry I am not clear of this way, can you explain more about this or give me a special case, thanks a lot. Ding > - Alex > > . >