PCI: Work around PCIe link training failures

2025-06-10 Thread Matthew W Carlis
Hello again. It looks like there are specific system configurations that are extremely likely to have issues with this patch & result in undesirable system behavior. Specifically hot-plug systems with side-band presence detection & without Power Controls (i.e. PwrCtrl-) given to config space. I

Re: PCI: Work around PCIe link training failures

2024-10-03 Thread Maciej W. Rozycki
On Wed, 2 Oct 2024, Bjorn Helgaas wrote: > If there's anything missing that still needs to be added to v6.13-rc1, > can somebody repost those? I lost track of what's still outstanding. I have nothing outstanding to add right away. Thank you for asking. Maciej

Re: PCI: Work around PCIe link training failures

2024-10-02 Thread Bjorn Helgaas
On Wed, Oct 02, 2024 at 01:58:15PM +0100, Maciej W. Rozycki wrote: > On Tue, 1 Oct 2024, Matthew W Carlis wrote: > > > I just wanted to follow up with our testing results for the mentioned > > patches. It took me a while to get them running in our test pool, but > > we just got it going yesterday

Re: PCI: Work around PCIe link training failures

2024-10-02 Thread Maciej W. Rozycki
On Tue, 1 Oct 2024, Matthew W Carlis wrote: > I just wanted to follow up with our testing results for the mentioned > patches. It took me a while to get them running in our test pool, but > we just got it going yesterday and the initial results look really good. > We will continue running them in

PCI: Work around PCIe link training failures

2024-10-01 Thread Matthew W Carlis
I just wanted to follow up with our testing results for the mentioned patches. It took me a while to get them running in our test pool, but we just got it going yesterday and the initial results look really good. We will continue running them in our testing from now on & if any issues come up I'll

Re: PCI: Work around PCIe link training failures

2024-08-16 Thread Maciej W. Rozycki
On Thu, 15 Aug 2024, Matthew W Carlis wrote: > > Well, in principle in a setup with reliable links the LBMS bit may never > > be set, e.g. this system of mine has been in 24/7 operation since the last > > reboot 410 days ago and for the devices that support Link Active reporting > > it shows: >

PCI: Work around PCIe link training failures

2024-08-15 Thread Matthew W Carlis
Sorry for the delay in my responses here; I had some things get in my way. On Fri, 9 Aug 2024 09:13:52 Oliver O'Halloran wrote: > Ok? If we have to check for DPC being enabled in addition to checking > the surprise bit in the slot capabilities then that's fine, we can do > that. The question to b

Re: PCI: Work around PCIe link training failures

2024-08-09 Thread Maciej W. Rozycki
On Wed, 7 Aug 2024, Matthew W Carlis wrote: > > For the quirk to trigger, the link has to be down and there has to be the > > LBMS Link Status bit set from link management events as per the PCIe spec > > while the link was previously up, and then both of that while rescanning > > the PCIe device i

Re: PCI: Work around PCIe link training failures

2024-08-08 Thread Oliver O'Halloran
On Thu, Aug 8, 2024 at 12:08 PM Matthew W Carlis wrote: > > On Wed, 7 Aug 2024 22:29:35 +1000 Oliver O'Halloran Wrote > > My read was that Matt is essentially doing a surprise hot-unplug by > > removing power to the card without notifying the OS. I thought the > > LBMS bit wouldn't be set in that

PCI: Work around PCIe link training failures

2024-08-07 Thread Matthew W Carlis
On Wed, 7 Aug 2024 22:29:35 +1000 Oliver O'Halloran Wrote > My read was that Matt is essentially doing a surprise hot-unplug by > removing power to the card without notifying the OS. I thought the > LBMS bit wouldn't be set in that case since the link goes down rather > than changes speed, but the

Re: PCI: Work around PCIe link training failures

2024-08-07 Thread Oliver O'Halloran
On Wed, Aug 7, 2024 at 9:14 PM Maciej W. Rozycki wrote: > > On Wed, 7 Aug 2024, Matthew W Carlis wrote: > > > > it does seem like this series made ASMedia ASM2824 work better but > > > caused regressions elsewhere, so maybe we just need to accept that > > > ASM2824 is slightly broken and doesn't

Re: PCI: Work around PCIe link training failures

2024-08-07 Thread Maciej W. Rozycki
On Mon, 5 Aug 2024, Matthew W Carlis wrote: > The most common place where we see our systems getting stuck at Gen1 is with > device power cycling. If a device is powered on and then off quickly then the > link will of course fail to train & the consequence here is that the port is > forced to Gen1

Re: PCI: Work around PCIe link training failures

2024-08-07 Thread Maciej W. Rozycki
On Wed, 7 Aug 2024, Matthew W Carlis wrote: > > it does seem like this series made ASMedia ASM2824 work better but > > caused regressions elsewhere, so maybe we just need to accept that > > ASM2824 is slightly broken and doesn't work as well as it should. > > One of my colleagues challenged me t

PCI: Work around PCIe link training failures

2024-08-07 Thread Matthew W Carlis
On Tues, 06 Aug 2024 Bjorn Helgaas wrote: > it does seem like this series made ASMedia ASM2824 work better but > caused regressions elsewhere, so maybe we just need to accept that > ASM2824 is slightly broken and doesn't work as well as it should. One of my colleagues challenged me to provide a m

Re: PCI: Work around PCIe link training failures

2024-08-06 Thread Bjorn Helgaas
On Mon, Aug 05, 2024 at 06:06:59PM -0600, Matthew W Carlis wrote: > Hello again. I just realized that my first response to this thread two weeks > ago was not actually starting from the end of the discussion. I hope I found > it now... Must say sorry for this I am still figuring out how to follow t

PCI: Work around PCIe link training failures

2024-08-05 Thread Matthew W Carlis
Hello again. I just realized that my first response to this thread two weeks ago was not actually starting from the end of the discussion. I hope I found it now... Must say sorry for this I am still figuring out how to follow these threads. I need to ask if we can either revert this patch or only m

PCI: Work around PCIe link training failures

2024-07-29 Thread Matthew W Carlis
On Mon, 29 July 2024, Ilpo Järvinen wrote: > The most obvious solution is to not leave the speed at Gen1 on failure in > Target Speed quirk but to restore the original Target Speed value. The > downside with that is if the current retraining interface (function) is > used, it adds delay. Tends to

Re: PCI: Work around PCIe link training failures

2024-07-29 Thread Maciej W. Rozycki
On Mon, 29 Jul 2024, Ilpo Järvinen wrote: > > > The main reason is it is believed that it is the downstream device > > > causing the issue, and obviously you can't fetch its ID if you can't > > > negotiate link so as to talk to it in the first place. > > > > Have had some more time to look into t

Re: PCI: Work around PCIe link training failures

2024-07-29 Thread Ilpo Järvinen
On Fri, 26 Jul 2024, Matthew W Carlis wrote: > On Mon, 22 Jul 2024, Maciej W. Rozycki wrote: > > > The main reason is it is believed that it is the downstream device > > causing the issue, and obviously you can't fetch its ID if you can't > > negotiate link so as to talk to it in the first place.

PCI: Work around PCIe link training failures

2024-07-26 Thread Matthew W Carlis
On Mon, 22 Jul 2024, Maciej W. Rozycki wrote: > The main reason is it is believed that it is the downstream device > causing the issue, and obviously you can't fetch its ID if you can't > negotiate link so as to talk to it in the first place. Have had some more time to look into this issue. So, I

PCI: Work around PCIe link training failures

2024-07-24 Thread Matthew W Carlis
Sorry for belated response. I wasn't really sure when you first asked & I still only have a 'hand wavy' theory here. I think one thing that is getting us in trouble is when we turn the endpoint device on, then off, wait for a little while then turn it back on. It seems that the port here in this ca

PCI: Work around PCIe link training failures

2024-07-22 Thread Matthew W Carlis
Sorry to resurrect this one, but I was wondering why the PCI device ID in drivers/pci/quirks.c for the ASMedia ASM2824 isn't checked before forcing the link down to Gen1... We have had to revert this patch during our kernel migration due to it interacting poorly with at least one older Gen3 PLX PCI

Re: PCI: Work around PCIe link training failures

2024-07-22 Thread Maciej W. Rozycki
[+cc Ilpo for his previous involvement here] On Mon, 22 Jul 2024, Matthew W Carlis wrote: > Sorry to resurrect this one, but I was wondering why the > PCI device ID in drivers/pci/quirks.c for the ASMedia ASM2824 > isn't checked before forcing the link down to Gen1... We have > had to revert this

Re: [PATCH v8 7/7] PCI: Work around PCIe link training failures

2023-06-11 Thread Maciej W. Rozycki
On Thu, 4 May 2023, Bjorn Helgaas wrote: > We talked about reusing pcie_retrain_link() earlier. IIRC that didn't > work: ASPM needs to use PCI_EXP_LNKSTA_LT because not all devices > support PCI_EXP_LNKSTA_DLLLA, and you need PCI_EXP_LNKSTA_DLLLA > because the erratum makes PCI_EXP_LNKSTA_LT flap

[PATCH v9 14/14] PCI: Work around PCIe link training failures

2023-06-11 Thread Maciej W. Rozycki
Attempt to handle cases such as with a downstream port of the ASMedia ASM2824 PCIe switch where link training never completes and the link continues switching between speeds indefinitely with the data link layer never reaching the active state. It has been observed with a downstream port of the

Re: [PATCH v8 7/7] PCI: Work around PCIe link training failures

2023-05-14 Thread Maciej W. Rozycki
On Sun, 7 May 2023, Maciej W. Rozycki wrote: > > We're going to land this series this cycle, come hell or high water. > > Thank you for coming back to me and for your promise. I'll strive to > address your concerns next weekend. > > Unfortunately a PDU in my remote lab has botched up and I'v

Re: [PATCH v8 7/7] PCI: Work around PCIe link training failures

2023-05-07 Thread Maciej W. Rozycki
On Thu, 4 May 2023, Bjorn Helgaas wrote: > On Thu, Apr 06, 2023 at 01:21:31AM +0100, Maciej W. Rozycki wrote: > > Attempt to handle cases such as with a downstream port of the ASMedia > > ASM2824 PCIe switch where link training never completes and the link > > continues switching between speeds

Re: [PATCH v8 7/7] PCI: Work around PCIe link training failures

2023-05-04 Thread Bjorn Helgaas
On Thu, Apr 06, 2023 at 01:21:31AM +0100, Maciej W. Rozycki wrote: > Attempt to handle cases such as with a downstream port of the ASMedia > ASM2824 PCIe switch where link training never completes and the link > continues switching between speeds indefinitely with the data link layer > never rea

[PATCH v8 7/7] PCI: Work around PCIe link training failures

2023-04-05 Thread Maciej W. Rozycki
Attempt to handle cases such as with a downstream port of the ASMedia ASM2824 PCIe switch where link training never completes and the link continues switching between speeds indefinitely with the data link layer never reaching the active state. It has been observed with a downstream port of the

[PATCH v7 7/7] PCI: Work around PCIe link training failures

2023-04-04 Thread Maciej W. Rozycki
Attempt to handle cases such as with a downstream port of the ASMedia ASM2824 PCIe switch where link training never completes and the link continues switching between speeds indefinitely with the data link layer never reaching the active state. It has been observed with a downstream port of the

[PATCH v6 7/7] PCI: Work around PCIe link training failures

2023-02-05 Thread Maciej W. Rozycki
Attempt to handle cases such as with a downstream port of the ASMedia ASM2824 PCIe switch where link training never completes and the link continues switching between speeds indefinitely with the data link layer never reaching the active state. It has been observed with a downstream port of the