On Wed, 19 Oct 2016 10:00:57 -0500 Kevin Vasko <kva...@gmail.com> wrote:
> Sure thing. I'm attaching all of the logs I have to let you get a bigger
> picture (and anyone that might run into a similar issue). Hopefully I
> didn't mess anything up.
> ...

Here's the bit I was curious about:

> #showing parent bridge of a device that has a failed
> #:lspci -vvvs 03:00
> 03:00.0 PCI bridge: PLX Technology, Inc. Device 8796 (rev ab) (prog-if 00 [Normal decode])
...
>         LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <4us, L1 <8us
>                 ClockPM- Surprise- LLActRep- BwNot-
>         LnkCtl: ASPM Disabled; Disabled- CommClk-
>                 ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>         LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

The Link Status shows that the port is in Gen1 mode (2.5GT/s) at x0
width, so the link failed to return to a working state after the bus
reset. One possible hint is that the Slot Status register shows the
Presence Detect Changed bit got flipped, while the Presence Detect State
bit remains 1, indicating that a card is present. However, Presence
Detect Changed Enable is not set in the Slot Control register, so the OS
doesn't get notified about this.

I wonder what would happen if we cleared the Presence Detect Changed bit
and tried to retrain the link. The Express capability is at 0x68 and the
Slot Status register is at offset 0x1a within it (0x82 in config space).
Bit 3 is the Presence Detect Changed bit, and it's RW1C (read, write 1 to
clear). Therefore, to clear the bit we could do:

    setpci -s 3:00.0 82.w=8:8

Then recheck with lspci -vvvs 3:00.0 to see whether PresDet under
"Changed" in SltSta still reports + or -, and possibly whether the link
has decided to retrain:

        SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                Changed: MRL- PresDet+ LinkState-
                              ^^^^^^^^

To force a retrain, we need to set bit 5 (Retrain Link) in the Link
Control register, at offset 0x10 within the capability (0x78 in config
space):

    setpci -s 3:00.0 78.w=20:20

Recheck lspci to see if there's any progress.

...

> #showing parent device that has a NON failed device
> #: lspci -vvvs 03:08
> 03:08.0 PCI bridge: PLX Technology, Inc. Device 8796 (rev ab) (prog-if 00 [Normal decode])
...
>         LnkCap: Port #8, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <4us, L1 <8us
>                 ClockPM- Surprise- LLActRep- BwNot-
>         LnkCtl: ASPM Disabled; Disabled- CommClk-
>                 ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>         LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

In this case the link has retrained to Gen3 x16 and, of course, the
downstream devices are accessible. On this port the Presence Detect
Changed bit reads - (clear).

Thanks,
Alex
_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users
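For anyone repeating these register pokes on another bridge, the offset arithmetic can be scripted. This is a minimal bash sketch, assuming bash (it uses local and (( ))); the helper names reg_addr, setpci_bit_arg, decode_lnksta, decode_sltsta and flag are hypothetical, and EXP_CAP=0x68 is only correct for the PLX 8796 ports in this thread. On other devices, take the offset from the "Capabilities: [xx] Express" line of lspci -vvv. The register offsets within the Express capability and the bit positions follow the PCI Express Base Specification. The functions only print values; nothing here touches hardware.

```bash
# Bash sketch with hypothetical helpers: derive setpci arguments and decode
# raw Link Status / Slot Status values.
# ASSUMPTION: EXP_CAP is the Express capability offset of the PLX 8796 ports
# in this thread; other devices will differ (see "Capabilities: [xx] Express").
EXP_CAP=0x68
LNKCTL=0x10   # Link Control, offset within the Express capability
LNKSTA=0x12   # Link Status
SLTSTA=0x1a   # Slot Status

# Print "+" if the arithmetic value $1 is non-zero, "-" otherwise (lspci style).
flag() {
    if (( $1 )); then echo +; else echo -; fi
}

# Config-space address of a 16-bit register that sits $1 bytes into the
# Express capability, in setpci's "<hex>.w" form.
reg_addr() {
    printf '%x.w' $(( EXP_CAP + $1 ))
}

# setpci argument that sets bit $2 of register $1 using setpci's VALUE:MASK
# syntax, where only the bits selected by MASK are changed.
setpci_bit_arg() {
    local mask=$(( 1 << $2 ))
    printf '%s=%x:%x\n' "$(reg_addr "$1")" "$mask" "$mask"
}

# Decode a raw Link Status value: bits 3:0 = current link speed,
# bits 9:4 = negotiated width, bit 11 = Link Training,
# bit 13 = Data Link Layer Link Active.
decode_lnksta() {
    local v=$(( $1 )) speed
    case $(( v & 0xf )) in
        1) speed=2.5GT/s ;;
        2) speed=5GT/s ;;
        3) speed=8GT/s ;;
        *) speed=unknown ;;
    esac
    printf 'Speed %s, Width x%d, Train%s, DLActive%s\n' "$speed" \
        $(( (v >> 4) & 0x3f )) \
        "$(flag $(( v & 0x800 )))" "$(flag $(( v & 0x2000 )))"
}

# Decode the Slot Status bits discussed in the thread: bit 6 = Presence
# Detect State, bit 3 = Presence Detect Changed, bit 8 = Data Link Layer
# State Changed.
decode_sltsta() {
    local v=$(( $1 ))
    printf 'PresDet%s, Changed: PresDet%s LinkState%s\n' \
        "$(flag $(( v & 0x40 )))" "$(flag $(( v & 0x8 )))" \
        "$(flag $(( v & 0x100 )))"
}
```

setpci_bit_arg "$SLTSTA" 3 and setpci_bit_arg "$LNKCTL" 5 reproduce the 82.w=8:8 and 78.w=20:20 arguments used above. To decode a live register, feed it setpci's hex output, e.g. decode_lnksta "0x$(setpci -s 3:00.0 7a.w)" (0x7a is 0x68 + 0x12, the Link Status register). A raw Link Status of 0x0001 prints the Gen1 x0 state shown for 03:00.0, and 0x0103 prints the Gen3 x16 state shown for 03:08.0. Note that because Slot Status bits are RW1C, a VALUE:MASK write also writes back any other change bits that read as 1, which clears them too.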