Alex,

Thanks, but no luck.
I ran:

#:setpci -s 3:00.0 82.w=8:8

and checked with:

#:lspci -vvvs 3:00.0

MRL- was the same. Then I ran:

#:setpci -s 3:00.0 78.w=20:20

and checked with:

#:lspci -vvs 3:00.0

MRL- was still the same:

LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
        Changed: MRL- PresDet+ LinkState-

Just for my own knowledge, what does "retrain" mean? I assume it means
resetting the bus and having it reconnect successfully?

Thanks again,
-Kevin

On Wed, Oct 19, 2016 at 10:50 AM, Alex Williamson <alex.william...@redhat.com> wrote:
> On Wed, 19 Oct 2016 10:00:57 -0500
> Kevin Vasko <kva...@gmail.com> wrote:
>
> > Sure thing. I'm attaching all of the logs I have to let you get a bigger
> > picture (and anyone that might run into a similar issue). Hopefully I
> > didn't mess anything up.
> >
> ...
>
> Here's the bit I was curious about:
>
> > #showing parent bridge of a device that has a failed
> > #:lspci -vvvs 03:00
> > 03:00.0 PCI bridge: PLX Technology, Inc. Device 8796 (rev ab) (prog-if 00 [Normal decode])
> ...
> >         LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <4us, L1 <8us
> >                 ClockPM- Surprise- LLActRep- BwNot-
> >         LnkCtl: ASPM Disabled; Disabled- CommClk-
> >                 ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >         LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
>
> The Link Status shows that it's in Gen1 mode at x0 width, so the link
> failed to return to a working state after bus reset. Maybe a hint is
> that the Slot Status register shows that the Presence Detect Changed bit
> got flipped, but the Presence Detect State bit remains 1, indicating
> that a card is present. However Presence Detect Changed Enable is not
> set in the Slot Control register, so the OS doesn't get notified about
> this.
>
> I wonder what would happen if we cleared the Presence Detect Changed
> bit and tried to retrain the link.
> The express capability is at 0x68,
> the slot status register is at 0x1a, bit 3 is the presence detect
> changed bit and it's RW1C (read, write 1 to clear). Therefore to clear
> the bit we could do:
>
> setpci -s 3:00.0 82.w=8:8
>
> Recheck with lspci -vvvs 3:00.0 to check whether
>
> SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>         Changed: MRL- PresDet+ LinkState-
>                       ^^^^^^^^
>
> still reports + or -, and possibly whether the link has decided to retrain.
> To force a retrain we need to poke bit 5 in the link control register,
> offset 0x10:
>
> setpci -s 3:00.0 78.w=20:20
>
> Recheck lspci to see if there's any progress.
>
> ...
>
> > #showing parent device that has a NON failed device
> > #: lspci -vvvs 03:08
> > 03:08.0 PCI bridge: PLX Technology, Inc. Device 8796 (rev ab) (prog-if 00 [Normal decode])
> ...
> >         LnkCap: Port #8, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <4us, L1 <8us
> >                 ClockPM- Surprise- LLActRep- BwNot-
> >         LnkCtl: ASPM Disabled; Disabled- CommClk-
> >                 ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >         LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
>
> In this case the link has retrained to Gen3 x16 and of course the
> downstream devices are accessible. The Presence Detect Changed bit is
> set to - on this port.
>
> Thanks,
>
> Alex
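The register arithmetic behind Alex's two setpci commands can be reproduced with a small POSIX shell sketch. It only prints setpci arguments and decodes a Link Status value; it does not touch hardware. The capability base 0x68 is specific to this PLX port and is taken from the thread; on another device it would have to be read from lspci. The register offsets and bit positions (Link Control at 0x10, Link Status at 0x12, Slot Status at 0x1a within the Express capability; speed in Link Status bits 3:0, width in bits 9:4) follow the PCI Express specification. The helper names reg_addr, bit_arg and decode_lnksta are hypothetical.

```shell
#!/bin/sh
# Sketch only: prints setpci arguments and decodes a Link Status value.
# Helper names (reg_addr, bit_arg, decode_lnksta) are hypothetical.

EXP_CAP=0x68   # PCI Express capability base of 03:00.0, from the thread
LNKCTL=0x10    # Link Control offset within the Express capability
LNKSTA=0x12    # Link Status offset within the Express capability
SLTSTA=0x1a    # Slot Status offset within the Express capability

# Absolute config-space address (hex, no 0x prefix, as setpci expects).
reg_addr() {
    printf '%x' $(( $1 + $2 ))
}

# setpci "value:mask" pair that touches only bit $1.
bit_arg() {
    printf '%x:%x' $(( 1 << $1 )) $(( 1 << $1 ))
}

# Decode a raw 16-bit Link Status value (must carry a 0x prefix):
# current link speed in bits 3:0, negotiated width in bits 9:4.
decode_lnksta() {
    speed=$(( $1 & 0xf ))
    width=$(( ($1 >> 4) & 0x3f ))
    case $speed in
        1) rate=2.5GT/s ;;
        2) rate=5GT/s ;;
        3) rate=8GT/s ;;
        *) rate=unknown ;;
    esac
    echo "Speed $rate, Width x$width"
}

# Slot Status bit 3 (Presence Detect Changed) is RW1C: writing 1 clears it.
echo "setpci -s 3:00.0 $(reg_addr $EXP_CAP $SLTSTA).w=$(bit_arg 3)"
# Link Control bit 5 (Retrain Link) asks the port to retrain the link.
echo "setpci -s 3:00.0 $(reg_addr $EXP_CAP $LNKCTL).w=$(bit_arg 5)"
# Link Status address, for reading the register back with setpci.
echo "setpci -s 3:00.0 $(reg_addr $EXP_CAP $LNKSTA).w"
```

The first two printed lines match Alex's commands (82.w=8:8 and 78.w=20:20), since 0x68 + 0x1a = 0x82 and 0x68 + 0x10 = 0x78. As an illustration, the raw values 0x0001 and 0x0103 (made up to match the lspci output above, not read from Kevin's system) decode to "Speed 2.5GT/s, Width x0" and "Speed 8GT/s, Width x16". Because setpci prints hex without a prefix, a live read would need one added, e.g. decode_lnksta "0x$(setpci -s 3:00.0 7a.w)".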
_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users