Hi,

On Nov 17, 2017, at 09:33, Maik Broemme <mbroe...@libmpq.org> wrote:
> Hi,
>
> I have a SuperMicro A2SDi-16C-HLN4F which uses the recently released
> Denverton SoC (Intel Atom C3955). This mainboard has one PCI-E 3.0 x4
> slot, but whatever card is installed there doesn't work with VFIO.
>
> 1. All tried cards work fine in another mainboard using VT-d and in
>    another mainboard using AMD-IOMMU.
>
> 2. All tried cards report DPC events (AER fixed them). However using
>    them on host seems to work fine (tried it for some time now)
>
> [29136.808030] dpc 0000:00:09.0:pcie010: DPC containment event, status:0x1f00 source:0x0000
> [29136.808045] pcieport 0000:00:09.0: AER: Corrected error received: id=0048
> [29136.808051] pcieport 0000:00:09.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0048(Transmitter ID)
> [29136.809533] pcieport 0000:00:09.0: device [8086:19a4] error status/mask=00001000/00002000
> [29136.811079] pcieport 0000:00:09.0: [12] Replay Timer Timeout
>
> 00:09.0 is the PCI bridge and the current device behind it is a Digital
> Devices GmbH Octopus DVB Adapter. The above error is what I see on the
> host if using the device there. As soon as I start using it via VFIO I
> get the following:
>
> Nov 17 05:06:13 server.theraso.int kernel: vfio-pci 0000:01:00.0: enabling device (0140 -> 0142)
> Nov 17 05:06:14 server.theraso.int kernel: vfio_bar_restore: 0000:01:00.0 reset recovery - restoring bars
> Nov 17 05:06:36 server.theraso.int kernel: vfio_bar_restore: 0000:01:00.0 reset recovery - restoring bars
> Nov 17 05:06:36 server.theraso.int kernel: vfio_bar_restore: 0000:01:00.0 reset recovery - restoring bars
>
> Inside the VM I get immediately after boot:
>
> Nov 17 00:25:18 vdr.theraso.int kernel: Disabling IRQ #11
> Nov 17 00:25:18 vdr.theraso.int kernel: [<ffffffffc074d060>] qxl_irq_handler [qxl]
> Nov 17 00:25:18 vdr.theraso.int kernel: [<ffffffffc03a4570>] usb_hcd_irq [usbcore]
> Nov 17 00:25:18 vdr.theraso.int kernel: handlers:
> Nov 17 00:25:18 vdr.theraso.int kernel: secondary_startup_64+0x9f/0x9f
> Nov 17 00:25:18 vdr.theraso.int kernel: x86_64_start_kernel+0x13e/0x161
> Nov 17 00:25:18 vdr.theraso.int kernel: x86_64_start_reservations+0x24/0x26
> Nov 17 00:25:18 vdr.theraso.int kernel: ? early_idt_handler_array+0x120/0x120
> Nov 17 00:25:18 vdr.theraso.int kernel: start_kernel+0x496/0x4b7
> Nov 17 00:25:18 vdr.theraso.int kernel: rest_init+0xd5/0xe0
> Nov 17 00:25:18 vdr.theraso.int kernel: cpu_startup_entry+0x73/0x80
> Nov 17 00:25:18 vdr.theraso.int kernel: do_idle+0x175/0x1e0
> Nov 17 00:25:18 vdr.theraso.int kernel: default_idle_call+0x23/0x30
> Nov 17 00:25:18 vdr.theraso.int kernel: arch_cpu_idle+0xf/0x20
> Nov 17 00:25:18 vdr.theraso.int kernel: default_idle+0x20/0x130
> ...
> Nov 17 00:25:18 vdr.theraso.int kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
> Nov 17 00:25:18 vdr.theraso.int kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C 4.13.5-1-ARCH #1
> Nov 17 00:25:18 vdr.theraso.int kernel: irq 11: nobody cared (try booting with the "irqpoll" option)
>
> If I shut down the VM, the host puts the device in a state in which it
> no longer works:
>
> Nov 17 05:14:00 server.theraso.int kernel: vfio-pci 0000:01:00.0: Failed to return from FLR
> Nov 17 05:13:58 server.theraso.int kernel: vfio-pci 0000:01:00.0: timed out waiting for pending transaction; performing function level reset anyway
>
> Next VM start:
>
> Nov 17 00:28:22 server.theraso.int kernel: vfio-pci 0000:01:00.0: Refused to change power state, currently in D3
>
> Moreover I've tried this all already with a RealTek RTL-8169 NIC. The
> issue remains the same. As mentioned in the beginning the devices work
> fine on other boards.
>
> Any help would be much appreciated to narrow down the problem. The DPC
> events also occur when not using VFIO at all.
>
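For reference, the `status/mask=00001000/00002000` pair in the quoted AER line can be decoded by hand. Below is a minimal Python sketch, assuming the bit layout of the PCIe AER Correctable Error Status register. Only bit 12 is confirmed by the `[12] Replay Timer Timeout` line in the log; the other names are taken from the PCIe specification as an assumption. The helper names `decode_correctable` and `parse_status_mask` are made up for illustration.

```python
# Bit names of the PCIe AER Correctable Error Status / Mask registers.
# Assumption: positions follow the PCIe Base Specification; only bit 12
# is confirmed by the kernel log quoted above.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(value):
    """Return the names of all bits set in a 32-bit status or mask value."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(CORRECTABLE_BITS.get(bit, "bit %d (unknown)" % bit))
    return names


def parse_status_mask(field):
    """Split a 'status/mask' hex pair as printed in the kernel AER message."""
    status_hex, mask_hex = field.split("/")
    return int(status_hex, 16), int(mask_hex, 16)


if __name__ == "__main__":
    # Value copied from the pcieport log line for 0000:00:09.0.
    status, mask = parse_status_mask("00001000/00002000")
    print("reported:", decode_correctable(status))
    print("masked:  ", decode_correctable(mask))
```

With the values from the log, the status decodes to Replay Timer Timeout and the mask to Advisory Non-Fatal Error, so the reported error is not one of the masked ones.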
I've debugged this problem with a "Digital Devices Octopus DVB Adapter"
and tried the latest git kernel with the PCI changes for v4.15.

1. The device state after host boot:

root@server:~# lspci -vvv -s 01:00.0
01:00.0 Multimedia controller: Digital Devices GmbH Octopus DVB Adapter
        Subsystem: Digital Devices GmbH Cine S2 V6.5 DVB adapter
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 255
        Region 0: Memory at dfc00000 (64-bit, non-prefetchable) [disabled] [size=64K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D3 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] MSI: Enable- Count=1/2 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [90] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range A, TimeoutDis+, LTR-, OBFF Not Supported
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Vendor Specific Information: ID=0000 Rev=0 Len=00c <?>
        Kernel driver in use: vfio-pci
        Kernel modules: ddbridge

2. Finding the PCI bridge:

root@server:~# lspci -vt
-[0000:00]-+-00.0  Intel Corporation Device 1980
           +-04.0  Intel Corporation Device 19a1
           +-05.0  Intel Corporation Device 19a2
           +-09.0-[01]----00.0  Digital Devices GmbH Octopus DVB Adapter
           +-10.0-[02]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
           +-11.0-[03-04]----00.0-[04]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family
           +-12.0  Intel Corporation DNV SMBus Contoller - Host
           +-13.0  Intel Corporation DNV SATA Controller 0
           +-14.0  Intel Corporation DNV SATA Controller 1
           +-15.0  Intel Corporation Device 19d0
           +-16.0-[05-06]--+-00.0  Intel Corporation Ethernet Connection X553 1GbE
           |               +-00.1  Intel Corporation Ethernet Connection X553 1GbE
           |               +-10.0  Intel Corporation X553 Virtual Function
           |               +-10.1  Intel Corporation X553 Virtual Function
           |               +-10.2  Intel Corporation X553 Virtual Function
           |               +-10.3  Intel Corporation X553 Virtual Function
           |               +-10.4  Intel Corporation X553 Virtual Function
           |               +-10.5  Intel Corporation X553 Virtual Function
           |               +-10.6  Intel Corporation X553 Virtual Function
           |               +-10.7  Intel Corporation X553 Virtual Function
           |               +-11.0  Intel Corporation X553 Virtual Function
           |               +-11.1  Intel Corporation X553 Virtual Function
           |               +-11.2  Intel Corporation X553 Virtual Function
           |               +-11.3  Intel Corporation X553 Virtual Function
           |               +-11.4  Intel Corporation X553 Virtual Function
           |               +-11.5  Intel Corporation X553 Virtual Function
           |               +-11.6  Intel Corporation X553 Virtual Function
           |               \-11.7  Intel Corporation X553 Virtual Function
           +-17.0-[07-08]--+-00.0  Intel Corporation Ethernet Connection X553 1GbE
           |               +-00.1  Intel Corporation Ethernet Connection X553 1GbE
           |               +-10.0  Intel Corporation X553 Virtual Function
           |               +-10.1  Intel Corporation X553 Virtual Function
           |               +-10.2  Intel Corporation X553 Virtual Function
           |               +-10.4  Intel Corporation X553 Virtual Function
           |               +-10.6  Intel Corporation X553 Virtual Function
           |               +-11.0  Intel Corporation X553 Virtual Function
           |               +-11.2  Intel Corporation X553 Virtual Function
           |               +-11.4  Intel Corporation X553 Virtual Function
           |               \-11.6  Intel Corporation X553 Virtual Function
           +-18.0  Intel Corporation Device 19d3
           +-1f.0  Intel Corporation DNV LPC or eSPI
           +-1f.2  Intel Corporation Device 19de
           +-1f.4  Intel Corporation DNV SMBus controller
           \-1f.5  Intel Corporation DNV SPI Controller

3. Secondary bus reset:

root@server:~# setpci -s 0000:00:09.0 BRIDGE_CONTROL=40:40

4. Clearing the reset bit:

root@server:~# setpci -s 0000:00:09.0 BRIDGE_CONTROL=00:40

5. Checking if the device is still functional:

root@server:~# lspci -vvv -s 01:00.0
01:00.0 Multimedia controller: Digital Devices GmbH Octopus DVB Adapter (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: vfio-pci
        Kernel modules: ddbridge

It looks like the device has disappeared from the PCI bridge / bus. This
is very strange and should probably not happen. This is exactly the same
lspci output as I get after starting the VM with device passthrough.

The same issue can be reproduced with a RealTek RTL8111D NIC. However,
both cards pass through fine on an Opteron mainboard and on an ASRock
mainboard with a Skylake Core i5.

Meanwhile I've opened a case for it at SuperMicro, but it is not clear
whether it is an EFI/BIOS issue. This problem also looks similar to the
one many people have with VFIO passthrough on Ryzen/Threadripper, but my
system is Intel Atom C3xxx based.

Does anybody have any ideas?

> --Maik

--Maik
_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users