This thread is getting on a little bit .. so feel free to ignore me if you've come to a resolution and aren't paying attention.

I've got a more recent root port but have recently managed to solve all my boot (and re-boot) problems with a graphics card in a 6th gen root port and am interested to see if it generalizes. Takes a bit of work to see this as a DMA error but you can easily eliminate almost all the boot (obviously not re-boot) problems if you pass this kernel param to the host:

pci=pcie_bus_peer2peer
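
In case it helps, here is a rough way to confirm the parameter actually reached the kernel after a reboot. It is only a sketch, assuming a POSIX shell and a readable /proc/cmdline. The helper name has_pci_option is made up for illustration and is not part of any tool. It treats pci= as a comma-separated list, so it also copes with something like pci=realloc=on,pcie_bus_peer2peer.

```shell
#!/bin/sh
# has_pci_option CMDLINE OPTION   (hypothetical helper, not a standard tool)
# Returns 0 if CMDLINE contains OPTION as one item of a pci= parameter.
has_pci_option() {
    cmdline=$1
    option=$2
    for word in $cmdline; do
        case $word in
            pci=*)
                # pci= takes a comma-separated list of options
                old_ifs=$IFS
                IFS=,
                for item in ${word#pci=}; do
                    if [ "$item" = "$option" ]; then
                        IFS=$old_ifs
                        return 0
                    fi
                done
                IFS=$old_ifs
                ;;
        esac
    done
    return 1
}

# Check the command line the running host was booted with.
if [ -r /proc/cmdline ]; then
    if has_pci_option "$(cat /proc/cmdline)" pcie_bus_peer2peer; then
        echo "pcie_bus_peer2peer is active"
    else
        echo "pcie_bus_peer2peer is not set"
    fi
fi
```

How the parameter gets onto the command line depends on your bootloader, so I have left that part out.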

You may need to add realloc=on or realloc=off but it's usually link negotiation code in VM bios / drivers breaking my boots and link width negotiation in power management that kills my links. The above is all about getting neutral link parameters that are likely to be what you get when you reset a link or stuff 0 in the CTL, and that are so neutral you can cross-device DMA (not sure what does that anyway). Doesn't hurt to switch off power management in the bios just in case .. as most is purely handled in the bios with my board.

You also should bind pci-stub to the root port (things always hide in the pcieport driver) and try switching off the D3 state that vfio puts cards in as well. Even though everything is supposed to support D3 .. I'm 100% certain quite a lot doesn't support it 100% especially both hot and cold sustained D3 - which I'm sympathetic to as I feel like that when I wake up too. I am old enough to have positive memories of teletext but I've yet to have a positive feeling (or visual experience) involving VGA so I always turn support for it off in vfio-pci.

options vfio-pci ids=1000:0072 disable_vga=1 disable_idle_d3=1
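
To see whether those options were picked up once vfio-pci is loaded, something like the following may do. It is a sketch that assumes the module exposes its parameters under /sys/module/vfio_pci/parameters, which can differ between kernel versions and builds (disable_vga in particular may be absent). show_vfio_param is a hypothetical helper name, and its optional second argument only exists so the directory can be swapped out.

```shell
#!/bin/sh
# show_vfio_param NAME [DIR]   (hypothetical helper)
# Prints NAME=value if the parameter file is readable, NAME=unavailable otherwise.
# DIR defaults to the assumed sysfs location of the vfio-pci module parameters.
show_vfio_param() {
    name=$1
    dir=${2:-/sys/module/vfio_pci/parameters}
    if [ -r "$dir/$name" ]; then
        printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
    else
        printf '%s=unavailable\n' "$name"
    fi
}

show_vfio_param disable_vga
show_vfio_param disable_idle_d3
```

If both come back as "unavailable" the module is probably not loaded yet, or the modprobe.d file was not read (for example because vfio-pci is loaded from the initramfs).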



I get a feeling I should probably take my wild conjecture to the dev forums, but as your card has worked with vfio before there should be fewer variables, and those are the sorts of things that change between distributions / kernel releases. If I was to really trust my instinct though, I'd focus on SR-IOV first, because I've never messed around with a non-network end-point with SR-IOV before and I don't know what it means here. That would only apply if the root port it was on had Virtual Channels like mine (yours is not that far off, but I haven't just spent a week looking at its spec like I have mine).

root@Poople-PC:/mnt/ssddata/Win10x64UEFI# lspci -nnv -s 00:01.0
00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 122
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: f7100000-f71fffff
        Prefetchable memory behind bridge: 00000000e0000000-00000000efffffff
        Capabilities: [88] Subsystem: ASUSTeK Computer Inc. Skylake PCIe Controller (x16) [1043:8694]
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [a0] Express Root Port (Slot+), MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [140] Root Complex Link
        Capabilities: [d94] #19
        Kernel driver in use: pcieport
        Kernel modules: shpchp

The Virtual Channel/Root Complex Link does DMA from the BARs, SR-IOV does DMA to(?) the BARs, so how does vfio get the rock paper scissors for who does DMA? Especially if something sets up the SR-IOV (as that's what it looks like in your lspci). And does the device need to see the root port's first 100h of registers at VM init if the link breaks (i.e. without sending MSI messages)? Or is that the wrong way around? Might not be a problem if you can find a non-Virtual Channel root port (any x1 ports?).
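
If you want to hunt for a bridge without Virtual Channel, a filter over `lspci -v` along these lines might save some scrolling. It is a sketch only: list_bridges_without_vc is a name I made up, and it assumes the usual lspci -v layout where each device starts at column 0 and its capabilities are indented below it. Run lspci as root, otherwise the capabilities show up as "access denied" and every bridge will look VC-free.

```shell
#!/bin/sh
# list_bridges_without_vc   (hypothetical helper)
# Reads `lspci -v` output on stdin and prints the header line of every
# PCI bridge that does not list a "Virtual Channel" capability.
list_bridges_without_vc() {
    awk '
        # Print the previous device if it was a bridge with no VC capability.
        function emit() { if (header != "" && is_bridge && !has_vc) print header }
        # A line starting with a hex digit begins a new device.
        /^[0-9a-f]/ { emit(); header = $0; is_bridge = ($0 ~ /PCI bridge/); has_vc = 0; next }
        /Capabilities: \[[0-9a-f]+\] Virtual Channel/ { has_vc = 1 }
        END { emit() }
    '
}

if command -v lspci >/dev/null 2>&1; then
    lspci -v | list_bridges_without_vc
fi
```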

I'm definitely going to stop the wild conjecture now but I'm really getting a lot of non-specific "DMA setup error," feelings there, and if nobody tells me to stop smoking crack on this I might try to think of a way to neuter SR-IOV (presumably the busmaster is switched off by vfio), and I'll trace what happens with my graphics card in more detail. I found an article about SAS SR-IOV creating multiple devices for parallel reads but it was on the 4th search result page and I didn't get the impression you were doing that ... so I'm guessing that even though I'm speculating on this as I go along there's not many people who would notice. Probably Alex .. but still that's not many.

Regards,

D

P.S. If you're desperate for your port spec, it'll be on Intel's site, but be warned - I usually have to click on links to things that sound like specs for about 1/2 an hour before I find the actual full document. And the reward is ... well .. a 300 page spec.

On 25/05/16 20:08, Damon Namod wrote:
I could successfully pass-through the controller with VFIO using Fedora 24 
beta. The process was straight-forward. Unfortunately I cannot tell if the 
issue is caused by Ubuntu, Linux or Qemu (or a combination of them).

Best,
Damon

On 13 May 2016, at 14:51, Damon Namod <m...@damon.at> wrote:

Any further thoughts on this? Could this be a device failure?

Best,
Damon

On 11 May 2016, at 00:18, Damon Namod <m...@damon.at> wrote:

The output of `lspci -vvvs  01:00.0` is attached. Interestingly, the command 
generates an error somewhere in the middle of the output:

   pcilib: sysfs_read_vpd: read failed: Input/output error

The corresponding `dmesg` output is:

   [ 2587.922711] vfio-pci 0000:01:00.0: invalid short VPD tag 00 at offset 1

I blacklisted the `mpt3sas` driver and assigned the device directly to VFIO:

   $ cat /etc/modprobe.d/vfio.conf
   blacklist mpt3sas
   options vfio-pci ids=1000:0072

Best,
Damon

<ibm-serveraid-m1015_lspci-vvvs.txt>


On 10 May 2016, at 20:01, Alex Williamson <alex.l.william...@gmail.com> wrote:

On Mon, May 9, 2016 at 4:46 PM, Damon Namod <m...@damon.at> wrote:
Hi all,

I just tried Linux 4.6.0-rc7 with qemu 2.6.0-rc4. Same error and behaviour. 
What the heck is this?

Another question/request, are you blacklisting the driver for this device 
(mpt3sas) or using pci-stub or other means to prevent it from claiming the 
device on the host or are you dynamically unbinding from mpt3sas?  If the 
latter, could you try one of the former mechanisms to leave the HBA untouched 
by the host prior to assigning?

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users

