Thanks for the input on this - I am surprised to hear that Quadros such as the K4000 exhibit the same issue on bare metal! This means I need to revise my expectations. I will look at your suggestion involving virtual cliques and see where it takes me.
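
If I am reading that series correctly, the idea would be to assign both GPUs to a guest and tag them with the same clique ID so the guest driver treats them as p2p-capable - something along the lines of the fragment below. To be clear, the x-nv-gpudirect-clique property name is just what I gather from skimming the patches, and the second device address is a placeholder for my other card; I have not tested any of this yet:

    -device vfio-pci,host=04:00.0,x-nv-gpudirect-clique=0 \
    -device vfio-pci,host=05:00.0,x-nv-gpudirect-clique=0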
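
On the dma_map() point from my original post: just so I am sure I understand what the driver is supposed to be doing, my mental model is roughly the sketch below - a minimal kernel-style example of mapping a buffer through the DMA API so the IOMMU ends up with a writable PTE for it. This is not taken from the actual NVIDIA driver, just the generic interface from the DMA-API-HOWTO:

    #include <linux/device.h>
    #include <linux/dma-mapping.h>

    /* Sketch only: map a CPU buffer so a device may DMA-write into it.
     * The returned handle (an IOVA when the IOMMU is enabled) is what
     * should be programmed into the device, not a raw physical address.
     * The caller must later call dma_unmap_single() with the same
     * handle, size and direction once the device is done writing. */
    static dma_addr_t example_map_for_device_write(struct device *dev,
                                                   void *buf, size_t size)
    {
            dma_addr_t dma_handle;

            dma_handle = dma_map_single(dev, buf, size, DMA_FROM_DEVICE);
            if (dma_mapping_error(dev, dma_handle))
                    return 0;   /* mapping failed, nothing to program */

            return dma_handle;
    }

If the driver instead hands the GPU an address that never went through such a mapping, the DMAR faults above are exactly what I would expect to see.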
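
Also, point taken about igfx_off being an intel_iommu-specific option - I had been conflating the generic iommu= parameter with the Intel one. For the next round of tests I will stick to a plain intel_iommu=on (and try iommu=pt separately), i.e. something like this in /etc/default/grub on my Ubuntu install:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"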
On Sat, 2017-09-23 at 11:26 -0600, Alex Williamson wrote:
> On Sat, 23 Sep 2017 17:00:37 +0100
> Ilias Kasoumis <ilias.kasou...@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to draw upon the list participants' know-how and experience in trying to resolve the following issue. I have tried in vain to get NVidia's support in the past, and I had given up for quite a long time in the hope it would get fixed as a matter of course, but coming back to it half a year later (and multiple kernel and driver versions later) I see it still persists. (The original post is at https://devtalk.nvidia.com/default/topic/996091/peer-to-peer-dma-issue-/ and I am copying it below.)
> >
> > The bug makes the use of multiple GTX 1080s impossible when I turn on the IOMMU in Linux (tried kernels 4.8 and 4.13, using either standard iommu=on, or iommu=on,igfx_off, or iommu=pt for passthrough mode) on an X99 board.
> >
> > The bug can be triggered by running any peer-to-peer memory transfer; for example, running the CUDA 8.0 Samples code 1_Utilities/p2pBandwidthLatencyTest from the terminal triggers the problem: the video driver (and as a result the X server) crashes immediately, and after multiple Ctrl-C's and waiting for tens of seconds the server eventually restarts and I am presented with a login prompt to X Windows.
> >
> > The relevant kernel error messages are (thousands of these lines, just a snippet below):
> >
> > [ 51.691440] DMAR: DRHD: handling fault status reg 2
> > [ 51.691450] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
> > [ 51.691457] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
> > [ 51.691462] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
> > [ 51.691465] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
> > [ 51.691470] DMAR: DRHD: handling fault status reg 400
> > [ 51.740674] DMAR: DRHD: handling fault status reg 402
> > [ 51.740683] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
> > [ 51.740688] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
> > [ 51.740693] DMAR: [DMA Write] Request device [04:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
> >
> > Clearly the above suggests that the CUDA driver is attempting DMA at an address for which the corresponding IOMMU page table entry write flag is not set, presumably because the driver has not properly registered/requested access via the general dma_map() kernel interface (https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt).
> >
> > Scouting the net reveals a bug registered (https://bugzilla.kernel.org/show_bug.cgi?id=188271) for exactly the same reason on totally different hardware (a Supermicro dual-socket board) using Pascal Titan Xs, so the same architecture cards as mine. Interestingly enough, the kernel error messages in this report claim unauthorized access of *exactly* the same memory address
> > (f8139000, in bold below):
> >
> > [16193.666976] DMAR: [DMA Write] Request device [82:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
> >
> > So this looks like a red flag that somehow the indirection afforded by the IOMMU is bypassed and the driver is using hardcoded DMA addresses. Please note that the author of the bug report claims that setting iommu=igfx_off somehow solves this, but really igfx_off per se should be irrelevant here without turning IOMMU support on first, with something like iommu=on,igfx_off. What most likely happens instead is that iommu=igfx_off, as opposed to iommu=on, just turns off the IOMMU altogether, allowing the DMA to succeed. This is exactly what happens on my system too. So in other words the bug report merely states that turning off the IOMMU allows peer-to-peer transfers to work. Still, his detailed log files should be very useful as an independent manifestation of the same issue. My log files are attached on the original thread included at the start of this post.
> >
> > I am using an ASRock X99 board (X99E-ITX/ac) with the latest firmware, an Intel i7-6800K, dual Asus GTX 1080 Founders Edition cards, 32GB RAM and Ubuntu 16.10 (or 17.10 now) with all updates applied (kernel 4.8.0-37 or 4.13 now) with driver 378.13 or 384.69.
> >
> > Have you come across this while trying to virtualize nvidia GPUs? Given that the Linux driver forum at nvidia refuses to display bug posts by users (they remain "hidden"), and given that nvidia would much rather have you buy Quadros and Teslas instead, the conspiracy theorist in me is more inclined to believe that VT-d is intentionally disabled in consumer versions of the hardware...
> >
> > Thanks for any input/solutions!
>
> IME, supported Quadros fail in the same way on a bare metal host with the iommu enabled when running the p2pBandwidthLatencyTest from the cuda tests. A K4000 did the same thing for me. Also note that the igfx_off option is specifically an intel_iommu parameter and, IIRC, only changes the behavior of integrated ('i' in igfx) graphics. As you're on X99, this option is irrelevant. I haven't investigated why iommu=pt doesn't work here; X99 should have hardware passthrough support in the DRHD, but maybe it doesn't work for p2p.
>
> Since you're asking vfio-users about this bare metal iommu issue, let me also note this QEMU patch series:
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg05826.html
>
> I have no idea if NVIDIA enables GPUDirect on GeForce cards, but you might actually be able to do what you're looking for within a VM, since vfio will map all memory and mmio through the iommu. These mappings are transparent to the guest kernel and userspace, so it just works.
>
> Perhaps NVIDIA hasn't added DMA-API support to their driver for these use cases simply because of the iommu overhead. If devices are operating in a virtual address space (iova), all transactions need to pass through the iommu for translation. In order to get p2p directly through switches downstream in the topology, the switch needs to support ACS Direct Translation and the endpoints need to support Address Translation Services (ATS). NVIDIA devices do not support the latter, and ACS DT is a mostly unexplored space. Since you're using 1080s, which only have a single GPU per card, switches may not be involved unless they're built into your motherboard.
> Thanks,
>
> Alex

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users