https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283189
--- Comment #1 from Jason A. Harmening <j...@freebsd.org> ---

Reverting from nda(4) to nvd(4) didn't resolve this issue, if anything it appeared to make it slightly worse. I did enable NVMe verbose command logging, which yielded error logs like the following:

nvme0: WRITE sqid:14 cid:120 nsid:1 lba:475732976 len:96
DMAR4: nvme0: pci7:0:0 sid 700 fault acc 1 adt 0x0 reason 0x6 addr a5243000
nvme0: nsid:0x1 rsvd2:0 rsvd3:0 mptr:0 prp1:0xa5243000 prp2:0xcf800
nvme0: cdw10: 0x1c5b1bf0 cdw11:0 cdw12:0x5f cdw13:0 cdw14:0 cdw15:0
nvme0: DATA TRANSFER ERROR (00/04) crd:0 m:1 dnr:1 p:0 sqid:14 cid:120 cdw0:0

The errors continue to always be for NVMe writes (i.e. a DMA read access by the controller). I've also still never seen these faults for any device besides nvme, and all still show the same DMAR fault code and similar small transfer sizes.

Interestingly, all of the errors I've seen so far (about 15 of them since enabling verbose logging) show the DMAR fault being taken against the buffer in PRP1, even in cases in which PRP2 is populated. So it seems the NVMe access that triggers the fault is always at the beginning of the region mapped by the NVMe command.

This "smells" like the sort of issue I'm used to seeing at $work on weakly-ordered arm64 devices when there is a missing barrier between a page table modification and a memory access that has an implicit dependency on the page table modification. In this case the page table modification would be the DMAR PTE write that maps the PRP1 buffer, while the memory access would be the NVMe controller read triggered by appending the write command to the submission queue. But I would be surprised if that kind of issue is at play here given the stronger ordering of the x86 memory model.
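For anyone cross-checking similar log records, the following is a minimal sketch of how the fields in the sample above relate to each other. It is not taken from the nvme(4) driver: the struct and helper names are hypothetical, the values are copied by hand from the log lines quoted above, and the 512-byte logical block size and 4 KiB page size are assumptions that the log itself does not state.

```c
#include <assert.h>   /* for callers that want to assert on the helpers */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for one verbose log record (illustration only). */
struct nvme_write_log {
    uint64_t lba;        /* "lba:" from the WRITE line */
    uint32_t len;        /* "len:" from the WRITE line, in logical blocks */
    uint32_t cdw10;      /* low 32 bits of the starting LBA */
    uint32_t cdw11;      /* high 32 bits of the starting LBA */
    uint32_t cdw12;      /* bits 15:0 hold the zero-based block count */
    uint64_t prp1;       /* first PRP entry */
    uint64_t prp2;       /* second PRP entry, or a PRP list pointer */
    uint64_t dmar_addr;  /* "addr" from the DMAR fault line */
};

/* Values copied from the sample log in this comment. */
static const struct nvme_write_log sample = {
    .lba = 475732976, .len = 96,
    .cdw10 = 0x1c5b1bf0, .cdw11 = 0, .cdw12 = 0x5f,
    .prp1 = 0xa5243000, .prp2 = 0xcf800,
    .dmar_addr = 0xa5243000,
};

/* Starting LBA as encoded in command dwords 10 and 11. */
static uint64_t cmd_lba(const struct nvme_write_log *r)
{
    return ((uint64_t)r->cdw11 << 32) | r->cdw10;
}

/* Block count: the low 16 bits of cdw12 are zero-based, so add one. */
static uint32_t cmd_nblocks(const struct nvme_write_log *r)
{
    return (r->cdw12 & 0xffffu) + 1;
}

/* True if the DMAR fault address falls in the same page as PRP1. */
static bool fault_hits_prp1(const struct nvme_write_log *r, uint64_t page_size)
{
    uint64_t mask = ~(page_size - 1);
    return (r->dmar_addr & mask) == (r->prp1 & mask);
}

/* Number of pages the transfer touches, starting at PRP1's page offset. */
static uint64_t pages_spanned(const struct nvme_write_log *r,
                              uint64_t block_size, uint64_t page_size)
{
    uint64_t offset = r->prp1 & (page_size - 1);
    uint64_t bytes = (uint64_t)cmd_nblocks(r) * block_size;
    return (offset + bytes + page_size - 1) / page_size;
}
```

For the sample record, cmd_lba() and cmd_nblocks() reproduce the lba and len values printed on the WRITE line, and fault_hits_prp1() with a 4096-byte page confirms that the faulting address is the PRP1 page. Under the assumed 512-byte block size the transfer spans 12 pages, in which case PRP2 would be a pointer to a PRP list rather than a second data page.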