On 7/6/20 12:15 AM, Klaus Jensen wrote: > On Jul 2 16:33, Andrzej Jakowski wrote: >> On 7/2/20 10:51 AM, Klaus Jensen wrote: >>> On Jul 2 08:07, Andrzej Jakowski wrote: >>>> On 7/2/20 3:31 AM, Klaus Jensen wrote: >>>>> Aight, an update here. This only happens when QEMU is run with a virtual >>>>> IOMMU. Otherwise, the kernel is happy. >>>>> >>>>> With the vIOMMU, qemu also craps out a bit: >>>>> >>>>> qemu-system-x86_64: vtd_iova_to_slpte: detected slpte permission error >>>>> (iova=0xfd200000, level=0x2, slpte=0x0, write=0) >>>>> qemu-system-x86_64: vtd_iommu_translate: detected translation failure >>>>> (dev=03:00:00, iova=0xfd200000) >>>>> >>>>> So I think we are back in QEMU land for the bug. >>>> >>>> Can you share command line for that? >>>> >>>> >>> >>> qemu-system-x86_64 \ >>> -nodefaults \ >>> -display none \ >>> -device intel-iommu,pt,intremap=on,device-iotlb=on \ >>> -machine type=q35,accel=kvm,kernel_irqchip=split \ >>> -cpu host \ >>> -smp 4 \ >>> -m 8G \ >>> -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 \ >>> -device virtio-rng-pci \ >>> -drive >>> id=boot,file=/home/kbj/work/src/vmctl/state/pmr/boot.qcow2,format=qcow2,if=virtio,discard=on,detect-zeroes=unmap >>> \ >>> -device pcie-root-port,id=pcie_root_port1,chassis=1,slot=0 \ >>> -device x3130-upstream,id=pcie_upstream1,bus=pcie_root_port1 \ >>> -device >>> xio3130-downstream,id=pcie_downstream1,bus=pcie_upstream1,chassis=1,slot=1 \ >>> -drive >>> id=nvme0n1,file=/home/kbj/work/src/vmctl/state/pmr/nvme0n1.img,format=raw,if=none,discard=on,detect-zeroes=unmap >>> \ >>> -object memory-backend-file,id=pmr,share=on,mem-path=pmr.bin,size=1M \ >>> -device >>> nvme,id=nvme0,serial=deadbeef,bus=pcie_downstream1,drive=nvme0n1,msix_qsize=1,pmrdev=pmr,cmb_size_mb=2 >>> \ >>> -pidfile /home/kbj/work/src/vmctl/run/pmr/pidfile \ >>> -kernel /home/kbj/work/src/kernel/linux/arch/x86_64/boot/bzImage \ >>> -append root=/dev/vda1 console=ttyS0,115200 audit=0 nokaslr \ >>> -virtfs >>> local,path=/home/kbj/work/src/kernel/linux,security_model=none,readonly,mount_tag=modules >>> \ >>> -serial mon:stdio \ >>> -trace pci_nvme* >>> >>> >> >> I focused on reproduction and it looks to me that my patch doesn't >> necessarily introduce regression. I run it w/ and w/o patch in both cases >> getting error while registering. Here is kernel guest log: >> >> [ 87.606482] nvme nvme0: pci function 0000:00:04.0 >> [ 87.635577] dev=0000000095b0a83b bar=2 size=134217728 offset=0 >> [ 87.636593] nvme nvme0: failed to register the CMB ret=-95 >> [ 87.643262] nvme nvme0: 12/0/0 default/read/poll queues >> >> Any thoughts? >> > > Hmm, that's not what I am seeing. > > With kernel v5.8-rc4, I'm not seeing any issues with CMB with and > without IOMMU on QEMU master. With your patches, my kernel (v5.8-rc4) > pukes both with and without iommu. > > BUT! This doesn't mean that your patch is bad, it looks more like an > issue in the kernel. I still think the BAR configuration looks sane, but > I am not expert on this. > > To satisify my curiosity I tried mending your patch to put the CMB on > offset 0 and move the MSI-X vector table and PBA to BAR 0 (like I > suggested back in the day). That works. With and without IOMMU. So, I > think it is an issue with the Linux kernel not being too happy about the > CMB being at an offset. It doesn't directly look like an issue in the > nvme driver since the issue shows up far lower in the memory subsystem, > but it would be nice to have the linux nvme gang at least acknowledge > the issue. >
I have managed to reproduce that problem and played with patch to see when the problem occurs vs not. When I put MSIX back to BAR2 (no PMR at all) and CMB left at BAR4 but starting at offset 0 I was still able to reproduce issue. So then I've played with memory region API and interesting observed that problem occurs when region overlaying is used via: memory_region_init(&n->bar4, OBJECT(n), "nvme-bar4", bar_size);$ $ if (n->params.cmb_size_mb) {$ memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,$ "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));$ $ memory_region_add_subregion_overlap(&n->bar4, cmb_offset, &n->ctrl_mem, 1);$ }$ on the other hand when cmb memory region is initialized w/o region overlaying that is: memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,$ "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz)); I get no reproduction. Also observed qemu complaing about failed translation: qemu-system-x86_64: vtd_iova_to_slpte: detected slpte permission error (iova=0xfe400000, level=0x2, slpte=0x0, write=0) qemu-system-x86_64: vtd_iommu_translate: detected translation failure (dev=03:00:00, iova=0xfe400000) Not sure how we want to proceed. Any suggestions?