On Thu, Dec 08, 2022 at 12:39:57PM -0800, Guenter Roeck wrote:
> On Thu, Dec 08, 2022 at 12:13:55PM -0800, Guenter Roeck wrote:
> > On Thu, Dec 08, 2022 at 10:47:42AM -0800, Guenter Roeck wrote:
> > > >
> > > > A cq head doorbell mmio is skipped... And it is not the fault of the
> > > > kernel. The kernel is in it's good right to skip the mmio since the cq
> > > > eventidx is not properly updated.
> > > >
> > > > Adding that and it boots properly on riscv. But I'm perplexed as to why
> > > > this didnt show up on our regularly tested platforms.
> > > >
> > > > Gonna try to get this in for 7.2!
> > >
> > > I see another problem with sparc64.
> > >
> > > [ 5.261508] could not locate request for tag 0x0
> > > [ 5.261711] nvme nvme0: invalid id 0 completed on queue 1
> > >
> > > That is seen repeatedly until the request times out. I'll test with
> > > your patch to see if it resolves this problem as well, and will bisect
> > > otherwise.
> > >
> > The second problem is unrelated to the doorbell problem.
> > It is first seen in qemu v7.1. I'll try to bisect.
> >
> Unfortunately, the problem observed with sparc64 also bisects to this
> patch. Making things worse, "hw/nvme: fix missing cq eventidx update"
> does not fix it (which is why I initially thought it was unrelated).
>
> I used the following qemu command line.
>
> qemu-system-sparc64 -M sun4v -cpu "TI UltraSparc IIi" -m 512 -snapshot \
>     -device nvme,serial=foo,drive=d0,bus=pciB \
>     -drive file=rootfs.ext2,if=none,format=raw,id=d0 \
>     -kernel arch/sparc/boot/image -no-reboot \
>     -append "root=/dev/nvme0n1 console=ttyS0" \
>     -nographic -monitor none
>
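For reference, the doorbell skip mentioned in the quoted text can be sketched as follows. This is a minimal illustration only, modeled on the wrap-safe event index comparison that I believe the Linux driver uses in nvme_dbbuf_need_event(); the Python names (need_event, MASK16) are made up for this sketch and are not taken from the kernel or from qemu.

```python
MASK16 = 0xFFFF  # indices are compared as wrapping 16-bit values (assumption)


def need_event(event_idx: int, new_idx: int, old_idx: int) -> bool:
    """Return True if the host has to ring the MMIO doorbell.

    The host moved its shadow doorbell from old_idx to new_idx. The MMIO
    write is only needed if the device-published event index lies in
    [old_idx, new_idx), i.e. the update stepped past it. Otherwise the
    host may skip the MMIO write.
    """
    return ((new_idx - event_idx - 1) & MASK16) < ((new_idx - old_idx) & MASK16)


if __name__ == "__main__":
    # Event index kept current by the device: the doorbell write happens.
    print(need_event(event_idx=16, new_idx=17, old_idx=16))
    # Stale event index: the host legitimately skips the doorbell write.
    print(need_event(event_idx=5, new_idx=17, old_idx=16))
```

With a stale event index the check returns False, so the host never issues the MMIO write. That is the situation described in the quoted text for the cq head doorbell.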
With completed tests, it turns out the problem is seen with various
emulations running big endian CPUs. Example from arm64be:

[ 4.736752] nvme nvme0: pci function 0000:00:02.0
[ 4.737829] nvme 0000:00:02.0: enabling device (0000 -> 0002)
[ 4.774673] nvme nvme0: 2/0/0 default/read/poll queues
[ 4.779331] nvme nvme0: Ignoring bogus Namespace Identifiers
[ 4.799400] could not locate request for tag 0x0
[ 4.799533] nvme nvme0: invalid id 0 completed on queue 2
[ 4.799612] could not locate request for tag 0x0
[ 4.799676] nvme nvme0: invalid id 0 completed on queue 2
[ 4.799744] could not locate request for tag 0x0

powerpc:

could not locate request for tag 0x0
nvme nvme0: invalid id 0 completed on queue 1
could not locate request for tag 0x0
nvme nvme0: invalid id 0 completed on queue 1

trace logs (arm64be, good, qemu v7.0):

pci_nvme_admin_cmd cid 2864 sqid 0 opc 0x6 opname 'NVME_ADM_CMD_IDENTIFY'
pci_nvme_identify cid 2864 cns 0x5 ctrlid 0 csi 0x0
pci_nvme_identify_ns_csi nsid=1, csi=0x0
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0x44a84000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0x44a84000 len 4096
pci_nvme_enqueue_req_completion cid 2864 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_irq_msix raising MSI-X IRQ vector 0
pci_nvme_mmio_write addr 0x1004 data 0x10 size 4
pci_nvme_mmio_doorbell_cq cqid 0 new_head 16
pci_nvme_mmio_write addr 0x1008 data 0x1 size 4
pci_nvme_mmio_doorbell_sq sqid 1 new_tail 1
pci_nvme_io_cmd cid 32770 nsid 0x1 sqid 1 opc 0x2 opname 'NVME_NVM_CMD_READ'
pci_nvme_read cid 32770 nsid 1 nlb 8 count 4096 lba 0x0
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0x44879000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0x44879000 len 4096
pci_nvme_rw_cb cid 32770 blk 'd0'
pci_nvme_rw_complete_cb cid 32770 blk 'd0'
pci_nvme_enqueue_req_completion cid 32770 cqid 1 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_irq_msix raising MSI-X IRQ vector 1
pci_nvme_mmio_write addr 0x100c data 0x1 size 4
pci_nvme_mmio_doorbell_cq cqid 1 new_head 1
pci_nvme_mmio_write addr 0x1008 data 0x2 size 4
pci_nvme_mmio_doorbell_sq sqid 1 new_tail 2

trace log (arm64be, bad, qemu v7.2):

pci_nvme_admin_cmd cid 5184 sqid 0 opc 0x6 opname 'NVME_ADM_CMD_IDENTIFY'
pci_nvme_identify cid 5184 cns 0x5 ctrlid 0 csi 0x0
pci_nvme_identify_ns_csi nsid=1, csi=0x0
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0x44e56000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0x44e56000 len 4096
pci_nvme_enqueue_req_completion cid 5184 cqid 0 dw0 0x0 dw1 0x0 status 0x0
pci_nvme_update_sq_eventidx sqid 0 new_eventidx 18
pci_nvme_update_sq_tail sqid 0 new_tail 18
pci_nvme_update_cq_eventidx cqid 0 new_eventidx 16
pci_nvme_update_cq_head cqid 0 new_head 16
pci_nvme_irq_msix raising MSI-X IRQ vector 0
pci_nvme_mmio_write addr 0x1004 data 0x11 size 4
pci_nvme_mmio_doorbell_cq cqid 0 new_head 17
pci_nvme_mmio_write addr 0x1010 data 0x1 size 4
pci_nvme_mmio_doorbell_sq sqid 2 new_tail 1
pci_nvme_update_sq_tail sqid 2 new_tail 0
pci_nvme_io_cmd cid 16384 nsid 0x1 sqid 2 opc 0x2 opname 'NVME_NVM_CMD_READ'
pci_nvme_read cid 16384 nsid 1 nlb 8 count 4096 lba 0x0
pci_nvme_map_prp trans_len 4096 len 4096 prp1 0x44cea000 prp2 0x0 num_prps 2
pci_nvme_map_addr addr 0x44cea000 len 4096
pci_nvme_update_sq_eventidx sqid 2 new_eventidx 0
pci_nvme_update_sq_tail sqid 2 new_tail 0
pci_nvme_io_cmd cid 0 nsid 0x0 sqid 2 opc 0x0 opname 'NVME_NVM_CMD_FLUSH'
pci_nvme_enqueue_req_completion cid 0 cqid 2 dw0 0x0 dw1 0x0 status 0x400b
pci_nvme_err_req_status cid 0 nsid 0 status 0x400b opc 0x0
pci_nvme_update_sq_eventidx sqid 2 new_eventidx 0
pci_nvme_update_sq_tail sqid 2 new_tail 0
[flush command repeated many times]

Guenter