https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272135

Bug ID: 272135
Summary: hot-swap NVMe drive not consistently detected
Product: Base System
Version: CURRENT
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: b...@freebsd.org
Reporter: ema...@freebsd.org

Test system has 24 2.5" slots.
src tree is 194e059bb80334e6f4f791a186015b20d7f6f4b8 + some unrelated local changes.
Test NVMe drive is WD Blue SN570 500GB 234110WD.

The NVMe drive is not detected when installed at boot and not consistently detected when inserted after boot. It is sometimes detected if removed from and inserted into the same slot, and sometimes after moving to a different slot.

Additional detail to be added after further investigation.

Example insert/removal event timeline showing verbose kernel messages:

** boot with NVMe installed in slot 0
<no NVMe-related kernel messages>

** detach NVMe in slot 0
<no kernel messages>

** insert NVMe in slot 0
Jun 2 20:40:49 xxx kernel: pcib7: HotPlug interrupt: 0x48
Jun 2 20:40:49 xxx kernel: pcib7: Presence Detect Changed to card present
Jun 2 20:40:49 xxx kernel: pci24: <ACPI PCI bus> numa-domain 0 on pcib7
Jun 2 20:40:49 xxx kernel: pcib7: allocated bus range (65-65) for rid 0 of pci24
Jun 2 20:40:49 xxx kernel: pci24: domain=0, physical bus=65
Jun 2 20:40:49 xxx kernel: pcib7: HotPlug interrupt: 0x140
Jun 2 20:40:49 xxx kernel: pcib7: Data Link Layer State Changed to active
Jun 2 20:40:49 xxx kernel: pcib7: HotPlug interrupt: 0x140
Jun 2 20:40:49 xxx kernel: pcib7: Data Link Layer State Changed to active

** detach NVMe in slot 0
Jun 2 20:41:32 qrb16 kernel: pcib7: HotPlug interrupt: 0x8
Jun 2 20:41:32 qrb16 kernel: pcib7: Presence Detect Changed to empty
Jun 2 20:41:32 qrb16 kernel: pci24: detached

** insert NVMe in slot 0
<no kernel messages>

** detach NVMe in slot 0
<no kernel messages>

** insert NVMe in slot 1
Jun 2 20:43:35 xxx kernel: pcib8: HotPlug interrupt: 0x48
Jun 2 20:43:35 xxx kernel: pcib8: Presence Detect Changed to card present
Jun 2 20:43:35 xxx kernel: pcib8: Missed HotPlug interrupt waiting for DLL Active
Jun 2 20:43:35 xxx kernel: pcib8: HotPlug interrupt: 0x140
Jun 2 20:43:35 xxx kernel: pcib8: Data Link Layer State Changed to active
Jun 2 20:43:35 xxx kernel: pci24: <ACPI PCI bus> numa-domain 0 on pcib8
Jun 2 20:43:35 xxx kernel: pcib8: allocated bus range (66-66) for rid 0 of pci24
Jun 2 20:43:35 xxx kernel: pci24: domain=0, physical bus=66
Jun 2 20:43:35 xxx kernel: found-> vendor=0x15b7, dev=0x501a, revid=0x00
Jun 2 20:43:35 xxx kernel: domain=0, bus=66, slot=0, func=0
Jun 2 20:43:35 xxx kernel: class=01-08-02, hdrtype=0x00, mfdev=0
Jun 2 20:43:35 xxx kernel: cmdreg=0x0000, statreg=0x0010, cachelnsz=0 (dwords)
Jun 2 20:43:35 xxx kernel: lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
Jun 2 20:43:35 xxx kernel: intpin=a, irq=255
Jun 2 20:43:35 xxx kernel: powerspec 3 supports D0 D3 current D0
Jun 2 20:43:35 xxx kernel: MSI supports 32 messages, 64 bit
Jun 2 20:43:35 xxx kernel: MSI-X supports 17 messages in maps 0x10 and 0x20
Jun 2 20:43:35 xxx kernel: map[10]: type Memory, range 64, base 0, size 14, memory disabled
Jun 2 20:43:35 xxx kernel: map[20]: type Memory, range 64, base 0, size 8, memory disabled
Jun 2 20:43:35 xxx kernel: nvme1: <Generic NVMe Device> at device 0.0 numa-domain 0 on pci24
Jun 2 20:43:35 xxx kernel: pcib6: allocated type 3 (0xce000000-0xce0fffff) for rid 20 of pcib8
Jun 2 20:43:35 xxx kernel: pcib8: allocated initial memory window of 0xce000000-0xce0fffff
Jun 2 20:43:35 xxx kernel: pcib8: allocated memory range (0xce000000-0xce003fff) for rid 10 of nvme1
Jun 2 20:43:35 xxx kernel: nvme1: Lazy allocation of 0x4000 bytes rid 0x10 type 3 at 0xce000000
Jun 2 20:43:35 xxx kernel: pcib8: allocated memory range (0xce004000-0xce0040ff) for rid 20 of nvme1
Jun 2 20:43:35 xxx kernel: nvme1: Lazy allocation of 0x100 bytes rid 0x20 type 3 at 0xce004000
Jun 2 20:43:35 xxx kernel: nvme1: attempting to allocate 17 MSI-X vectors (17 supported)
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 340 to local APIC 52 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 341 to local APIC 54 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 342 to local APIC 56 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 343 to local APIC 58 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 344 to local APIC 60 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 345 to local APIC 62 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 346 to local APIC 64 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 347 to local APIC 66 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 348 to local APIC 68 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 349 to local APIC 70 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 350 to local APIC 72 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 351 to local APIC 74 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 352 to local APIC 76 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 353 to local APIC 78 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 354 to local APIC 80 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 355 to local APIC 82 vector 48
Jun 2 20:43:35 xxx kernel: msi: routing MSI-X IRQ 356 to local APIC 84 vector 48
Jun 2 20:43:35 xxx kernel: nvme1: using IRQs 340
Jun 2 20:43:35 xxx kernel: -356
Jun 2 20:43:35 xxx kernel: for MSI-X
Jun 2 20:43:35 xxx kernel: nvme1: CapLo: 0x140103ff: MQES 1023, CQR, TO 20
Jun 2 20:43:35 xxx kernel: nvme1: CapHi: 0x00000030: DSTRD 0, NSSRS, CSS 1, MPSMIN 0, MPSMAX 0
Jun 2 20:43:35 xxx kernel: nvme1: Version: 0x00010400: 1.4
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 341 to local APIC 17 vector 48
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 342 to local APIC 49 vector 48
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 343 to local APIC 81 vector 48
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 344 to local APIC 113 vector 48
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 345 to local APIC 145 vector 48
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 346 to local APIC 177 vector 48
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 347 to local APIC 209 vector 48
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 348 to local APIC 241 vector 48
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 349 to local APIC 17 vector 49
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 350 to local APIC 49 vector 49
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 351 to local APIC 81 vector 49
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 352 to local APIC 113 vector 49
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 353 to local APIC 145 vector 49
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 354 to local APIC 177 vector 49
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 355 to local APIC 209 vector 49
Jun 2 20:43:35 xxx kernel: msi: Assigning MSI-X IRQ 356 to local APIC 241 vector 49
Jun 2 20:43:35 xxx kernel: nvme1: Allocated 200MB host memory buffer
Jun 2 20:43:35 xxx kernel: nda0 at nvme1 bus 0 scbus16 target 0 lun 1
Jun 2 20:43:35 xxx kernel: nda0:
Jun 2 20:43:35 xxx kernel: <WD Blue SN570 500GB 234110WD xxx>
Jun 2 20:43:35 xxx kernel: nda0: Serial Number xxx
Jun 2 20:43:35 xxx kernel: nda0: nvme version 1.4 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
Jun 2 20:43:35 xxx kernel: nda0: 476940MB (976773168 512 byte sectors)
Jun 2 20:43:35 xxx kernel: GEOM: new disk nda0
Jun 2 20:43:35 xxx kernel: pass1 at nvme1 bus 0 scbus16 target 0 lun 1
Jun 2 20:43:35 xxx kernel: pass1: <WD Blue SN570 500GB 234110WD 22455X805770>
Jun 2 20:43:35 xxx kernel: pass1: Serial Number xxx
Jun 2 20:43:35 xxx kernel: pass1: nvme version 1.4 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
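
For reading the "HotPlug interrupt" values in the timeline (0x48, 0x140, 0x8), a small decoder may help. This is only a sketch: it assumes the value printed by pcib is the raw PCIe Slot Status register, and the bit names follow the PCI Express Base Specification. The function name decode_slot_status is made up for this example and is not part of any FreeBSD tool.

```python
# Sketch only: assumes the "HotPlug interrupt: 0x..." value is the raw
# PCIe Slot Status register. Bit names are from the PCIe Base Specification.
SLOT_STATUS_BITS = {
    0x001: "Attention Button Pressed",
    0x002: "Power Fault Detected",
    0x004: "MRL Sensor Changed",
    0x008: "Presence Detect Changed",
    0x010: "Command Completed",
    0x020: "MRL Sensor State",
    0x040: "Presence Detect State",
    0x080: "Electromechanical Interlock Status",
    0x100: "Data Link Layer State Changed",
}


def decode_slot_status(value):
    """Return the names of the Slot Status bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SLOT_STATUS_BITS.items()) if value & bit]


if __name__ == "__main__":
    # Values taken from the kernel messages in the report.
    for value in (0x48, 0x140, 0x8):
        print(f"{value:#x}: {', '.join(decode_slot_status(value))}")
```

Under that assumption, 0x48 is Presence Detect Changed with Presence Detect State set (card present), 0x140 is Data Link Layer State Changed with the card still present, and 0x8 is Presence Detect Changed with Presence Detect State clear (slot empty), which agrees with the messages pcib prints next to each value.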