Hi, I am seeing "frozen PHB" on Power9 bare-metal (PowerNV ppc64le) system leading to non-accessible nvme drives (they are behind a switch) in the 6.15-rc2 kernel (originally with kernel-6.15.0-0.rc2.22.fc43). I was able to bisect the issue to commit 62baf70c327444338c34703c71aa8cc8e4189bd6 [1].
Please see [2] for full console log and other details. Please ignore the "soft-lockup" messages, they are unrelated and going to be resolved with [3]. We are building the kernel with CONFIG_NVME_MULTIPATH=y [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62baf70c327444338c34703c71aa8cc8e4189bd6 [2] https://fedora.danny.cz/ppc/rdsosreport.txt [3] https://lore.kernel.org/all/20250410125110.1232329-1-gs...@redhat.com/ from the console log ... [ 145.996408] talos.danny.cz kernel: Adaptec aacraid driver 1.2.1[50983]-custom [ 145.996732] talos.danny.cz kernel: aacraid 0002:01:00.0: enabling device (0140 -> 0142) [ 146.010113] talos.danny.cz kernel: nvme nvme0: pci function 0030:0d:00.0 [ 146.010160] talos.danny.cz kernel: nvme 0030:0d:00.0: enabling device (0140 -> 0142) [ 146.010517] talos.danny.cz kernel: nvme nvme1: pci function 0030:0e:00.0 [ 146.010551] talos.danny.cz kernel: nvme 0030:0e:00.0: enabling device (0140 -> 0142) [ 146.017051] talos.danny.cz kernel: nvme nvme0: D3 entry latency set to 8 seconds [ 146.017199] talos.danny.cz kernel: nvme nvme1: D3 entry latency set to 8 seconds [ 146.034341] talos.danny.cz kernel: aacraid: Comm Interface type3 enabled [ 146.041633] talos.danny.cz kernel: usb 1-2.2: New USB device found, idVendor=046d, idProduct=c077, bcdDevice=72.00 [ 146.041659] talos.danny.cz kernel: usb 1-2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 146.041682] talos.danny.cz kernel: usb 1-2.2: Product: USB Optical Mouse [ 146.041693] talos.danny.cz kernel: usb 1-2.2: Manufacturer: Logitech [ 146.046181] talos.danny.cz kernel: nvme nvme1: 32/0/0 default/read/poll queues [ 146.048195] talos.danny.cz kernel: nvme nvme0: 32/0/0 default/read/poll queues [ 146.051730] talos.danny.cz kernel: nvme1n1: p1 [ 146.053390] talos.danny.cz kernel: nvme0n1: p1 [ 146.053905] talos.danny.cz systemd[1]: Starting modprobe@configfs.service - Load Kernel Module configfs... [ 146.054112] talos.danny.cz kernel: input: Logitech USB Optical Mouse as /devices/pci0003:00/0003:00:00.0/0003:01:00.0/usb1/1-2/1-2.2/1-2.2:1.0/0003:046D:C077.0002/input/input1 [ 146.054201] talos.danny.cz kernel: AAC0: kernel 3.2-0[0] Apr 24 2017 [ 146.054239] talos.danny.cz kernel: AAC0: monitor 0.0-0[0] [ 146.054261] talos.danny.cz kernel: AAC0: bios 0.13-209[32000] [ 146.054285] talos.danny.cz kernel: AAC0: serial 10F447 [ 146.054307] talos.danny.cz kernel: AAC0: Non-DASD support enabled. [ 146.054331] talos.danny.cz kernel: aacraid 0002:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff [ 146.054361] talos.danny.cz kernel: AAC0: 64bit support enabled. [ 146.054385] talos.danny.cz kernel: aacraid 0002:01:00.0: 64 Bit DAC enabled [ 146.054587] talos.danny.cz kernel: hid-generic 0003:046D:C077.0002: input,hidraw1: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-0003:01:00.0-2.2/input0 [ 146.054701] talos.danny.cz systemd[1]: Finished systemd-udev-trigger.service - Coldplug All udev Devices. [ 146.057858] talos.danny.cz kernel: scsi host8: aacraid [ 146.059886] talos.danny.cz systemd[1]: modprobe@configfs.service: Deactivated successfully. [ 146.060225] talos.danny.cz systemd[1]: Finished modprobe@configfs.service - Load Kernel Module configfs. [ 146.064530] talos.danny.cz kernel: scsi 8:2:0:0: Direct-Access ATA WDC WD5000AAKX-0 1H19 PQ: 0 ANSI: 6 [ 146.083800] talos.danny.cz kernel: sd 8:2:0:0: Attached scsi generic sg3 type 0 [ 146.084962] talos.danny.cz kernel: sd 8:2:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/466 GiB) [ 146.086117] talos.danny.cz kernel: scsi 8:2:1:0: Direct-Access ATA WDC WD5000AAKS-0 1D05 PQ: 0 ANSI: 6 [ 146.086715] talos.danny.cz kernel: sd 8:2:0:0: [sdb] Write Protect is off [ 146.086735] talos.danny.cz kernel: sd 8:2:0:0: [sdb] Mode Sense: 46 00 10 08 [ 146.088650] talos.danny.cz kernel: sd 8:2:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 146.109852] talos.danny.cz kernel: sd 8:2:1:0: Attached scsi generic sg4 type 0 [ 146.110789] talos.danny.cz kernel: scsi 8:3:123:0: Enclosure ADAPTEC Smart Adapter 3.02 PQ: 0 ANSI: 5 [ 146.111055] talos.danny.cz kernel: sd 8:2:1:0: [sdc] 976773168 512-byte logical blocks: (500 GB/466 GiB) [ 146.111521] talos.danny.cz kernel: scsi 8:3:123:0: Attached scsi generic sg5 type 13 [ 146.112133] talos.danny.cz kernel: sd 8:2:1:0: [sdc] Write Protect is off [ 146.112151] talos.danny.cz kernel: sd 8:2:1:0: [sdc] Mode Sense: 46 00 10 08 [ 146.114638] talos.danny.cz kernel: sd 8:2:1:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 146.127940] talos.danny.cz systemd[1]: Mounting sys-kernel-config.mount - Kernel Configuration File System... [ 146.128927] talos.danny.cz systemd[1]: Starting dracut-initqueue.service - dracut initqueue hook... [ 146.129025] talos.danny.cz systemd[1]: systemd-vconsole-setup.service: Deactivated successfully. [ 146.129101] talos.danny.cz systemd[1]: Stopped systemd-vconsole-setup.service - Virtual Console Setup. [ 146.129272] talos.danny.cz systemd[1]: Stopping systemd-vconsole-setup.service - Virtual Console Setup... [ 146.130235] talos.danny.cz systemd[1]: Starting systemd-vconsole-setup.service - Virtual Console Setup... [ 146.132522] talos.danny.cz systemd[1]: Mounted sys-kernel-config.mount - Kernel Configuration File System. [ 146.166186] talos.danny.cz systemd[1]: Reloading requested from client PID 1225 ('systemctl') (unit dracut-initqueue.service)... [ 146.166230] talos.danny.cz systemd[1]: Reloading... [ 146.253661] talos.danny.cz kernel: usb 1-2.1.3: new low-speed USB device number 7 using xhci_hcd [ 146.289125] talos.danny.cz kernel: sdb: sdb1 sdb2 sdb3 sdb4 [ 146.289537] talos.danny.cz kernel: sd 8:2:0:0: [sdb] Attached SCSI disk [ 146.380101] talos.danny.cz systemd[1]: Reloading finished in 213 ms. [ 147.119656] talos.danny.cz kernel: usb 1-2.1.3: New USB device found, idVendor=0463, idProduct=ffff, bcdDevice= 0.01 [ 147.119687] talos.danny.cz kernel: usb 1-2.1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 147.119718] talos.danny.cz kernel: usb 1-2.1.3: Product: 5S [ 147.119730] talos.danny.cz kernel: usb 1-2.1.3: Manufacturer: EATON [ 147.945736] talos.danny.cz kernel: [drm] amdgpu kernel modesetting enabled. [ 147.945864] talos.danny.cz kernel: amdgpu: DSDT table not found for OEM information [ 147.945892] talos.danny.cz kernel: amdgpu: IO link not available for non x86 platforms [ 147.945906] talos.danny.cz kernel: amdgpu: IO link not available for non x86 platforms [ 147.945932] talos.danny.cz kernel: amdgpu: Virtual CRAT table created for CPU [ 147.945979] talos.danny.cz kernel: amdgpu: Topology: Add CPU node [ 147.946287] talos.danny.cz kernel: amdgpu 0000:01:00.0: enabling device (0540 -> 0542) [ 147.946319] talos.danny.cz kernel: [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67E3 0x1002:0x0B0D 0x00). [ 147.946359] talos.danny.cz kernel: [drm] register mmio base: 0x00000000 [ 147.946383] talos.danny.cz kernel: [drm] register mmio size: 262144 [ 147.946579] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 0 <vi_common> [ 147.946599] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 1 <gmc_v8_0> [ 147.946618] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 2 <tonga_ih> [ 147.946637] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 3 <gfx_v8_0> [ 147.946676] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 4 <sdma_v3_0> [ 147.946705] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 5 <powerplay> [ 147.946724] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 6 <dm> [ 147.946740] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 7 <uvd_v6_0> [ 147.946759] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 8 <vce_v3_0> [ 147.966292] talos.danny.cz kernel: sdc: sdc1 sdc2 sdc3 sdc4 [ 147.966699] talos.danny.cz kernel: sd 8:2:1:0: [sdc] Attached SCSI disk [ 148.294421] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR [ 148.294444] talos.danny.cz kernel: amdgpu: ATOM BIOS: 113-D0150600-103 [ 148.295816] talos.danny.cz kernel: [drm] UVD is enabled in VM mode [ 148.295831] talos.danny.cz kernel: [drm] UVD ENC is enabled in VM mode [ 148.295842] talos.danny.cz kernel: [drm] VCE enabled in VM mode [ 148.295852] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported [ 148.295878] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported [ 148.295893] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: PCI CONFIG reset [ 148.296033] talos.danny.cz kernel: [drm] GPU posting now... [ 148.680650] talos.danny.cz systemd[1]: Finished systemd-vconsole-setup.service - Virtual Console Setup. [ 148.767932] talos.danny.cz kernel: [drm] vm size is 256 GB, 2 levels, block size is 10-bit, fragment size is 9-bit [ 148.769652] talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 2 [mem 0x6000010000000-0x60000101fffff 64bit pref]: releasing [ 148.769677] talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 0 [mem 0x6000000000000-0x600000fffffff 64bit pref]: releasing [ 148.769709] talos.danny.cz kernel: pci 0000:00:00.0: bridge window [mem 0x6000000000000-0x6003fbff0ffff 64bit pref]: releasing [ 148.769730] talos.danny.cz kernel: pci 0000:00:00.0: bridge window [mem 0x6000000000000-0x600017fffffff 64bit pref]: assigned [ 148.769747] talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 0 [mem 0x6000000000000-0x60000ffffffff 64bit pref]: assigned [ 148.769779] talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 2 [mem 0x6000100000000-0x60001001fffff 64bit pref]: assigned [ 148.769802] talos.danny.cz kernel: pci 0000:00:00.0: PCI bridge to [bus 01] [ 148.769815] talos.danny.cz kernel: pci 0000:00:00.0: bridge window [mem 0x600c000000000-0x600c07fefffff] [ 148.769829] talos.danny.cz kernel: pci 0000:00:00.0: bridge window [mem 0x6000000000000-0x6003fbff0ffff 64bit pref] [ 148.769849] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used) [ 148.769865] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF [ 148.769882] talos.danny.cz kernel: [drm] Detected VRAM RAM=4096M, BAR=4096M [ 148.769892] talos.danny.cz kernel: [drm] RAM width 128bits GDDR5 [ 148.769901] talos.danny.cz kernel: amdgpu 0000:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff [ 148.770903] talos.danny.cz kernel: hid-generic 0003:0463:FFFF.0003: hiddev96,hidraw2: USB HID v1.10 Device [EATON 5S] on usb-0003:01:00.0-2.1.3/input0 [ 148.771257] talos.danny.cz kernel: [drm] amdgpu: 4096M of VRAM memory ready [ 148.771279] talos.danny.cz kernel: [drm] amdgpu: 32600M of GTT memory ready. [ 148.771352] talos.danny.cz kernel: [drm] GART: num cpu pages 4096, num gpu pages 65536 [ 148.772018] talos.danny.cz kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F4FFF80000). [ 148.774684] talos.danny.cz kernel: [drm] Chained IB support enabled! [ 148.787973] talos.danny.cz kernel: amdgpu: hwmgr_sw_init smu backed is polaris10_smu [ 148.813569] talos.danny.cz kernel: [drm] Found UVD firmware Version: 1.130 Family ID: 16 [ 148.826195] talos.danny.cz kernel: [drm] Found VCE firmware Version: 53.26 Binary ID: 3 [ 148.842658] talos.danny.cz kernel: [drm] Display Core v3.2.325 initialized on DCE 11.2 [ 148.843782] talos.danny.cz systemd[1]: Starting plymouth-start.service - Show Plymouth Boot Screen... [ 148.846228] talos.danny.cz systemd[1]: Received SIGRTMIN+20 from PID 1344 (plymouthd). [ 148.998163] talos.danny.cz kernel: usb 1-2.1.4: new full-speed USB device number 8 using xhci_hcd [ 148.998776] talos.danny.cz kernel: [drm] UVD and UVD ENC initialized successfully. [ 149.099708] talos.danny.cz kernel: [drm] VCE initialized successfully. [ 149.107051] talos.danny.cz kernel: kfd kfd: amdgpu: skipped device 1002:67e3, PCI rejects atomics 730<0 [ 149.107085] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: SE 2, SH per SE 1, CU per SH 8, active_cu_number 16 [ 149.110945] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: Using BACO for runtime pm [ 149.111687] talos.danny.cz kernel: [drm] Initialized amdgpu 3.63.0 for 0000:01:00.0 on minor 0 [ 149.129106] talos.danny.cz systemd[1]: systemd-vconsole-setup.service: Deactivated successfully. [ 149.129315] talos.danny.cz systemd[1]: Stopped systemd-vconsole-setup.service - Virtual Console Setup. [ 149.146843] talos.danny.cz kernel: Console: switching to colour frame buffer device 240x75 [ 149.160686] talos.danny.cz kernel: md/raid1:md127: active with 2 out of 2 mirrors [ 149.172981] talos.danny.cz kernel: md127: detected capacity change from 0 to 940836864 [ 149.173957] talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device [ 149.174099] talos.danny.cz systemd[1]: Stopping systemd-vconsole-setup.service - Virtual Console Setup... [ 149.278744] talos.danny.cz kernel: md/raid1:md126: active with 2 out of 2 mirrors [ 149.292981] talos.danny.cz kernel: md126: detected capacity change from 0 to 2095104 [ 149.331567] talos.danny.cz kernel: usb 1-2.1.4: New USB device found, idVendor=0403, idProduct=6015, bcdDevice=10.00 [ 149.331574] talos.danny.cz kernel: usb 1-2.1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 149.331579] talos.danny.cz kernel: usb 1-2.1.4: Product: FT230X Basic UART [ 149.331582] talos.danny.cz kernel: usb 1-2.1.4: Manufacturer: FTDI [ 149.331584] talos.danny.cz kernel: usb 1-2.1.4: SerialNumber: DM02XLCC [ 149.353523] talos.danny.cz systemd-journald[629]: Journal started [ 149.353582] talos.danny.cz systemd-journald[629]: Runtime Journal (/run/log/journal/d94ac98ea91043d3892dab218d99209d) is 8.0M, max 1.2G, 1.2G free. [ 149.360328] talos.danny.cz systemd-vconsole-setup[632]: /usr/bin/setfont failed with a "system error" (EX_OSERR), ignoring. [ 149.360817] talos.danny.cz systemd-modules-load[631]: Inserted module 'fuse' [ 149.361170] talos.danny.cz systemd-modules-load[631]: Inserted module 'i2c_dev' [ 149.361370] talos.danny.cz systemd-modules-load[631]: Inserted module 'ip_tables' [ 149.361419] talos.danny.cz systemd-vconsole-setup[640]: setfont: ERROR kdfontop.c:183 put_font_kdfontop: Unable to load such font with such kernel version [ 149.361697] talos.danny.cz systemd-modules-load[631]: Inserted module 'ip6_tables' [ 149.361983] talos.danny.cz systemd-modules-load[631]: Failed to find module 'scsi_dh_alua' [ 149.362176] talos.danny.cz systemd-modules-load[631]: Failed to find module 'scsi_dh_emc' [ 149.362453] talos.danny.cz systemd-modules-load[631]: Failed to find module 'scsi_dh_rdac' [ 149.362539] talos.danny.cz dracut-cmdline[659]: dracut-102-2.fc40 [ 149.362539] talos.danny.cz dracut-cmdline[659]: Using kernel command line parameters: root=/dev/mapper/Linux-Root ro rd.lvm.lv=Linux/Root rd.md.uuid=60936c65:08d9f6bc:b191c895:332a4d53 rd.md.uuid=06128381:0df3ab4b:02ebd84d:84921066 rd.md.uuid=3c52d341:6485ed32:9da81f4c:706b231f console=tty1 console=hvc0 [ 149.362878] talos.danny.cz systemd-sysusers[650]: Creating group 'nobody' with GID 65534. [ 149.363123] talos.danny.cz systemd-sysusers[650]: Creating group 'users' with GID 100. [ 149.363316] talos.danny.cz systemd-sysusers[650]: Creating group 'systemd-journal' with GID 190. [ 149.363503] talos.danny.cz systemd-vconsole-setup[632]: Setting source virtual console failed, ignoring remaining ones. [ 149.363694] talos.danny.cz systemd-udevd[753]: Using default interface naming scheme 'v255'. [ 149.366236] talos.danny.cz systemd-vconsole-setup[1224]: setfont: ERROR kdfontop.c:183 put_font_kdfontop: Unable to load such font with such kernel version [ 149.414053] talos.danny.cz systemd[1]: Starting systemd-vconsole-setup.service - Virtual Console Setup... [ 149.415317] talos.danny.cz systemd[1]: Started systemd-journald.service - Journal Service. [ 149.384617] talos.danny.cz systemd-vconsole-setup[1199]: /usr/bin/setfont failed with a "system error" (EX_OSERR), ignoring. [ 149.384730] talos.danny.cz systemd-vconsole-setup[1199]: Setting source virtual console failed, ignoring remaining ones. [ 149.419101] talos.danny.cz systemd[1]: Starting systemd-tmpfiles-setup.service - Create System Files and Directories... [ 149.449593] talos.danny.cz systemd-tmpfiles[1510]: /usr/lib/tmpfiles.d/var.conf:14: Duplicate line for path "/var/log", ignoring. [ 149.464643] talos.danny.cz systemd[1]: Finished systemd-tmpfiles-setup.service - Create System Files and Directories. [ 149.774957] talos.danny.cz systemd[1]: Started plymouth-start.service - Show Plymouth Boot Screen. [ 149.775823] talos.danny.cz systemd[1]: systemd-ask-password-console.path - Dispatch Password Requests to Console Directory Watch was skipped because of an unmet condition check (ConditionPathExists=!/run/plymouth/pid). [ 149.776071] talos.danny.cz systemd[1]: Started systemd-ask-password-plymouth.path - Forward Password Requests to Plymouth Directory Watch. [ 149.776506] talos.danny.cz systemd[1]: Reached target paths.target - Path Units. [ 149.860914] talos.danny.cz systemd[1]: Finished systemd-vconsole-setup.service - Virtual Console Setup. [ 149.861277] talos.danny.cz systemd[1]: Reached target sysinit.target - System Initialization. [ 149.862051] talos.danny.cz systemd[1]: Reached target basic.target - Basic System. [ 149.950478] talos.danny.cz systemd[1]: Started rngd.service - Hardware RNG Entropy Gatherer Daemon. [ 149.957295] talos.danny.cz rngd[1525]: Disabling 7: PKCS11 Entropy generator (pkcs11) [ 149.957295] talos.danny.cz rngd[1525]: Disabling 5: NIST Network Entropy Beacon (nist) [ 149.957295] talos.danny.cz rngd[1525]: Initializing available sources [ 149.957880] talos.danny.cz rngd[1525]: [hwrng ]: Initialization Failed [ 149.960647] talos.danny.cz rngd[1525]: [darn ]: Enabling power DARN rng support [ 149.960647] talos.danny.cz rngd[1525]: [darn ]: Initialized [ 149.960647] talos.danny.cz rngd[1525]: [jitter]: JITTER timeout set to 5 sec [ 150.069593] talos.danny.cz rngd[1525]: [jitter]: Initializing AES buffer [ 155.000031] talos.danny.cz rngd[1525]: [jitter]: Unable to obtain AES key, disabling JITTER source [ 155.000886] talos.danny.cz rngd[1525]: [jitter]: Initialization Failed [ 155.008927] talos.danny.cz rngd[1525]: [rtlsdr]: Initialization Failed [ 155.008927] talos.danny.cz rngd[1525]: [namedpipe]: Initialization Failed [ 159.883963] talos.danny.cz kernel: pci 0032:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.884630] talos.danny.cz kernel: pci 0033:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.885271] talos.danny.cz kernel: pci 0000:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.885892] talos.danny.cz kernel: pci 0001:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.886511] talos.danny.cz kernel: pci 0002:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.887129] talos.danny.cz kernel: pci 0003:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.887742] talos.danny.cz kernel: pci 0004:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.888355] talos.danny.cz kernel: pci 0005:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.888958] talos.danny.cz kernel: pci 0005:01:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.889557] talos.danny.cz kernel: pci 0030:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.890176] talos.danny.cz kernel: pci 0030:01:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.890762] talos.danny.cz kernel: pci 0030:02:04.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.891344] talos.danny.cz kernel: pci 0030:02:05.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.891946] talos.danny.cz kernel: pci 0030:02:06.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.892538] talos.danny.cz kernel: pci 0030:02:07.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 159.893121] talos.danny.cz kernel: pci 0031:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0 [ 176.682101] talos.danny.cz kernel: EEH: Recovering PHB#30-PE#1fb [ 176.682114] talos.danny.cz kernel: EEH: PE location: N/A, PHB location: N/A [ 176.682116] talos.danny.cz kernel: EEH: Frozen PHB#30-PE#1fb detected [ 176.682118] talos.danny.cz kernel: EEH: Call Trace: [ 176.682119] talos.danny.cz kernel: EEH: [0000000043837525] __eeh_send_failure_event+0x78/0x150 [ 176.682129] talos.danny.cz kernel: EEH: [00000000ca9d8fff] eeh_dev_check_failure+0x2e0/0x6d0 [ 176.682136] talos.danny.cz kernel: EEH: [000000000896f909] nvme_timeout+0x264/0x670 [nvme] [ 176.682145] talos.danny.cz kernel: EEH: [000000002faff0a8] blk_mq_handle_expired+0xb0/0x130 [ 176.682151] talos.danny.cz kernel: EEH: [00000000014f13c7] bt_iter+0xf8/0x140 [ 176.682156] talos.danny.cz kernel: EEH: [0000000072f9f2ba] blk_mq_queue_tag_busy_iter+0x384/0x680 [ 176.682160] talos.danny.cz kernel: EEH: [00000000449486be] blk_mq_timeout_work+0x198/0x1f0 [ 176.682164] talos.danny.cz kernel: EEH: [00000000a8845314] process_one_work+0x1f8/0x510 [ 176.682170] talos.danny.cz kernel: EEH: [00000000a19f763f] worker_thread+0x33c/0x510 [ 176.682173] talos.danny.cz kernel: EEH: [00000000786c4402] kthread+0x150/0x160 [ 176.682178] talos.danny.cz kernel: EEH: [00000000393b9885] start_kernel_thread+0x14/0x18 [ 176.682182] talos.danny.cz kernel: EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures. [ 176.682185] talos.danny.cz kernel: EEH: Notify device drivers to shutdown [ 176.682187] talos.danny.cz kernel: EEH: Beginning: 'error_detected(IO frozen)' [ 176.682191] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: Invoking nvme->error_detected(IO frozen) [ 176.682198] talos.danny.cz kernel: nvme nvme0: frozen state error detected, reset controller [ 176.934877] talos.danny.cz kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1875384832, 128 blocks, I/O Error (sct 0x3 / sc 0x71) [ 176.934893] talos.danny.cz kernel: I/O error, dev nvme0n1, sector 1875384832 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0 [ 176.954198] talos.danny.cz kernel: nvme nvme0: Failed to get ANA log: -4 [ 176.973671] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: nvme driver reports: 'need reset' [ 176.973677] talos.danny.cz kernel: EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset' [ 176.973683] talos.danny.cz kernel: EEH: Collect temporary log [ 176.973712] talos.danny.cz kernel: EEH: of node=0030:0d:00.0 [ 176.973716] talos.danny.cz kernel: EEH: PCI device/vendor: a808144d [ 176.973719] talos.danny.cz kernel: EEH: PCI cmd/status register: 00100142 [ 176.973721] talos.danny.cz kernel: EEH: PCI-E capabilities and status follow: [ 176.973732] talos.danny.cz kernel: EEH: PCI-E 00: 0002b010 10648fc1 00002830 00437043 [ 176.973742] talos.danny.cz kernel: EEH: PCI-E 10: 10430000 00000000 00000000 00000000 [ 176.977408] talos.danny.cz kernel: EEH: PCI-E 20: 00000000 [ 176.977409] talos.danny.cz kernel: EEH: PCI-E AER capability register set follows: [ 176.977421] talos.danny.cz kernel: EEH: PCI-E AER 00: 14820001 00000000 00400000 00462030 [ 176.978606] talos.danny.cz kernel: EEH: PCI-E AER 10: 00000000 0000e000 000003e0 00000000 [ 176.978615] talos.danny.cz kernel: EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 176.978618] talos.danny.cz kernel: EEH: PCI-E AER 30: 00000000 00000000 [ 176.979843] talos.danny.cz kernel: PHB4 PHB#48 Diag-data (Version: 1) [ 176.979845] talos.danny.cz kernel: brdgCtl: 00000002 [ 176.979847] talos.danny.cz kernel: RootSts: 00060020 00402000 a0830008 00100107 00000800 [ 176.979849] talos.danny.cz kernel: PhbSts: 0000001c00000000 0000001c00000000 [ 176.981515] talos.danny.cz kernel: Lem: 0000000100000080 0000000000000000 0000000000000080 [ 176.981517] talos.danny.cz kernel: PhbErr: 0000028000000000 0000020000000000 2148000098000240 a008400000000000 [ 176.981520] talos.danny.cz kernel: RxeTceErr: 2000000000000000 2000000000000000 c0000000000001fa 0000000000000000 [ 176.981522] talos.danny.cz kernel: PblErr: 0000000000020000 0000000000020000 0000000000000000 0000000000000000 [ 176.981524] talos.danny.cz kernel: RegbErr: 0000004000000000 0000004000000000 8800000c00000000 0000000007011000 [ 176.981529] talos.danny.cz kernel: PE[1fa] A/B: 8300b03800000000 8000000000000000 [ 176.981532] talos.danny.cz kernel: PE[..1fb] A/B: as above [ 176.981533] talos.danny.cz kernel: EEH: Reset without hotplug activity [ 180.403658] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 1023ms after bus reset; waiting [ 181.483654] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 2047ms after bus reset; waiting [ 183.563680] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 4095ms after bus reset; waiting [ 187.723654] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 8191ms after bus reset; waiting [ 196.364131] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 16383ms after bus reset; waiting [ 213.004161] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 32767ms after bus reset; waiting [ 246.283652] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 65535ms after bus reset; giving up [ 246.235672] talos.danny.cz systemd-udevd[753]: nvme0n1: Worker [958] processing SEQNUM=1898 is taking a long time [ 246.236577] talos.danny.cz systemd-udevd[753]: nvme1n1: Worker [921] processing SEQNUM=1893 is taking a long time [ 246.552477] talos.danny.cz kernel: EEH: Beginning: 'slot_reset' [ 246.553820] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: Invoking nvme->slot_reset() [ 246.553838] talos.danny.cz kernel: nvme nvme0: restart after slot reset [ 246.629560] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: nvme driver reports: 'recovered' [ 246.629565] talos.danny.cz kernel: EEH: Finished:'slot_reset' with aggregate recovery state:'recovered' [ 246.630292] talos.danny.cz kernel: EEH: Notify device driver to resume [ 246.630659] talos.danny.cz kernel: EEH: Beginning: 'resume' [ 246.633097] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: Invoking nvme->resume() [ 246.683699] talos.danny.cz kernel: nvme 0030:0d:00.0: Unable to change power state from D3cold to D0, device inaccessible [ 246.684963] talos.danny.cz kernel: nvme nvme0: Disabling device after reset failure: -19 [ 246.744271] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211552, async page read [ 246.745010] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme0n1: rw=0, sector=1875384848, nr_sectors = 16 limit=0 [ 246.746281] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211553, async page read [ 246.746929] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme0n1: rw=0, sector=1875384864, nr_sectors = 16 limit=0 [ 246.748195] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211554, async page read [ 246.748838] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme0n1: rw=0, sector=1875384880, nr_sectors = 16 limit=0 [ 246.750125] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211555, async page read [ 246.750778] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme0n1: rw=0, sector=1875384896, nr_sectors = 16 limit=0 [ 246.752076] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211556, async page read [ 246.752725] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme0n1: rw=0, sector=1875384912, nr_sectors = 16 limit=0 [ 246.754025] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211557, async page read [ 246.754792] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: nvme driver reports: 'none' [ 246.754800] talos.danny.cz kernel: EEH: Finished:'resume' [ 246.754804] talos.danny.cz kernel: EEH: Recovery successful. [ 246.754809] talos.danny.cz kernel: EEH: Recovering PHB#30-PE#1fb [ 246.754820] talos.danny.cz kernel: EEH: PE location: N/A, PHB location: N/A [ 246.754823] talos.danny.cz kernel: EEH: Frozen PHB#30-PE#1fb detected [ 246.754827] talos.danny.cz kernel: EEH: Call Trace: [ 246.754829] talos.danny.cz kernel: EEH: [0000000043837525] __eeh_send_failure_event+0x78/0x150 [ 246.754842] talos.danny.cz kernel: EEH: [00000000ca9d8fff] eeh_dev_check_failure+0x2e0/0x6d0 [ 246.754853] talos.danny.cz kernel: EEH: [000000007534829c] pnv_pci_read_config+0x148/0x180 [ 246.754859] talos.danny.cz kernel: EEH: [000000006db2aa5c] pci_bus_read_config_dword+0x90/0xf0 [ 246.754866] talos.danny.cz kernel: EEH: [00000000ed12774a] pci_find_next_ext_capability+0x5c/0x150 [ 246.754874] talos.danny.cz kernel: EEH: [000000003a1ec347] pci_restore_ltr_state+0x40/0xa0 [ 246.754882] talos.danny.cz kernel: EEH: [000000000c3c04be] pci_restore_state.part.0+0x2c/0x3b0 [ 246.754888] talos.danny.cz kernel: EEH: [000000005a69e2f4] nvme_slot_reset+0x48/0x90 [nvme] [ 246.754899] talos.danny.cz kernel: EEH: [00000000efdedb77] eeh_report_reset+0xd0/0x100 [ 246.754905] talos.danny.cz kernel: EEH: [00000000bdb52d8d] eeh_pe_report+0x2bc/0x548 [ 246.754911] talos.danny.cz kernel: EEH: [0000000039694420] eeh_handle_normal_event+0x89c/0x9c0 [ 246.754918] talos.danny.cz kernel: EEH: [00000000bbf75c7c] eeh_event_handler+0xfc/0x170 [ 246.754924] talos.danny.cz kernel: EEH: [00000000786c4402] kthread+0x150/0x160 [ 246.754932] talos.danny.cz kernel: EEH: [00000000393b9885] start_kernel_thread+0x14/0x18 [ 246.754939] talos.danny.cz kernel: EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures. [ 246.754942] talos.danny.cz kernel: EEH: Notify device drivers to shutdown [ 246.754945] talos.danny.cz kernel: EEH: Beginning: 'error_detected(IO frozen)' [ 246.754949] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: Invoking nvme->error_detected(IO frozen) [ 246.754956] talos.danny.cz kernel: nvme nvme0: frozen state error detected, reset controller [ 246.756867] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme0n1: rw=0, sector=1875384928, nr_sectors = 16 limit=0 [ 246.775171] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211558, async page read [ 246.775177] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme0n1: rw=0, sector=1875384944, nr_sectors = 16 limit=0 [ 246.775181] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211559, async page read [ 246.778998] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1p1, logical block 8191999, async page read [ 246.853703] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: nvme driver reports: 'disconnect' [ 246.853709] talos.danny.cz kernel: EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'disconnect' [ 246.855030] talos.danny.cz kernel: EEH: Unable to recover from failure from PHB#30-PE#1fb. Please try reseating or replacing it [ 246.856319] talos.danny.cz kernel: EEH: of node=0030:0d:00.0 [ 246.856938] talos.danny.cz kernel: EEH: PCI device/vendor: ffffffff [ 246.857564] talos.danny.cz kernel: EEH: PCI cmd/status register: ffffffff [ 246.858192] talos.danny.cz kernel: EEH: PCI-E capabilities and status follow: [ 246.858829] talos.danny.cz kernel: EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff [ 246.859465] talos.danny.cz kernel: EEH: PCI-E 10: ffffffff ffffffff ffffffff ffffffff [ 246.860081] talos.danny.cz kernel: EEH: PCI-E 20: ffffffff [ 246.860727] talos.danny.cz kernel: EEH: PCI-E AER capability register set follows: [ 246.861354] talos.danny.cz kernel: EEH: PCI-E AER 00: ffffffff ffffffff ffffffff ffffffff [ 246.862027] talos.danny.cz kernel: EEH: PCI-E AER 10: ffffffff ffffffff ffffffff ffffffff [ 246.862849] talos.danny.cz kernel: EEH: PCI-E AER 20: ffffffff ffffffff ffffffff ffffffff [ 246.863450] talos.danny.cz kernel: EEH: PCI-E AER 30: ffffffff ffffffff [ 246.864068] talos.danny.cz kernel: PHB4 PHB#48 Diag-data (Version: 1) [ 246.864682] talos.danny.cz kernel: brdgCtl: 00000002 [ 246.864685] talos.danny.cz kernel: RootSts: 00000020 00402000 a0830008 00100107 00002000 [ 246.864689] talos.danny.cz kernel: PhbSts: 0000001c00000000 0000001c00000000 [ 246.864692] talos.danny.cz kernel: Lem: 0000000100280000 0000000000000000 0000000100000000 [ 246.864696] talos.danny.cz kernel: PhbErr: 0000088000000000 0000008000000000 2148000098000240 a008400000000000 [ 246.864700] talos.danny.cz kernel: RxeArbErr: 4000200000000000 0000200000000000 02409fde30000000 0000000000000000 [ 246.864703] talos.danny.cz kernel: PblErr: 0000000001000000 0000000001000000 0000000000000000 0000000000000000 [ 246.864707] talos.danny.cz kernel: RegbErr: 0000004000000000 0000004000000000 61000c4800000000 0000000000000000 [ 246.864714] talos.danny.cz kernel: PE[1fa] A/B: 8300b03800000000 8000000000000000 [ 246.870066] talos.danny.cz kernel: PE[1fb] A/B: b740002a02300000 8000000000000000 [ 246.870071] talos.danny.cz kernel: EEH: Beginning: 'error_detected(permanent failure)' [ 246.873646] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: Invoking nvme->error_detected(permanent failure) [ 246.873656] talos.danny.cz kernel: nvme nvme0: failure state error detected, request disconnect [ 246.873701] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: nvme driver reports: 'disconnect' [ 246.873708] talos.danny.cz kernel: EEH: Finished:'error_detected(permanent failure)' [ 247.204789] talos.danny.cz kernel: pci 0030:0d : [PE# 1fb] Releasing PE [ 247.205905] talos.danny.cz kernel: pci 0030:0d : [PE# 1fb] Removing DMA window #0 [ 247.206466] talos.danny.cz kernel: pci 0030:0d : [PE# 1fb] Disabling 64-bit DMA bypass [ 247.211359] talos.danny.cz kernel: EEH: Recovering PHB#30-PE#1fa [ 247.211909] talos.danny.cz kernel: EEH: PE location: N/A, PHB location: N/A [ 247.212456] talos.danny.cz kernel: EEH: Frozen PHB#30-PE#1fa detected [ 247.213000] talos.danny.cz kernel: EEH: Call Trace: [ 247.213792] talos.danny.cz kernel: EEH: [0000000043837525] __eeh_send_failure_event+0x78/0x150 [ 247.214363] talos.danny.cz kernel: EEH: [00000000ca9d8fff] eeh_dev_check_failure+0x2e0/0x6d0 [ 247.214373] talos.danny.cz kernel: EEH: [000000000896f909] nvme_timeout+0x264/0x670 [nvme] [ 247.214382] talos.danny.cz kernel: EEH: [000000002faff0a8] blk_mq_handle_expired+0xb0/0x130 [ 247.214389] talos.danny.cz kernel: EEH: [00000000014f13c7] bt_iter+0xf8/0x140 [ 247.216598] talos.danny.cz kernel: EEH: [0000000072f9f2ba] blk_mq_queue_tag_busy_iter+0x384/0x680 [ 247.216605] talos.danny.cz kernel: EEH: [00000000449486be] blk_mq_timeout_work+0x198/0x1f0 [ 247.216610] talos.danny.cz kernel: EEH: [00000000a8845314] process_one_work+0x1f8/0x510 [ 247.218312] talos.danny.cz kernel: EEH: [00000000a19f763f] worker_thread+0x33c/0x510 [ 247.218318] talos.danny.cz kernel: EEH: [00000000786c4402] kthread+0x150/0x160 [ 247.218325] talos.danny.cz kernel: EEH: [00000000393b9885] start_kernel_thread+0x14/0x18 [ 247.219962] talos.danny.cz kernel: EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures. [ 247.219965] talos.danny.cz kernel: EEH: Notify device drivers to shutdown [ 247.219968] talos.danny.cz kernel: EEH: Beginning: 'error_detected(IO frozen)' [ 247.223662] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->error_detected(IO frozen) [ 247.224563] talos.danny.cz kernel: nvme nvme1: frozen state error detected, reset controller [ 247.455003] talos.danny.cz kernel: nvme1n1: I/O Cmd(0x2) @ LBA 1875384832, 128 blocks, I/O Error (sct 0x3 / sc 0x71) [ 247.455611] talos.danny.cz kernel: I/O error, dev nvme1n1, sector 1875384832 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0 [ 247.484222] talos.danny.cz kernel: nvme nvme1: Failed to get ANA log: -4 [ 247.513681] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'need reset' [ 247.513686] talos.danny.cz kernel: EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset' [ 247.515019] talos.danny.cz kernel: EEH: Collect temporary log [ 247.515659] talos.danny.cz kernel: EEH: of node=0030:0e:00.0 [ 247.516286] talos.danny.cz kernel: EEH: PCI device/vendor: a808144d [ 247.516920] talos.danny.cz kernel: EEH: PCI cmd/status register: 00100142 [ 247.517567] talos.danny.cz kernel: EEH: PCI-E capabilities and status follow: [ 247.518224] talos.danny.cz kernel: EEH: PCI-E 00: 0002b010 10648fc1 00002830 00437043 [ 247.518883] talos.danny.cz kernel: EEH: PCI-E 10: 10430000 00000000 00000000 00000000 [ 247.519526] talos.danny.cz kernel: EEH: PCI-E 20: 00000000 [ 247.520166] talos.danny.cz kernel: EEH: PCI-E AER capability register set follows: [ 247.520810] talos.danny.cz kernel: EEH: PCI-E AER 00: 14820001 00000000 00400000 00462030 [ 247.521440] talos.danny.cz kernel: EEH: PCI-E AER 10: 00000000 0000e000 000003e0 00000000 [ 247.522068] talos.danny.cz kernel: EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 247.522692] talos.danny.cz kernel: EEH: PCI-E AER 30: 00000000 00000000 [ 247.523340] talos.danny.cz kernel: PHB4 PHB#48 Diag-data (Version: 1) [ 247.523977] talos.danny.cz kernel: brdgCtl: 00000002 [ 247.524824] talos.danny.cz kernel: RootSts: 00060020 00402000 a0830008 00100107 00000800 [ 247.524828] talos.danny.cz kernel: PhbSts: 0000001c00000000 0000001c00000000 [ 247.524830] talos.danny.cz kernel: Lem: 0000000100000080 0000000000000000 0000000000000080 [ 247.524833] talos.danny.cz kernel: PhbErr: 0000028000000000 0000020000000000 2148000098000240 a008400000000000 [ 247.524836] talos.danny.cz kernel: RxeTceErr: 2000000000000000 2000000000000000 c0000000000001fa 0000000000000000 [ 247.524838] talos.danny.cz kernel: PblErr: 0000000000020000 0000000000020000 0000000000000000 0000000000000000 [ 247.524841] talos.danny.cz kernel: RegbErr: 0000004000000000 0000004000000000 8800000c00000000 0000000007011000 [ 247.524846] talos.danny.cz kernel: PE[1fa] A/B: 8300b03800000000 8000000000000000 [ 247.524849] talos.danny.cz kernel: PE[..1fb] A/B: as above [ 247.524851] talos.danny.cz kernel: EEH: Reset without hotplug activity [ 250.964186] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 1023ms after bus reset; waiting [ 252.044182] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 2047ms after bus reset; waiting [ 254.124189] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 4095ms after bus reset; waiting [ 258.284197] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 8191ms after bus reset; waiting [ 266.764192] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 16383ms after bus reset; waiting [ 283.404208] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 32767ms after bus reset; waiting [ 316.684204] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 65535ms after bus reset; giving up [ 316.638265] talos.danny.cz dracut-initqueue[1340]: Timed out for waiting the udev queue being empty. [ 316.952931] talos.danny.cz kernel: EEH: Beginning: 'slot_reset' [ 316.953821] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->slot_reset() [ 316.953841] talos.danny.cz kernel: nvme nvme1: restart after slot reset [ 317.026731] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'recovered' [ 317.026736] talos.danny.cz kernel: EEH: Finished:'slot_reset' with aggregate recovery state:'recovered' [ 317.027508] talos.danny.cz kernel: EEH: Notify device driver to resume [ 317.027891] talos.danny.cz kernel: EEH: Beginning: 'resume' [ 317.030352] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->resume() [ 317.083703] talos.danny.cz kernel: nvme 0030:0e:00.0: Unable to change power state from D3cold to D0, device inaccessible [ 317.085001] talos.danny.cz kernel: nvme nvme1: Disabling device after reset failure: -19 [ 317.144264] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211552, async page read [ 317.144465] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'none' [ 317.144957] talos.danny.cz kernel: EEH: Finished:'resume' [ 317.144965] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme1n1: rw=0, sector=1875384848, nr_sectors = 16 limit=0 [ 317.144980] talos.danny.cz kernel: EEH: Recovery successful. [ 317.144989] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211553, async page read [ 317.145018] talos.danny.cz kernel: EEH: Recovering PHB#30-PE#1fa [ 317.145025] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme1n1: rw=0, sector=1875384864, nr_sectors = 16 limit=0 [ 317.145053] talos.danny.cz kernel: EEH: PE location: N/A, PHB location: N/A [ 317.145056] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211554, async page read [ 317.145102] talos.danny.cz kernel: EEH: Frozen PHB#30-PE#1fa detected [ 317.145131] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme1n1: rw=0, sector=1875384880, nr_sectors = 16 limit=0 [ 317.145171] talos.danny.cz kernel: EEH: Call Trace: [ 317.145191] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211555, async page read [ 317.145239] talos.danny.cz kernel: EEH: [0000000043837525] __eeh_send_failure_event+0x78/0x150 [ 317.145266] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme1n1: rw=0, sector=1875384896, nr_sectors = 16 limit=0 [ 317.145297] talos.danny.cz kernel: EEH: [00000000ca9d8fff] eeh_dev_check_failure+0x2e0/0x6d0 [ 317.145318] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211556, async page read [ 317.145322] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme1n1: rw=0, sector=1875384912, nr_sectors = 16 limit=0 [ 317.145362] talos.danny.cz kernel: EEH: [000000007534829c] pnv_pci_read_config+0x148/0x180 [ 317.145392] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211557, async page read [ 317.145415] talos.danny.cz kernel: EEH: [000000006db2aa5c] pci_bus_read_config_dword+0x90/0xf0 [ 317.145463] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme1n1: rw=0, sector=1875384928, nr_sectors = 16 limit=0 [ 317.145482] talos.danny.cz kernel: EEH: [00000000ed12774a] pci_find_next_ext_capability+0x5c/0x150 [ 317.145514] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211558, async page read [ 317.145544] talos.danny.cz kernel: EEH: [000000003a1ec347] pci_restore_ltr_state+0x40/0xa0 [ 317.145591] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device nvme1n1: rw=0, sector=1875384944, nr_sectors = 16 limit=0 [ 317.145620] talos.danny.cz kernel: EEH: [000000000c3c04be] pci_restore_state.part.0+0x2c/0x3b0 [ 317.145652] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211559, async page read [ 317.145682] talos.danny.cz kernel: EEH: [000000005a69e2f4] nvme_slot_reset+0x48/0x90 [nvme] [ 317.148788] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1p1, logical block 8191999, async page read [ 317.149262] talos.danny.cz kernel: EEH: [00000000efdedb77] eeh_report_reset+0xd0/0x100 [ 317.164254] talos.danny.cz kernel: EEH: [00000000bdb52d8d] eeh_pe_report+0x2bc/0x548 [ 317.164259] talos.danny.cz kernel: EEH: [0000000039694420] eeh_handle_normal_event+0x89c/0x9c0 [ 317.164262] talos.danny.cz kernel: EEH: [00000000bbf75c7c] eeh_event_handler+0xfc/0x170 [ 317.164264] talos.danny.cz kernel: EEH: [00000000786c4402] kthread+0x150/0x160 [ 317.164269] talos.danny.cz kernel: EEH: [00000000393b9885] start_kernel_thread+0x14/0x18 [ 317.164271] talos.danny.cz kernel: EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures. [ 317.164273] talos.danny.cz kernel: EEH: Notify device drivers to shutdown [ 317.164276] talos.danny.cz kernel: EEH: Beginning: 'error_detected(IO frozen)' [ 317.164278] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->error_detected(IO frozen) [ 317.164281] talos.danny.cz kernel: nvme nvme1: frozen state error detected, reset controller [ 317.153365] talos.danny.cz dracut-initqueue[1582]: Scanning devices md127 sda2 for LVM logical volumes Linux/Root [ 317.172717] talos.danny.cz dracut-initqueue[1603]: WARNING: Couldn't find device with uuid MFWATY-ZceS-Qeub-HcO3-ZAv2-0Q2D-jDpFwj. [ 317.172717] talos.danny.cz dracut-initqueue[1603]: WARNING: VG Linux is missing PV MFWATY-ZceS-Qeub-HcO3-ZAv2-0Q2D-jDpFwj (last written to /dev/md0). [ 317.283754] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'disconnect' [ 317.283760] talos.danny.cz kernel: EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'disconnect' [ 317.285796] talos.danny.cz kernel: EEH: Unable to recover from failure from PHB#30-PE#1fa. Please try reseating or replacing it [ 317.287465] talos.danny.cz kernel: EEH: of node=0030:0e:00.0 [ 317.287983] talos.danny.cz kernel: EEH: PCI device/vendor: ffffffff [ 317.288503] talos.danny.cz kernel: EEH: PCI cmd/status register: ffffffff [ 317.289009] talos.danny.cz kernel: EEH: PCI-E capabilities and status follow: [ 317.289538] talos.danny.cz kernel: EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff [ 317.290053] talos.danny.cz kernel: EEH: PCI-E 10: ffffffff ffffffff ffffffff ffffffff [ 317.290550] talos.danny.cz kernel: EEH: PCI-E 20: ffffffff [ 317.291068] talos.danny.cz kernel: EEH: PCI-E AER capability register set follows: [ 317.291574] talos.danny.cz kernel: EEH: PCI-E AER 00: ffffffff ffffffff ffffffff ffffffff [ 317.292072] talos.danny.cz kernel: EEH: PCI-E AER 10: ffffffff ffffffff ffffffff ffffffff [ 317.292554] talos.danny.cz kernel: EEH: PCI-E AER 20: ffffffff ffffffff ffffffff ffffffff [ 317.293038] talos.danny.cz kernel: EEH: PCI-E AER 30: ffffffff ffffffff [ 317.293501] talos.danny.cz kernel: PHB4 PHB#48 Diag-data (Version: 1) [ 317.293959] talos.danny.cz kernel: brdgCtl: 00000002 [ 317.294510] talos.danny.cz kernel: RootSts: 00000020 00402000 a0830008 00100107 00002000 [ 317.294512] talos.danny.cz kernel: PhbSts: 0000001c00000000 0000001c00000000 [ 317.294514] talos.danny.cz kernel: Lem: 0000000100280000 0000000000000000 0000000100000000 [ 317.294516] talos.danny.cz kernel: PhbErr: 0000088000000000 0000008000000000 2148000098000240 a008400000000000 [ 317.294518] talos.danny.cz kernel: RxeArbErr: 4000200000000000 0000200000000000 02409fde30000000 0000000000000000 [ 317.294519] talos.danny.cz kernel: PblErr: 0000000001000000 0000000001000000 0000000000000000 0000000000000000 [ 317.294521] talos.danny.cz kernel: RegbErr: 0000004000000000 0000004000000000 61000c4800000000 0000000000000000 [ 317.294525] talos.danny.cz kernel: PE[1fa] A/B: b740002a02380000 8000000000000000 [ 317.294526] talos.danny.cz kernel: PE[1fb] A/B: af40000c00000000 800000000e000010 [ 317.294529] talos.danny.cz kernel: EEH: Beginning: 'error_detected(permanent failure)' [ 317.297129] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->error_detected(permanent failure) [ 317.299703] talos.danny.cz kernel: nvme nvme1: failure state error detected, request disconnect [ 317.302102] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'disconnect' [ 317.340517] talos.danny.cz dracut-initqueue[1582]: Linux/Root linear [ 317.353416] talos.danny.cz dracut-initqueue[1607]: WARNING: Couldn't find device with uuid MFWATY-ZceS-Qeub-HcO3-ZAv2-0Q2D-jDpFwj. [ 317.353877] talos.danny.cz dracut-initqueue[1607]: WARNING: VG Linux is missing PV MFWATY-ZceS-Qeub-HcO3-ZAv2-0Q2D-jDpFwj (last written to /dev/md0). [ 317.353877] talos.danny.cz dracut-initqueue[1607]: Refusing activation of partial LV Linux/Root. Use '--activationmode partial' to override. [ 317.302106] talos.danny.cz kernel: EEH: Finished:'error_detected(permanent failure)' [ 317.664194] talos.danny.cz kernel: pci 0030:0e : [PE# 1fa] Releasing PE [ 317.665913] talos.danny.cz kernel: pci 0030:0e : [PE# 1fa] Removing DMA window #0 [ 317.666648] talos.danny.cz kernel: pci 0030:0e : [PE# 1fa] Disabling 64-bit DMA bypass