Hi, I am seeing a boot failure after applying this series on top of the pci tree [1]. Note that this error was seen on a system where I have a dedicated NVME. Systems without dedicated disk boot fine
[ 2.119058] nvme nvme3: D3 entry latency set to 8 seconds [ 2.132609] xive: H_INT_GET_SOURCE_INFO lisn=0x1 failed -55 [ 4.486307] nvme nvme0: D3 entry latency set to 10 seconds [ 28.193633] watchdog: BUG: soft lockup - CPU#280 stuck for 26s! [kworker/280:0:1436] [ 28.193637] CPU#280 Utilization every 4s during lockup: [ 28.193640] #1: 101% system, 0% softirq, 0% hardirq, 0% idle [ 28.193648] #2: 100% system, 0% softirq, 0% hardirq, 0% idle [ 28.193650] #3: 100% system, 0% softirq, 0% hardirq, 0% idle [ 28.193653] #4: 101% system, 0% softirq, 0% hardirq, 0% idle [ 28.193654] #5: 100% system, 0% softirq, 0% hardirq, 0% idle [ 28.193657] Modules linked in: nvme nvme_core nvme_keyring nvme_auth pseries_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua aes_gcm_p10_crypto crypto_simd cryptd [ 28.193672] CPU: 280 UID: 0 PID: 1436 Comm: kworker/280:0 Not tainted 6.16.0-rc1+ #5 VOLUNTARY [ 28.193675] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.00 (NH1110_015) hv:phyp pSeries [ 28.193677] Workqueue: events work_for_cpu_fn [ 28.193684] NIP: c0000000017d8a84 LR: c00000000032d168 CTR: c0000000001a3c20 [ 28.193686] REGS: c000001c03b8b0d0 TRAP: 0900 Not tainted (6.16.0-rc1+) [ 28.193687] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24442220 XER: 00000010 [ 28.193694] CFAR: 0000000000000000 IRQMASK: 0 [ 28.193694] GPR00: 18000003852fe563 c000001c03b8b370 c000000001fb8100 c000001c297f1b0c [ 28.193694] GPR04: 0000000000000003 0000000000000009 0000000038c78f48 0000000000000004 [ 28.193694] GPR08: ffffffffffffffff 0000000000000020 c000001c297f2b50 0000000000000003 [ 28.193694] GPR12: c000001c297f2b00 c00000258fffeb00 c000000000296018 c000000003d7c040 [ 28.193694] GPR16: 0000000000000000 0000000000000000 0000000000000000 c000001c03b8b8b0 [ 28.193694] GPR20: c000001c02365800 c000000027da9000 c00000000298b9e8 0000000000000002 [ 28.193694] GPR24: 0000000000000001 0000000000000001 000000000000003f 0000000000000001 [ 28.193694] GPR28: 000000010000003e c000000001a4e298 0000000038c78f48 3fffffe3d680d500 [ 28.193712] NIP [c0000000017d8a84] mtree_load+0x244/0x370 [ 28.193717] LR [c00000000032d168] irq_to_desc+0x28/0x40 [ 28.193721] Call Trace: [ 28.193722] [c000001c03b8b370] [0000000000000001] 0x1 (unreliable) [ 28.193727] [c000001c03b8b420] [c00000000032d168] irq_to_desc+0x28/0x40 [ 28.193729] [c000001c03b8b440] [c0000000003367dc] irq_get_irq_data+0x1c/0x40 [ 28.193733] [c000001c03b8b460] [c00000000033bbbc] irq_domain_free_irqs_hierarchy+0x5c/0xe0 [ 28.193736] [c000001c03b8b4a0] [c0000000001eaa88] pseries_irq_domain_alloc+0x1d8/0x2e0 [ 28.193740] [c000001c03b8b5c0] [c00000000033b790] irq_domain_alloc_irqs_parent+0x40/0xa0 [ 28.193742] [c000001c03b8b620] [c0000000003418bc] msi_domain_alloc+0xcc/0x230 [ 28.193744] [c000001c03b8b6a0] [c00000000033b824] irq_domain_alloc_irqs_hierarchy+0x34/0x90 [ 28.193747] [c000001c03b8b700] [c00000000033d16c] irq_domain_alloc_irqs_locked+0x16c/0x5a0 [ 28.193749] [c000001c03b8b7e0] [c00000000033dbc0] __irq_domain_alloc_irqs+0x70/0xd0 [ 28.193751] [c000001c03b8b880] [c000000000342988] __msi_domain_alloc_irqs+0x208/0x510 [ 28.193754] [c000001c03b8b940] [c0000000003445ec] msi_domain_alloc_irqs_all_locked+0x6c/0x100 [ 28.193757] [c000001c03b8b9a0] [c000000000eca400] pci_msi_setup_msi_irqs+0x60/0x80 [ 28.193761] [c000001c03b8b9c0] [c000000000ec8bcc] msix_setup_interrupts+0x18c/0x2f0 [ 28.193764] [c000001c03b8baa0] [c000000000ec92ac] __pci_enable_msix_range+0x57c/0x840 [ 28.193767] [c000001c03b8bb70] [c000000000ec6948] pci_alloc_irq_vectors_affinity+0xf8/0x1d0 [ 28.193769] [c000001c03b8bc00] [c0080000131641cc] nvme_setup_io_queues+0x2c4/0x570 [nvme] [ 28.193776] [c000001c03b8bd00] [c008000013167e98] nvme_probe+0x340/0x450 [nvme] [ 28.193780] [c000001c03b8bda0] [c000000000eb6bb4] local_pci_probe+0x64/0xf0 [ 28.193784] [c000001c03b8be20] [c000000000280804] work_for_cpu_fn+0x34/0x50 [ 28.193786] [c000001c03b8be50] [c0000000002865c0] process_one_work+0x1f0/0x500 [ 28.193788] [c000001c03b8bf00] [c00000000028800c] worker_thread+0x33c/0x510 [ 28.193791] [c000001c03b8bf90] [c000000000296160] kthread+0x150/0x160 [ 28.193793] [c000001c03b8bfe0] [c00000000000ded8] start_kernel_thread+0x14/0x18 [ 28.193795] Code: 2c090001 4182fe5c 4bfffeb8 280b0002 79291d68 4081ffac 280b0003 39400000 4082ffb0 394c0050 7c6a482a 7c2004ac <e92c0000> 792905e4 7c2c4840 4082ffac I eventually see a kernel OOPS [ 119.622966] xive: H_INT_GET_SOURCE_INFO lisn=0x1 failed -55 [ 119.623008] Kernel attempted to read user page (0) - exploit attempt? (uid: 0) [ 119.623028] BUG: Kernel NULL pointer dereference on read at 0x00000000 [ 119.623048] Faulting instruction address: 0xc0000000001a0e48 [ 119.623056] Oops: Kernel access of bad area, sig: 11 [#1] [ 119.623062] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries [ 119.623074] Modules linked in: nvme nvme_core nvme_keyring nvme_auth pseries_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua aes_gcm_p10_crypto crypto_simd cryptd [ 119.623096] CPU: 48 UID: 0 PID: 1 Comm: systemd Tainted: G L 6.16.0-rc1+ #5 VOLUNTARY [ 119.623104] Tainted: [L]=SOFTLOCKUP [ 119.623108] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.00 (NH1110_015) hv:phyp pSeries [ 119.623115] NIP: c0000000001a0e48 LR: c0000000003334b8 CTR: c0000000001a0de0 [ 119.623122] REGS: c000000008087080 TRAP: 0300 Tainted: G L (6.16.0-rc1+) [ 119.623129] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24484408 XER: 00000155 [ 119.623157] CFAR: c0000000001a0e78 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1 [ 119.623157] GPR00: c000000000333a98 c000000008087320 c000000001fb8100 c0000008ed903118 [ 119.623157] GPR04: 0000000000000001 0000000000000061 0000000000000000 0000000000000000 [ 119.623157] GPR08: c0000008ed903000 0000000002030001 0000000000000000 0000000000000000 [ 119.623157] GPR12: c0000000001a0de0 c0000008cffc4700 0000000000000000 0000000000000000 [ 119.623157] GPR16: c000000001997440 c000000061001400 c0000008ed432e80 0000000000500001 [ 119.623157] GPR20: 000fffffffe00000 0000000000500001 ffffffffffffffff c000000001dd7478 [ 119.623157] GPR24: 0000000000000000 c00000128de55000 c0000008ed903194 c0000008ed903248 [ 119.623157] GPR28: 000000000000003a c000000000334168 c000000063929800 0000000000000001 [ 119.623223] NIP [c0000000001a0e48] xive_irq_set_type+0x68/0x130 [ 119.623230] LR [c0000000003334b8] __irq_set_trigger+0x88/0x270 [ 119.623238] Call Trace: [ 119.623242] [c000000008087320] [c000000008087390] 0xc000000008087390 (unreliable) [ 119.623250] [c0000000080873b0] [c000000000333a98] __setup_irq+0x3f8/0x980 [ 119.623257] [c000000008087450] [c000000000334168] request_threaded_irq+0x148/0x270 [ 119.623265] [c0000000080874c0] [c000000000fb6ff4] notifier_add_irq+0x64/0x90 [ 119.623274] [c0000000080874f0] [c000000000fb56a4] hvc_open+0x94/0x1b0 [ 119.623281] [c000000008087570] [c000000000f81304] tty_open+0x1f4/0x7e0 [ 119.623289] [c000000008087620] [c0000000007c6778] chrdev_open+0x158/0x390 [ 119.623297] [c000000008087690] [c0000000007b59c4] do_dentry_open+0x294/0x790 [ 119.623305] [c0000000080876f0] [c0000000007b816c] vfs_open+0x3c/0x140 [ 119.623313] [c000000008087730] [c0000000007d75cc] do_open+0x35c/0x540 [ 119.623322] [c000000008087790] [c0000000007dd080] path_openat+0x140/0x310 [ 119.623328] [c000000008087800] [c0000000007dd324] do_filp_open+0xd4/0x1c0 [ 119.623334] [c000000008087940] [c0000000007b87bc] do_sys_openat2+0xbc/0x160 [ 119.623340] [c0000000080879c0] [c0000000007b8c1c] sys_openat+0x7c/0xd0 [ 119.623346] [c000000008087a20] [c000000000032610] system_call_exception+0x160/0x310 [ 119.623353] [c000000008087e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec [ 119.623360] ---- interrupt: 3000 at 0x7fff7fd18694 [ 119.623364] NIP: 00007fff7fd18694 LR: 00007fff7fd18694 CTR: 0000000000000000 [ 119.623368] REGS: c000000008087e80 TRAP: 3000 Tainted: G L (6.16.0-rc1+) [ 119.623373] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 48488408 XER: 00000000 [ 119.623384] IRQMASK: 0 [ 119.623384] GPR00: 000000000000011e 00007fffc3826190 0000000000100000 ffffffffffffff9c [ 119.623384] GPR04: 00007fff8070cd78 0000000000080101 0000000000000000 0000000000000000 [ 119.623384] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 119.623384] GPR12: 0000000000000000 00007fff808d2b80 0000000000000000 00007fffc3826708 [ 119.623384] GPR16: 00007fffc38266d0 00007fff80715230 0000000000000001 00000001183fc840 [ 119.623384] GPR20: 00007fffc38266b0 00007fff807244a8 0000000000000006 00007fff807244a8 [ 119.623384] GPR24: 0000000000000000 0000000000000000 00007fffc3826570 00007fff8070cd78 [ 119.623384] GPR28: 0000000000080101 00007fffc3826278 0000000000000015 fffffffffffffff7 [ 119.623430] NIP [00007fff7fd18694] 0x7fff7fd18694 [ 119.623433] LR [00007fff7fd18694] 0x7fff7fd18694 [ 119.623436] ---- interrupt: 3000 [ 119.623439] Code: 55290036 91280000 e9030010 60420000 81280000 7d292378 91280000 e9030010 60420000 81280000 65290200 91280000 <e9270000> 5528fffe 552907bc 7c0a4000 [ 119.623455] ---[ end trace 0000000000000000 ]--- [ 119.625060] nvme nvme1: D3 entry latency set to 10 seconds [ 119.627351] pstore: backend (nvram) writing error (-1) [ 119.627358] [ 120.627361] note: systemd[1] exited with irqs disabled [ 120.627425] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 121.709848] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- [1] : https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=controller/msi-parent Thanks, Gautam