------- Comment From gbert...@br.ibm.com 2016-09-27 23:02 EDT-------
Hi Canonical,

Our test teams had problems booting the Ubuntu-4.4.0-40.60 kernel. They hit an Oops in the NVMe probe path. My current understanding is that this is already fixed in your kernel by the fixup commit e9820e415895 ("UBUNTU: (fix) NVMe: Don't unmap controller registers on reset"), which will be published in kernel Ubuntu-4.4.0-41.61. We will try again with that version.

While we are here, I noticed another issue with the backport of 30d6592fce71 ("NVMe: Don't unmap controller registers on reset"). It looks like you are missing the fixup commit that went into the 4.4.y tree later: 81e9a969c441 ("nvme: Call pci_disable_device on the error path") #4.4.y tree. I opened Bug 146899 to track this; it will be mirrored soon.

** Tags removed: verification-needed-xenial
** Tags added: verification-failed-xenial

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1620317

Title:
  ISST-LTE:pNV: system ben is hung during ST (nvme)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  When we are running I/O-intensive tasks together with CPU addition/removal, the block layer may hang, stalling the entire machine. The backtrace below is one of the symptoms:

[12747.111149] ---[ end trace b4d8d720952460b5 ]---
[12747.126885] Trying to free IRQ 357 from IRQ context!
[12747.146930] ------------[ cut here ]------------
[12747.166674] WARNING: at /build/linux-iLHNl3/linux-4.4.0/kernel/irq/manage.c:1438
[12747.184069] Modules linked in: minix nls_iso8859_1 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) mlx4_en(OE) mlx4_core(OE) binfmt_misc xfs joydev input_leds mac_hid ofpart cmdlinepart powernv_flash ipmi_powernv mtd ipmi_msghandler at24 opal_prd powernv_rng ibmpowernv uio_pdrv_genirq uio sunrpc knem(OE) autofs4 btrfs xor raid6_pq hid_generic usbhid hid uas usb_storage nouveau ast bnx2x i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlx5_core(OE) ahci drm mdio libcrc32c mlx_compat(OE) libahci vxlan nvme ip6_udp_tunnel udp_tunnel
[12747.349013] CPU: 80 PID: 0 Comm: swapper/80 Tainted: G W OEL 4.4.0-21-generic #37-Ubuntu
[12747.369046] task: c000000f1fab89b0 ti: c000000f1fb6c000 task.ti: c000000f1fb6c000
[12747.404848] NIP: c000000000131888 LR: c000000000131884 CTR: 00000000300303f0
[12747.808333] REGS: c000000f1fb6e550 TRAP: 0700 Tainted: G W OEL (4.4.0-21-generic)
[12747.867658] MSR: 9000000100029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022222 XER: 20000000
[12747.884783] CFAR: c000000000aea8f4 SOFTE: 1
GPR00: c000000000131884 c000000f1fb6e7d0 c0000000015b4200 0000000000000028
GPR04: c000000f2a409c50 c000000f2a41b4e0 0000000f29480000 00000000000033da
GPR08: 0000000000000007 c000000000f8b27c 0000000f29480000 9000000100001003
GPR12: 0000000000002200 c000000007b6f800 c000000f2a40a938 0000000000000100
GPR16: c000000f11480000 0000000000003a98 0000000000000000 0000000000000000
GPR20: 0000000000000000 d000000009521008 d0000000095146a0 fffffffffffff000
GPR24: c000000004a19ef0 0000000000000000 0000000000000003 000000000000007d
GPR28: 0000000000000165 c000000eefeb1800 c000000eef830600 0000000000000165
[12748.243270] NIP [c000000000131888] __free_irq+0x238/0x370
[12748.254089] LR [c000000000131884] __free_irq+0x234/0x370
[12748.269738] Call Trace:
[12748.286740] [c000000f1fb6e7d0] [c000000000131884] __free_irq+0x234/0x370 (unreliable)
[12748.289687] [c000000f1fb6e860] [c000000000131af8] free_irq+0x88/0xb0
[12748.304594] [c000000f1fb6e890] [d000000009514528] nvme_suspend_queue+0xc8/0x150 [nvme]
[12748.333825] [c000000f1fb6e8c0] [d00000000951681c] nvme_dev_disable+0x3fc/0x400 [nvme]
[12748.340913] [c000000f1fb6e9a0] [d000000009516ae4] nvme_timeout+0xe4/0x260 [nvme]
[12748.357136] [c000000f1fb6ea60] [c000000000548a34] blk_mq_rq_timed_out+0x64/0x110
[12748.383939] [c000000f1fb6ead0] [c00000000054c540] bt_for_each+0x160/0x170
[12748.399292] [c000000f1fb6eb40] [c00000000054d4e8] blk_mq_queue_tag_busy_iter+0x78/0x110
[12748.402665] [c000000f1fb6eb90] [c000000000547358] blk_mq_rq_timer+0x48/0x140
[12748.438649] [c000000f1fb6ebd0] [c00000000014a13c] call_timer_fn+0x5c/0x1c0
[12748.468126] [c000000f1fb6ec60] [c00000000014a5fc] run_timer_softirq+0x31c/0x3f0
[12748.483367] [c000000f1fb6ed30] [c0000000000beb78] __do_softirq+0x188/0x3e0
[12748.498378] [c000000f1fb6ee20] [c0000000000bf048] irq_exit+0xc8/0x100
[12748.501048] [c000000f1fb6ee40] [c00000000001f954] timer_interrupt+0xa4/0xe0
[12748.516377] [c000000f1fb6ee70] [c000000000002714] decrementer_common+0x114/0x180
[12748.547282] --- interrupt: 901 at arch_local_irq_restore+0x74/0x90
[12748.547282] LR = arch_local_irq_restore+0x74/0x90
[12748.574141] [c000000f1fb6f160] [0000000000000001] 0x1 (unreliable)
[12748.592405] [c000000f1fb6f180] [c000000000aedc3c] dump_stack+0xd0/0xf0
[12748.596461] [c000000f1fb6f1c0] [c0000000001006fc] dequeue_task_idle+0x5c/0x90
[12748.611532] [c000000f1fb6f230] [c0000000000f6080] deactivate_task+0xc0/0x130
[12748.627685] [c000000f1fb6f270] [c000000000adcb10] __schedule+0x440/0x990
[12748.654416] [c000000f1fb6f300] [c000000000add0a8] schedule+0x48/0xc0
[12748.670558] [c000000f1fb6f330] [c000000000ae1474] schedule_timeout+0x274/0x350
[12748.673485] [c000000f1fb6f420] [c000000000ade23c] wait_for_common+0xec/0x240
[12748.699192] [c000000f1fb6f4a0] [c0000000000e6908] kthread_stop+0x88/0x210
[12748.718385] [c000000f1fb6f4e0] [d000000009514240] nvme_dev_list_remove+0x90/0x110 [nvme]
[12748.748925] [c000000f1fb6f510] [d000000009516498] nvme_dev_disable+0x78/0x400 [nvme]
[12748.752112] [c000000f1fb6f5f0] [d000000009516ae4] nvme_timeout+0xe4/0x260 [nvme]
[12748.775395] [c000000f1fb6f6b0] [c000000000548a34] blk_mq_rq_timed_out+0x64/0x110
[12748.821069] [c000000f1fb6f720] [c00000000054c540] bt_for_each+0x160/0x170
[12748.851733] [c000000f1fb6f790] [c00000000054d4e8] blk_mq_queue_tag_busy_iter+0x78/0x110
[12748.883093] [c000000f1fb6f7e0] [c000000000547358] blk_mq_rq_timer+0x48/0x140
[12748.918348] [c000000f1fb6f820] [c00000000014a13c] call_timer_fn+0x5c/0x1c0
[12748.934743] [c000000f1fb6f8b0] [c00000000014a5fc] run_timer_softirq+0x31c/0x3f0
[12748.938084] [c000000f1fb6f980] [c0000000000beb78] __do_softirq+0x188/0x3e0
[12748.960815] [c000000f1fb6fa70] [c0000000000bf048] irq_exit+0xc8/0x100
[12748.992175] [c000000f1fb6fa90] [c00000000001f954] timer_interrupt+0xa4/0xe0
[12749.019299] [c000000f1fb6fac0] [c000000000002714] decrementer_common+0x114/0x180
[12749.037168] --- interrupt: 901 at arch_local_irq_restore+0x74/0x90
[12749.037168] LR = arch_local_irq_restore+0x74/0x90
[12749.079044] [c000000f1fb6fdb0] [c000000f2a41d680] 0xc000000f2a41d680 (unreliable)
[12749.081736] [c000000f1fb6fdd0] [c000000000909a28] cpuidle_enter_state+0x1a8/0x410
[12749.127094] [c000000f1fb6fe30] [c000000000119a88] call_cpuidle+0x78/0xd0
[12749.144435] [c000000f1fb6fe70] [c000000000119e5c] cpu_startup_entry+0x37c/0x480
[12749.166156] [c000000f1fb6ff30] [c00000000004563c] start_secondary+0x33c/0x360
[12749.186929] [c000000f1fb6ff90] [c000000000008b6c] start_secondary_prolog+0x10/0x14
[12749.223828] Instruction dump:
[12749.223856] 4e800020 4bf83a5d 60000000 4bffff64 4bf83a51 60000000 4bffffa8 3c62ff7b
[12749.233245] 7f84e378 38630fe0 489b900d 60000000 <0fe00000> 4bfffe20 7d2903a6 387d0118
[12749.298371] ---[ end trace b4d8d720952460b6 ]---

== Comment: #184 - Gabriel Krisman Bertazi <gbert...@br.ibm.com> - 2016-07-29 12:55:48 ==
I got it figured out. The nvme driver is not playing nice with the block timeout infrastructure: the timeout code goes into a live lock, waiting for the queue to be released. CPU hotplug, on the other hand, which is holding the queue freeze lock at the time, is waiting for an outstanding request to time out (or complete). That request, in turn, is stuck in the device, requiring a reset triggered by a timeout, which never happens because of the live lock.

I haven't determined why the request gets stuck inside the device in the first place; this could even be caused by the Leaf firmware itself. I also see some successful timeouts triggered under normal conditions. In the failure case, we should be able to abort the request normally, but that happens via the timeout infrastructure, which is blocked during CPU hotplug events.

I have a quirk to fully recover after the failure, by forcing a reset of the stuck I/O, which allows the CPU hotplug to complete and the block layer to recover. I have a machine hitting the failure every few minutes in a loop, and recovering from it with my patch.

Patch submitted to linux-block:
https://marc.info/?l=linux-block&m=146976739016592&w=2

== Comment: #207 - Gabriel Krisman Bertazi <gbert...@br.ibm.com> - 2016-09-05 09:13:51 ==
Canonical,

This is fixed by:

e57690fe009b ("blk-mq: don't overwrite rq->mq_ctx")
0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU")
71f79fb3179e ("blk-mq: Allow timeouts to run while queue is freezing")

These apply cleanly on top of your kernel.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1620317/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp