------- Comment From dnban...@us.ibm.com 2018-04-26 10:58 EDT-------
I took a quick look at the crash stacks mentioned in c191-c193. Since we don't have a debug kernel for "4.15.0-15-generic #16+bug166588", I only looked at the stacks. From the stacks alone, it seems reasonable to conclude that all of these are manifestations of issues we have seen before. I have tried to categorize them below. Note that some of these were hit before booting into the actual kernel, so it would be a good idea to install a skiroot kernel with the above patches as well (as was indeed decided in the meeting, and as Klaus mentions in #194).

crash 201804260138
==============
[ 27.682301] NIP [c000000000389760] kmem_cache_alloc+0x2e0/0x340
[ 27.682343] LR [c00000000038974c] kmem_cache_alloc+0x2cc/0x340
[ 27.682386] Call Trace:
[ 27.682406] [c000000005fef5c0] [c000000005fef610] 0xc000000005fef610 (unreliable)
[ 27.682459] [c000000005fef620] [c0000000002dfacc] mempool_alloc_slab+0x2c/0x40
[ 27.682510] [c000000005fef640] [c0000000002dff18] mempool_alloc+0x88/0x1e0
[ 27.682555] [c000000005fef6d0] [c0000000006724fc] bio_alloc_bioset+0x1ac/0x2e0
[ 27.682607] [c000000005fef740] [c00000000042a904] submit_bh_wbc+0xd4/0x240
[ 27.682650] [c000000005fef790] [c00000000042b9a0] ll_rw_block+0x130/0x1a0
[ 27.682694] [c000000005fef7f0] [c00000000042bae4] __breadahead+0x44/0xb0
[ 27.682739] [c000000005fef820] [c0000000004cb9a8] __ext4_get_inode_loc+0x448/0x5c0
[ 27.682789] [c000000005fef8e0] [c0000000004cffbc] ext4_iget+0x9c/0xc40
[ 27.682832] [c000000005fef9d0] [c0000000004ef234] ext4_lookup+0x1b4/0x2e0

GPR24: e6eef6af4c054c5f c000200e585a3901 26eed6a1145f755e c0000000002dfacc ^^^^^^^^^^^^
GPR28: c000000ff901ee00 0000000001011200 c000200e585a3901 c000000ff901ee00 ^^^^^^^^^^^^ ^^^^^^^^^^^^

appears to be kmem cache corruption. seems like another instantiation of the double free issue (likely).
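
As a rough illustration of the kind of register check described above (a sketch only, not the analysis actually performed for this bug): the snippet below scans a GPR line from an oops and flags values that do not look like valid kernel pointers. It assumes that the ppc64 kernel linear mapping starts at 0xc000000000000000 and that valid pointers are at least 4-byte aligned. The names classify and flag_suspicious are hypothetical helpers, and the heuristic can produce false positives; it is no substitute for a debug kernel.

```python
import re

# Assumption: on ppc64 the kernel linear mapping starts at 0xc000000000000000,
# so kernel pointers have 0xc in the top nibble.
KERNEL_TOP_NIBBLE = 0xC
SMALL_LIMIT = 1 << 32


def classify(value_hex):
    """Classify one 64-bit register value given as a hex string (heuristic)."""
    value = int(value_hex, 16)
    # Small positive or small negative numbers (flags, counters, -1) are not pointers.
    if value < SMALL_LIMIT or value >= (1 << 64) - SMALL_LIMIT:
        return "small integer"
    if value >> 60 != KERNEL_TOP_NIBBLE:
        return "not a kernel linear-map address"
    # Assumption: anything pointer-like should be at least 4-byte aligned.
    if value & 0x3:
        return "misaligned kernel address"
    return "plausible kernel pointer"


def flag_suspicious(gpr_line):
    """Return (register, value, verdict) for suspicious values on one GPR line."""
    match = re.match(r"\s*GPR(\d+):(.*)", gpr_line)
    if match is None:
        return []
    base = int(match.group(1))
    # Only 16-digit hex tokens are register values; annotation marks are skipped.
    values = re.findall(r"\b[0-9a-f]{16}\b", match.group(2))
    flagged = []
    for offset, value_hex in enumerate(values):
        verdict = classify(value_hex)
        if verdict not in ("plausible kernel pointer", "small integer"):
            flagged.append(("GPR%02d" % (base + offset), value_hex, verdict))
    return flagged


if __name__ == "__main__":
    line = ("GPR24: e6eef6af4c054c5f c000200e585a3901 "
            "26eed6a1145f755e c0000000002dfacc")
    for reg, value_hex, verdict in flag_suspicious(line):
        print(reg, value_hex, verdict)
```

Run against the GPR24 line of this crash, the sketch flags GPR24 and GPR26 as non-kernel values and GPR25 (c000200e585a3901) as a misaligned kernel address.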

crash 201804252219
==============
[ 84.702368] NIP [c000000000389ed0] kmem_cache_alloc_node+0x2f0/0x350
[ 84.702407] LR [c000000000389ebc] kmem_cache_alloc_node+0x2dc/0x350
[ 84.702446] Call Trace:
[ 84.702463] [c000000005e77940] [c000000000389d94] kmem_cache_alloc_node+0x1b4/0x350 (unreliable)
[ 84.702520] [c000000005e779b0] [c000000000b2eb6c] __alloc_skb+0x6c/0x220
[ 84.702560] [c000000005e77a10] [c000000000b30a6c] alloc_skb_with_frags+0x7c/0x2e0
[ 84.702608] [c000000005e77aa0] [c000000000b246cc] sock_alloc_send_pskb+0x29c/0x2c0
[ 84.702655] [c000000005e77b50] [c000000000c569e4] unix_stream_sendmsg+0x264/0x5c0
[ 84.702703] [c000000005e77c30] [c000000000b1eb64] sock_sendmsg+0x64/0x90
[ 84.702743] [c000000005e77c60] [c000000000b1ec48] sock_write_iter+0xb8/0x120
[ 84.702791] [c000000005e77d00] [c0000000003cf494] new_sync_write+0x104/0x160
[ 84.702838] [c000000005e77d90] [c0000000003d2bd8] vfs_write+0xd8/0x220
[ 84.702878] [c000000005e77de0] [c0000000003d2ef8] SyS_write+0x68/0x110
[ 84.702919] [c000000005e77e30] [c00000000000b184] system_call+0x58/0x6c

GPR24: c000200e585ebc01 26eed6a1145bf0fd c000000000b2eb6c c000000ff901ee00 ^^^^^^^^^^^^
GPR28: ffffffffffffffff 00000000015004c0 c000200e585ebc01 c000000ff901ee00 ^^^^^^^^^^^^ ^^^^^^^^^^^^

appears to be kmem cache corruption. another case of double free (?)

crash 201804251933
=============
[ 7083.142916] NIP [c00000000013277c] process_one_work+0x3c/0x5a0
[ 7083.142965] LR [c000000000132d78] worker_thread+0x98/0x630
[ 7083.143004] Call Trace:
[ 7083.143026] [c000200bb70b7c90] [c0000000001329f4] process_one_work+0x2b4/0x5a0 (unreliable)
[ 7083.143085] [c000200bb70b7d20] [c000000000132d78] worker_thread+0x98/0x630
[ 7083.143134] [c000200bb70b7dc0] [c00000000013b9a8] kthread+0x1a8/0x1b0
[ 7083.143185] [c000200bb70b7e30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4

GPR08: c000200e60eb7df0 0000000000000000 0000000000002040 c000200e60ea10a8 ^^^^^^^^^^^^

the worker object issue again.
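
The grouping used in this comment (by the function in which the crash occurred) could be sketched as follows. This is only an illustration under stated assumptions: the mapping from symbol prefix to category is hypothetical and simply mirrors the labels used in this comment, and the names NIP_RE, CATEGORIES and categorize are invented for the example.

```python
import re

# Matches the NIP line of a powerpc oops, e.g.
# "[ 27.682301] NIP [c000000000389760] kmem_cache_alloc+0x2e0/0x340"
NIP_RE = re.compile(r"NIP \[[0-9a-f]{16}\] (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Hypothetical mapping from faulting-symbol prefix to the categories used here.
CATEGORIES = (
    ("kmem_cache_alloc", "kmem cache corruption (possible double free)"),
    ("process_one_work", "worker object issue"),
)


def categorize(oops_text):
    """Return a category label based on the symbol named on the NIP line."""
    match = NIP_RE.search(oops_text)
    if match is None:
        return "unknown (no NIP line found)"
    symbol = match.group(1)
    for prefix, label in CATEGORIES:
        if symbol.startswith(prefix):
            return label
    return "uncategorized (" + symbol + ")"


if __name__ == "__main__":
    samples = [
        "[ 84.702368] NIP [c000000000389ed0] kmem_cache_alloc_node+0x2f0/0x350",
        "[ 7083.142916] NIP [c00000000013277c] process_one_work+0x3c/0x5a0",
    ]
    for sample in samples:
        print(categorize(sample))
```

Matching on the NIP symbol alone is a coarse first pass; confirming a double free would still need the register contents and, ideally, a debug kernel.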

crash 201804251726
==============
[ 48.707329] NIP [c000000000389ed0] kmem_cache_alloc_node+0x2f0/0x350
[ 48.707376] LR [c000000000389ebc] kmem_cache_alloc_node+0x2dc/0x350
[ 48.707422] Call Trace:
[ 48.707444] [c000200e46c07890] [c000000000389d94] kmem_cache_alloc_node+0x1b4/0x350 (unreliable)
[ 48.707511] [c000200e46c07900] [c000000000b2eb6c] __alloc_skb+0x6c/0x220
[ 48.707561] [c000200e46c07960] [c000000000cf4004] kobject_uevent_env+0x804/0xa40
[ 48.707620] [c000200e46c07a40] [c000000000aa3338] dm_kobject_uevent+0x78/0xd0
[ 48.707676] [c000200e46c07ae0] [c000000000aab930] dev_suspend+0x360/0x390
[ 48.707725] [c000200e46c07b30] [c000000000aac110] ctl_ioctl+0x200/0x5a0
[ 48.707773] [c000200e46c07d20] [c000000000aac4d0] dm_ctl_ioctl+0x20/0x30
[ 48.707822] [c000200e46c07d40] [c0000000003ef9f4] do_vfs_ioctl+0xd4/0xa00
[ 48.707870] [c000200e46c07de0] [c0000000003f03e4] SyS_ioctl+0xc4/0x130
[ 48.707920] [c000200e46c07e30] [c00000000000b184] system_call+0x58/0x6c

GPR24: c000200e585e3a01 26eed6a1145b76a7 c000000000b2eb6c c000000ff901ee00 ^^^^^^^^^^^^
GPR28: ffffffffffffffff 00000000014000c0 c000200e585e3a01 c000000ff901ee00 ^^^^^^^^^^^^ ^^^^^^^^^^^^

appears to be a case of kmem cache corruption again.

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1762844

Title:
  ISST-LTE:KVM:Ubuntu1804:BostonLC:boslcp3: Host crashed & enters into xmon after moving to 4.15.0-15.16 kernel