------- Comment From dnban...@us.ibm.com 2018-04-26 10:58 EDT-------
I took a quick look at the crash stacks mentioned in c191-c193. Since we don't
have a debug kernel for "4.15.0-15-generic #16+bug166588", I looked only at
the stacks. From those, it seems reasonable to conclude that these are all
manifestations of issues we have seen before. I have tried to categorize them
below. Note that some of these were hit before booting into the actual kernel,
so it would be a good idea to install a skiroot kernel with the above patches
as well (as was decided in the meeting, and as Klaus mentions in #194).
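
For anyone who wants to repeat this triage on further dumps, here is a rough
sketch of the categorization done by hand below. It only looks at the symbol
on the NIP line; the CATEGORIES table and the categorize() helper are
hypothetical names made up for illustration, and the mapping reflects only
the four crashes in this comment, not a general rule.

```python
import re

# Matches the NIP line of a ppc64 oops, for example:
# "[   27.682301] NIP [c000000000389760] kmem_cache_alloc+0x2e0/0x340"
NIP_RE = re.compile(r"NIP \[([0-9a-f]{16})\] ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Hypothetical mapping from faulting symbol to the labels used in this comment.
CATEGORIES = {
    "kmem_cache_alloc": "kmem cache corruption",
    "kmem_cache_alloc_node": "kmem cache corruption",
    "process_one_work": "worker object issue",
}


def categorize(oops_text):
    """Return a coarse label for an oops based on its NIP symbol."""
    match = NIP_RE.search(oops_text)
    if match is None:
        return "unknown"
    return CATEGORIES.get(match.group(2), "unknown")
```

A stack whose NIP symbol is not in the table comes back as "unknown" and
still needs a manual look.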

crash 201804260138
==============
[   27.682301] NIP [c000000000389760] kmem_cache_alloc+0x2e0/0x340
[   27.682343] LR [c00000000038974c] kmem_cache_alloc+0x2cc/0x340
[   27.682386] Call Trace:
[   27.682406] [c000000005fef5c0] [c000000005fef610] 0xc000000005fef610 (unreliable)
[   27.682459] [c000000005fef620] [c0000000002dfacc] mempool_alloc_slab+0x2c/0x40
[   27.682510] [c000000005fef640] [c0000000002dff18] mempool_alloc+0x88/0x1e0
[   27.682555] [c000000005fef6d0] [c0000000006724fc] bio_alloc_bioset+0x1ac/0x2e0
[   27.682607] [c000000005fef740] [c00000000042a904] submit_bh_wbc+0xd4/0x240
[   27.682650] [c000000005fef790] [c00000000042b9a0] ll_rw_block+0x130/0x1a0
[   27.682694] [c000000005fef7f0] [c00000000042bae4] __breadahead+0x44/0xb0
[   27.682739] [c000000005fef820] [c0000000004cb9a8] __ext4_get_inode_loc+0x448/0x5c0
[   27.682789] [c000000005fef8e0] [c0000000004cffbc] ext4_iget+0x9c/0xc40
[   27.682832] [c000000005fef9d0] [c0000000004ef234] ext4_lookup+0x1b4/0x2e0
GPR24: e6eef6af4c054c5f c000200e585a3901 26eed6a1145f755e c0000000002dfacc
^^^^^^^^^^^^
GPR28: c000000ff901ee00 0000000001011200 c000200e585a3901 c000000ff901ee00
^^^^^^^^^^^^   ^^^^^^^^^^^^

appears to be kmem cache corruption.
likely another instance of the double free issue.

crash 201804252219
==============
[   84.702368] NIP [c000000000389ed0] kmem_cache_alloc_node+0x2f0/0x350
[   84.702407] LR [c000000000389ebc] kmem_cache_alloc_node+0x2dc/0x350
[   84.702446] Call Trace:
[   84.702463] [c000000005e77940] [c000000000389d94] kmem_cache_alloc_node+0x1b4/0x350 (unreliable)
[   84.702520] [c000000005e779b0] [c000000000b2eb6c] __alloc_skb+0x6c/0x220
[   84.702560] [c000000005e77a10] [c000000000b30a6c] alloc_skb_with_frags+0x7c/0x2e0
[   84.702608] [c000000005e77aa0] [c000000000b246cc] sock_alloc_send_pskb+0x29c/0x2c0
[   84.702655] [c000000005e77b50] [c000000000c569e4] unix_stream_sendmsg+0x264/0x5c0
[   84.702703] [c000000005e77c30] [c000000000b1eb64] sock_sendmsg+0x64/0x90
[   84.702743] [c000000005e77c60] [c000000000b1ec48] sock_write_iter+0xb8/0x120
[   84.702791] [c000000005e77d00] [c0000000003cf494] new_sync_write+0x104/0x160
[   84.702838] [c000000005e77d90] [c0000000003d2bd8] vfs_write+0xd8/0x220
[   84.702878] [c000000005e77de0] [c0000000003d2ef8] SyS_write+0x68/0x110
[   84.702919] [c000000005e77e30] [c00000000000b184] system_call+0x58/0x6c

GPR24: c000200e585ebc01 26eed6a1145bf0fd c000000000b2eb6c c000000ff901ee00
^^^^^^^^^^^^
GPR28: ffffffffffffffff 00000000015004c0 c000200e585ebc01 c000000ff901ee00
^^^^^^^^^^^^  ^^^^^^^^^^^^

appears to be kmem cache corruption.
another case of double free (?)

crash 201804251933
=============
[ 7083.142916] NIP [c00000000013277c] process_one_work+0x3c/0x5a0
[ 7083.142965] LR [c000000000132d78] worker_thread+0x98/0x630
[ 7083.143004] Call Trace:
[ 7083.143026] [c000200bb70b7c90] [c0000000001329f4] process_one_work+0x2b4/0x5a0 (unreliable)
[ 7083.143085] [c000200bb70b7d20] [c000000000132d78] worker_thread+0x98/0x630
[ 7083.143134] [c000200bb70b7dc0] [c00000000013b9a8] kthread+0x1a8/0x1b0
[ 7083.143185] [c000200bb70b7e30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4
GPR08: c000200e60eb7df0 0000000000000000 0000000000002040 c000200e60ea10a8
^^^^^^^^^^^^

the worker object issue again.

crash 201804251726
==============
[   48.707329] NIP [c000000000389ed0] kmem_cache_alloc_node+0x2f0/0x350
[   48.707376] LR [c000000000389ebc] kmem_cache_alloc_node+0x2dc/0x350
[   48.707422] Call Trace:
[   48.707444] [c000200e46c07890] [c000000000389d94] kmem_cache_alloc_node+0x1b4/0x350 (unreliable)
[   48.707511] [c000200e46c07900] [c000000000b2eb6c] __alloc_skb+0x6c/0x220
[   48.707561] [c000200e46c07960] [c000000000cf4004] kobject_uevent_env+0x804/0xa40
[   48.707620] [c000200e46c07a40] [c000000000aa3338] dm_kobject_uevent+0x78/0xd0
[   48.707676] [c000200e46c07ae0] [c000000000aab930] dev_suspend+0x360/0x390
[   48.707725] [c000200e46c07b30] [c000000000aac110] ctl_ioctl+0x200/0x5a0
[   48.707773] [c000200e46c07d20] [c000000000aac4d0] dm_ctl_ioctl+0x20/0x30
[   48.707822] [c000200e46c07d40] [c0000000003ef9f4] do_vfs_ioctl+0xd4/0xa00
[   48.707870] [c000200e46c07de0] [c0000000003f03e4] SyS_ioctl+0xc4/0x130
[   48.707920] [c000200e46c07e30] [c00000000000b184] system_call+0x58/0x6c
GPR24: c000200e585e3a01 26eed6a1145b76a7 c000000000b2eb6c c000000ff901ee00
^^^^^^^^^^^^
GPR28: ffffffffffffffff 00000000014000c0 c000200e585e3a01 c000000ff901ee00
^^^^^^^^^^^^  ^^^^^^^^^^^^

appears to be a case of kmem cache corruption again.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1762844

Title:
  ISST-LTE:KVM:Ubuntu1804:BostonLC:boslcp3: Host crashed & enters into
  xmon after moving to 4.15.0-15.16 kernel
