Hi Harish,
excuse me, but with 4.15.0-29 you are using a very old and by now outdated kernel.

With bionic/18.04 we are today at 4.15.0.72:
 linux-generic | 4.15.0.20.23 | bionic          | s390x  (initial kernel, already superseded)
 linux-generic | 4.15.0.72.74 | bionic-security | s390x  <==
 linux-generic | 4.15.0.72.74 | bionic-updates  | s390x  <==
 linux-generic | 4.15.0.73.75 | bionic-proposed | s390x
(and we even have 4.15.0.73 in proposed).

I also looked up the 3 commit IDs you mentioned above in bionic's git tree, and all of them have been included for quite a few kernel updates:

1) "Disable preemption after queueing stopper threads"
Ubuntu-4.15.0-48.51 Ubuntu-4.15.0-49.52 Ubuntu-4.15.0-49.53 Ubuntu-4.15.0-50.54
Ubuntu-4.15.0-51.55 Ubuntu-4.15.0-52.56 Ubuntu-4.15.0-54.58 Ubuntu-4.15.0-55.60
Ubuntu-4.15.0-56.62 Ubuntu-4.15.0-57.63 Ubuntu-4.15.0-58.64 Ubuntu-4.15.0-59.66
Ubuntu-4.15.0-60.67 Ubuntu-4.15.0-62.69 Ubuntu-4.15.0-64.73 Ubuntu-4.15.0-65.74
Ubuntu-4.15.0-66.75 Ubuntu-4.15.0-67.76 Ubuntu-4.15.0-68.77 Ubuntu-4.15.0-69.78
Ubuntu-4.15.0-70.79 Ubuntu-4.15.0-71.80 Ubuntu-4.15.0-72.81

2) "Disable preemption when waking two stopper threads"
Ubuntu-4.15.0-46.49 Ubuntu-4.15.0-47.50 Ubuntu-4.15.0-48.51 Ubuntu-4.15.0-49.52
Ubuntu-4.15.0-49.53 Ubuntu-4.15.0-50.54 Ubuntu-4.15.0-51.55 Ubuntu-4.15.0-52.56
Ubuntu-4.15.0-54.58 Ubuntu-4.15.0-55.60 Ubuntu-4.15.0-56.62 Ubuntu-4.15.0-57.63
Ubuntu-4.15.0-58.64 Ubuntu-4.15.0-59.66 Ubuntu-4.15.0-60.67 Ubuntu-4.15.0-62.69
Ubuntu-4.15.0-64.73 Ubuntu-4.15.0-65.74 Ubuntu-4.15.0-66.75 Ubuntu-4.15.0-67.76
Ubuntu-4.15.0-68.77 Ubuntu-4.15.0-69.78 Ubuntu-4.15.0-70.79 Ubuntu-4.15.0-71.80
Ubuntu-4.15.0-72.81

3) "sched: Fix migrate_swap() vs. active_balance() deadlock"
Ubuntu-4.15.0-37.40 Ubuntu-4.15.0-38.41 Ubuntu-4.15.0-39.42 Ubuntu-4.15.0-40.43
Ubuntu-4.15.0-42.45 Ubuntu-4.15.0-43.46 Ubuntu-4.15.0-44.47 Ubuntu-4.15.0-45.48
Ubuntu-4.15.0-46.49 Ubuntu-4.15.0-47.50 Ubuntu-4.15.0-48.51 Ubuntu-4.15.0-49.52
Ubuntu-4.15.0-49.53 Ubuntu-4.15.0-50.54 Ubuntu-4.15.0-51.55 Ubuntu-4.15.0-52.56
Ubuntu-4.15.0-54.58 Ubuntu-4.15.0-55.60 Ubuntu-4.15.0-56.62 Ubuntu-4.15.0-57.63
Ubuntu-4.15.0-58.64 Ubuntu-4.15.0-59.66 Ubuntu-4.15.0-60.67 Ubuntu-4.15.0-62.69
Ubuntu-4.15.0-64.73 Ubuntu-4.15.0-65.74 Ubuntu-4.15.0-66.75 Ubuntu-4.15.0-67.76
Ubuntu-4.15.0-68.77 Ubuntu-4.15.0-69.78 Ubuntu-4.15.0-70.79 Ubuntu-4.15.0-71.80
Ubuntu-4.15.0-72.81

So please update your system to the very latest bionic kernel (as of today: 4.15.0-72) and retest.
It's highly likely that the issue you've mentioned is already solved!

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

** Changed in: ubuntu-power-systems
       Status: New => Incomplete

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855679

Title:
  Rcu stalls and Soft-lockups observed on stressing Ubuntu 18 04 1

Status in The Ubuntu-power-systems project:
  Incomplete
Status in linux package in Ubuntu:
  Incomplete

Bug description:
== Comment: #0 - Harish Sriram <hasri...@in.ibm.com> - 2018-07-31 03:50:07 ==
--Problem Description--
Rcu stalls and Soft-lockups observed on stressing Ubuntu 18 04 1

Contact Information = hasri...@in.ibm.com

---Issue observed---
[ 1196.813220] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1196.813241] 0-....: (19 ticks this GP) idle=966/140000000000000/0 softirq=11580/11580 fqs=1552
[ 1196.813249] (detected by 24, t=5252 jiffies, g=11722, c=11721, q=1061088)
[ 1196.813282] Task dump for CPU 0:
[ 1196.813285] stress-ng-dev R running task 0 46323 33635 0x00042004
[ 1196.813294] Call Trace:
[ 1196.813310] [c000002c75ad7b10] [c0000000018bd940] log_first_seq+0x0/0x8 (unreliable)
[ 1198.508930] kauditd_printk_skb: 3 callbacks suppressed
[ 1198.508938] audit: type=1400 audit(1533020002.449:312): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/pulseaudio-eg" pid=12813 comm="stress-ng-appar"
[ 1198.508954] audit: type=1400 audit(1533020002.449:313): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/pulseaudio-eg///usr/lib/pulseaudio/pulse/gconf-helper" pid=12813 comm="stress-ng-appar"
[ 1199.361719] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... 145-... 159-... } 5489 jiffies s: 173 root: 0x201/.
[ 1199.361742] blocking rcu_node structures: l=1:0-15:0x1/. l=1:144-159:0x8002/.
[ 1199.361749] Task dump for CPU 0:
[ 1199.361752] stress-ng-dev R running task 0 46323 33635 0x00042004
[ 1199.361757] Call Trace:
[ 1199.361769] [c000002c75ad7b10] [c0000000018bd940] log_first_seq+0x0/0x8 (unreliable)
[ 1199.361777] Task dump for CPU 145:
[ 1199.361779] migration/145 R running task 0 883 2 0x00000804
[ 1199.361783] Call Trace:
[ 1199.361787] [c000002ff0f5fa40] [c000002ff0f5fb00] 0xc000002ff0f5fb00 (unreliable)
[ 1199.361791] Task dump for CPU 159:
[ 1199.361792] migration/159 R running task 0 967 2 0x00000804
[ 1199.361796] Call Trace:
[ 1199.361799] [c000002d78a47a40] [c000002d78a47b00] 0xc000002d78a47b00 (unreliable)
[ 1199.787698] audit: type=1400 audit(1533020003.985:314): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/bin/pulseaudio-eg" pid=12813 comm="stress-ng-appar"
[ 1200.781159] watchdog: BUG: soft lockup - CPU#145 stuck for 23s! [migration/145:883]
[ 1200.781163] Modules linked in: snd_seq snd_seq_device snd_timer snd soundcore kvm_hv kvm_pr kvm camellia_generic cast6_generic cast_common serpent_generic vhost_vsock vmw_vsock_virtio_transport_common vsock twofish_generic twofish_common vhost_net vhost tap hci_vhci bluetooth ecdh_generic lrw userio algif_skcipher binfmt_misc tgr192 wp512 rmd320 unix_diag sctp rmd256 rmd160 rmd128 md4 dccp_ipv4 algif_hash dccp af_alg powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler uio_pdrv_genirq leds_powernv uio ibmpowernv powernv_rng vmx_crypto sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure scsi_transport_sas btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
[ 1200.781321] raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage crct10dif_vpmsum crc32c_vpmsum tg3 ipr
[ 1200.781353] CPU: 145 PID: 883 Comm: migration/145 Not tainted 4.15.0-29-generic #31-Ubuntu
[ 1200.781359] NIP: c000000000206594 LR: c00000000020699c CTR: c000000000206470
[ 1200.781364] REGS: c000002ff0f5f9e0 TRAP: 0901 Not tainted (4.15.0-29-generic)
[ 1200.781366] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]> CR: 28002222 XER: 20000000
[ 1200.781392] CFAR: c0000000002065a4 SOFTE: 1
GPR00: c00000000020699c c000002ff0f5fc60 c0000000016eaf00 0000000000000000
GPR04: 0000000000000001 0000002ffbf90000 0000009dde82957a 0000000000000000
GPR08: c00000000fae3b00 0000000000000001 c000000000d432f8 0000000000000bdf
GPR12: 0000000000000000 c00000000fae3b00
[ 1200.781453] NIP [c000000000206594] multi_cpu_stop+0x124/0x1f0
[ 1200.781461] LR [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
[ 1200.781462] Call Trace:
[ 1200.781475] [c000002ff0f5fc60] [c000002ff0f5fd40] 0xc000002ff0f5fd40 (unreliable)
[ 1200.781487] [c000002ff0f5fcb0] [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
[ 1200.781503] [c000002ff0f5fd60] [c000000000143ae0] smpboot_thread_fn+0x250/0x290
[ 1200.781510] [c000002ff0f5fdc0] [c00000000013d728] kthread+0x1a8/0x1b0
[ 1200.781522] [c000002ff0f5fe30] [c00000000000b658] ret_from_kernel_thread+0x5c/0x84
[ 1200.781525] Instruction dump:
[ 1200.781531] 409e001c 813d0020 815d0010 39290001 915e0000 7c2004ac 913d0020 2b9f0004
[ 1200.781551] 419e003c 7fe9fb78 7c210b78 7c421378 <83fd0020> 7f9f4840 409eff74 2b890001
[ 1200.905158] watchdog: BUG: soft lockup - CPU#159 stuck for 22s! [migration/159:967]
[ 1200.905161] Modules linked in: snd_seq snd_seq_device snd_timer snd soundcore kvm_hv kvm_pr kvm camellia_generic cast6_generic cast_common serpent_generic vhost_vsock vmw_vsock_virtio_transport_common vsock twofish_generic twofish_common vhost_net vhost tap hci_vhci bluetooth ecdh_generic lrw userio algif_skcipher binfmt_misc tgr192 wp512 rmd320 unix_diag sctp rmd256 rmd160 rmd128 md4 dccp_ipv4 algif_hash dccp af_alg powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler uio_pdrv_genirq leds_powernv uio ibmpowernv powernv_rng vmx_crypto sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure scsi_transport_sas btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
[ 1200.905290] raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage crct10dif_vpmsum crc32c_vpmsum tg3 ipr
[ 1200.905316] CPU: 159 PID: 967 Comm: migration/159 Tainted: G L 4.15.0-29-generic #31-Ubuntu
[ 1200.905320] NIP: c000000000206594 LR: c00000000020699c CTR: c000000000206470
[ 1200.905326] REGS: c000002d78a479e0 TRAP: 0901 Tainted: G L (4.15.0-29-generic)
[ 1200.905327] MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28002822 XER: 20000000
[ 1200.905345] CFAR: c0000000002065a4 SOFTE: 1
GPR00: c00000000020699c c000002d78a47c60 c0000000016eaf00 0000000000000000
GPR04: 0000000000000001 0000002ffc310000 0000009dd8be8a16 0000000000000000
GPR08: c00000000faed500 0000000000000001 c000000000d432f8 0000000000000b97
GPR12: 0000000000000000 c00000000faed500
[ 1200.905383] NIP [c000000000206594] multi_cpu_stop+0x124/0x1f0
[ 1200.905387] LR [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
[ 1200.905391] Call Trace:
[ 1200.905398] [c000002d78a47c60] [c000002d78a47d40] 0xc000002d78a47d40 (unreliable)
[ 1200.905405] [c000002d78a47cb0] [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
[ 1200.905413] [c000002d78a47d60] [c000000000143ae0] smpboot_thread_fn+0x250/0x290
[ 1200.905418] [c000002d78a47dc0] [c00000000013d728] kthread+0x1a8/0x1b0
[ 1200.905426] [c000002d78a47e30] [c00000000000b658] ret_from_kernel_thread+0x5c/0x84
[ 1200.905429] Instruction dump:
[ 1200.905433] 409e001c 813d0020 815d0010 39290001 915e0000 7c2004ac 913d0020 2b9f0004
[ 1200.905445] 419e003c 7fe9fb78 7c210b78 7c421378 <83fd0020> 7f9f4840 409eff74 2b890001

---uname output---
# uname -a
Linux lep8d 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:37:15 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = Power 8 BML/Tuleta

----Additional Info-----
RCU stalls and soft lockups lead to hard lockups, but the CPU becomes unstuck after the hard lockup.
dmesg is attached. sosreport will be attached.

Reproducible : 90%

---Steps to Reproduce---
1. wget https://github.com/ColinIanKing/stress-ng/archive/master.zip
2. unzip master.zip; cd stress-ng-master;
3. make; make install;
4. Run the following command multiple times
stress-ng --all <nr_cpus> --vm-bytes 80% --aggressive --maximize --oomable --timeout 300 --verify --syslog --metrics --times

---Expected---
Test should not cause any lockup or crash.
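
Step 4 of the reproduction steps could be scripted roughly as below. This is only a sketch: the function names `count_reports` and `repeat_stress`, the default of 5 runs, and the use of `nproc` for `<nr_cpus>` are assumptions for illustration and not part of the original report. The stress-ng options are the ones given in step 4, and the grep pattern matches the stall and soft-lockup messages shown in the log above.

```shell
#!/bin/sh
# Count RCU-stall and soft-lockup reports in kernel log text read from stdin.
# Prints 0 (and still returns success) when nothing matches.
count_reports() {
    grep -c -E 'watchdog: BUG: soft lockup|rcu_sched detected (expedited )?stalls' || true
}

# Run the stress-ng command from step 4 several times in a row.
# $1 = number of runs (assumed default: 5)
# $2 = value for <nr_cpus> (assumed default: output of nproc)
repeat_stress() {
    runs=${1:-5}
    cpus=${2:-$(nproc)}
    i=1
    while [ "$i" -le "$runs" ]; do
        stress-ng --all "$cpus" --vm-bytes 80% --aggressive --maximize \
            --oomable --timeout 300 --verify --syslog --metrics --times
        # dmesg may need root; the count covers the whole ring buffer so far.
        echo "run $i: $(dmesg | count_reports) stall/lockup reports in dmesg"
        i=$((i + 1))
    done
}
```

Calling `repeat_stress 10` would then run the test ten times and print the running total of stall and lockup reports after each run.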
== Comment: #1 - Harish Sriram <hasri...@in.ibm.com> - 2018-07-31 03:50:49 ==

== Comment: #5 - SRIKAR DRONAMRAJU <srikar.dronamr...@in.ibm.com> - 2018-08-15 02:16:49 ==
(unreliable)
> [ 1199.361777] Task dump for CPU 145:
> [ 1199.361779] migration/145 R running task 0 883 2 0x00000804
> [ 1199.361783] Call Trace:
> [ 1199.361787] [c000002ff0f5fa40] [c000002ff0f5fb00] 0xc000002ff0f5fb00 (unreliable)
> [ 1199.361791] Task dump for CPU 159:
> [ 1199.361792] migration/159 R running task 0 967 2 0x00000804
> [ 1199.361796] Call Trace:
> [ 1199.361799] [c000002d78a47a40] [c000002d78a47b00] 0xc000002d78a47b00 (unreliable)
> [ 1199.787698] audit: type=1400 audit(1533020003.985:314): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/bin/pulseaudio-eg" pid=12813 comm="stress-ng-appar"
> [ 1200.781159] watchdog: BUG: soft lockup - CPU#145 stuck for 23s! [migration/145:883]
> [ 1200.781163] Modules linked in: snd_seq snd_seq_device snd_timer snd soundcore kvm_hv kvm_pr kvm camellia_generic cast6_generic cast_common serpent_generic vhost_vsock vmw_vsock_virtio_transport_common vsock twofish_generic twofish_common vhost_net vhost tap hci_vhci bluetooth ecdh_generic lrw userio algif_skcipher binfmt_misc tgr192 wp512 rmd320 unix_diag sctp rmd256 rmd160 rmd128 md4 dccp_ipv4 algif_hash dccp af_alg powernv_op_panel ipmi_powernv ipmi_devintf ipmi_msghandler uio_pdrv_genirq leds_powernv uio ibmpowernv powernv_rng vmx_crypto sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 ses enclosure scsi_transport_sas btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
> [ 1200.781321] raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage crct10dif_vpmsum crc32c_vpmsum tg3 ipr
> [ 1200.781353] CPU: 145 PID: 883 Comm: migration/145 Not tainted 4.15.0-29-generic #31-Ubuntu
> [ 1200.781359] NIP: c000000000206594 LR: c00000000020699c CTR: c000000000206470
> [ 1200.781364] REGS: c000002ff0f5f9e0 TRAP: 0901 Not tainted (4.15.0-29-generic)
> [ 1200.781366] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE,TM[E]> CR: 28002222 XER: 20000000
> [ 1200.781392] CFAR: c0000000002065a4 SOFTE: 1
> GPR00: c00000000020699c c000002ff0f5fc60 c0000000016eaf00 0000000000000000
> GPR04: 0000000000000001 0000002ffbf90000 0000009dde82957a 0000000000000000
> GPR08: c00000000fae3b00 0000000000000001 c000000000d432f8 0000000000000bdf
> GPR12: 0000000000000000 c00000000fae3b00
> [ 1200.781453] NIP [c000000000206594] multi_cpu_stop+0x124/0x1f0
> [ 1200.781461] LR [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
> [ 1200.781462] Call Trace:
> [ 1200.781475] [c000002ff0f5fc60] [c000002ff0f5fd40] 0xc000002ff0f5fd40 (unreliable)
> [ 1200.781487] [c000002ff0f5fcb0] [c00000000020699c] cpu_stopper_thread+0xfc/0x1f0
> [ 1200.781503] [c000002ff0f5fd60] [c000000000143ae0] smpboot_thread_fn+0x250/0x290
> [ 1200.781510] [c000002ff0f5fdc0] [c00000000013d728] kthread+0x1a8/0x1b0
> [ 1200.781522] [c000002ff0f5fe30] [c00000000000b658] ret_from_kernel_thread+0x5c/0x84
> [ 1200.781525] Instruction dump:
> [ 1200.781531] 409e001c 813d0020 815d0010 39290001 915e0000 7c2004ac 913d0020 2b9f0004
> [ 1200.781551] 419e003c 7fe9fb78 7c210b78 7c421378 <83fd0020> 7f9f4840 409eff74 2b890001

2610e88 stop_machine: Disable preemption after queueing stopper threads
9fb8d5d stop_machine: Disable preemption when waking two stopper threads
0b26351 stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock

These 3 commits are missing, which could be the reason we are seeing these traces.
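
Whether a given bionic kernel already carries these three commits can be checked against the tag lists in the reply at the top of this notification. The sketch below assumes that the earliest tag listed there for each commit (Ubuntu-4.15.0-48.51, -46.49 and -37.40) is the first kernel containing it, and that the release string has the `4.15.0-<abi>-<flavour>` form seen in the `uname` output. The function names `abi_of` and `missing_fixes` are made up for illustration.

```shell
#!/bin/sh
# Print the ABI number of a release string, e.g. "4.15.0-29-generic" -> "29".
abi_of() {
    rest=${1#*-}          # strip up to the first dash: "29-generic"
    echo "${rest%%-*}"    # strip from the next dash on: "29"
}

# Print one line per fix that the given 4.15.0-* release predates.
# Thresholds are the earliest tags listed in this thread (assumption: those
# are the first kernels carrying each commit).
missing_fixes() {
    case $1 in
        4.15.0-*) ;;
        *) echo "not a 4.15.0 bionic release: $1" >&2; return 2 ;;
    esac
    abi=$(abi_of "$1")
    if [ "$abi" -lt 48 ]; then echo "2610e88 (earliest tag Ubuntu-4.15.0-48.51)"; fi
    if [ "$abi" -lt 46 ]; then echo "9fb8d5d (earliest tag Ubuntu-4.15.0-46.49)"; fi
    if [ "$abi" -lt 37 ]; then echo "0b26351 (earliest tag Ubuntu-4.15.0-37.40)"; fi
}
```

On the affected machine, `missing_fixes "$(uname -r)"` would list all three commits for 4.15.0-29-generic and print nothing for 4.15.0-72-generic.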
== Comment: #13 - Harish Sriram <hasri...@in.ibm.com> - 2018-08-16 10:29:21 ==