SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2018-May/092540.html

** Description changed:

+ == SRU Justification ==
+ IBM reports that this bug occurs with stop4 which results in soft lockups/rcu 
stalls.
+ This is a kernel synchronization issue leading to a dead lock.
+ 
+ This bug was introduced by commit 7bc54b652f13 in v4.8-rc1.  This
+ regression is fixed by mainline commit c0f7f5b6c6910.
+ 
+ == Fix ==
+ c0f7f5b6c6910 ("cpufreq: powernv: Fix hardlockup due to synchronous smp_call 
in timer interrupt")
+ 
+ == Regression Potential ==
+ Low. Fixes current regression.  Cc'd to upstream stable, so it has had
+ additon upstream review.
+ 
+ == Test Case ==
+ A test kernel was built with this patch and tested by the original bug 
reporter.
+ The bug reporter states the test kernel resolved the bug.
+ 
+ 
  Recently we discovered this bug occurs just alone with stop4 which
  results in soft lockups/rcu stalls.
  
  ```
  root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service: 
Processes still around after final SIGKILL. Entering failed mode.
  [15523.619508] systemd[1]: systemd-journald.service: Failed with result 
'timeout'.
  [15523.619769] systemd[1]: Failed to start Journal Service.
  [15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off 
time, scheduling restart.
  [15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job, 
restart counter is at 21.
  [15523.621462] systemd[1]: Stopped Journal Service.
  [15523.621635] systemd[1]: systemd-journald.service: Found left-over process 
1561 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.621756] systemd[1]: This usually indicates unclean termination of a 
previous run, or service implementation deficiencies.
  [15523.621888] systemd[1]: systemd-journald.service: Found left-over process 
69060 (systemd-journal) in control group while starting unit. Ignoring.
  [15523.622029] systemd[1]: This usually indica[15541.629904] INFO: rcu_sched 
self-detected stall on CPU
- [15541.629958]        60-....: (2 GPs behind) idle=146/140000000000002/0 
softirq=300022/300022 fqs=999069 
+ [15541.629958]        60-....: (2 GPs behind) idle=146/140000000000002/0 
softirq=300022/300022 fqs=999069
  [15541.630046]         (t=2415546 jiffies g=184827 c=184826 q=57111)
  [15541.630101] NMI backtrace for cpu 60
  [15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G             L   
4.15.0-15-generic #16-Ubuntu
  [15541.630207] Call Trace:
  [15541.630232] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 
(unreliable)
  [15541.630298] [c000201a1da96b40] [c000000000cf4d48] 
nmi_cpu_backtrace+0x1f8/0x200
  [15541.630363] [c000201a1da96bd0] [c000000000cf4ee8] 
nmi_trigger_cpumask_backtrace+0x198/0x1f0
  [15541.630429] [c000201a1da96c60] [c00000000002f2d8] 
arch_trigger_cpumask_backtrace+0x28/0x40
  [15541.630495] [c000201a1da96c80] [c0000000001a913c] 
rcu_dump_cpu_stacks+0xf4/0x158
  [15541.630560] [c000201a1da96cd0] [c0000000001a81e8] 
rcu_check_callbacks+0x8e8/0xb40
  [15541.630625] [c000201a1da96e00] [c0000000001b64a8] 
update_process_times+0x48/0x90
  [15541.630689] [c000201a1da96e30] [c0000000001ce1f4] 
tick_sched_handle.isra.5+0x34/0xd0
  [15541.630753] [c000201a1da96e60] [c0000000001ce2f0] 
tick_sched_timer+0x60/0xe0
  [15541.630818] [c000201a1da96ea0] [c0000000001b7054] 
__hrtimer_run_queues+0x144/0x370
  [15541.630883] [c000201a1da96f20] [c0000000001b7fac] 
hrtimer_interrupt+0xfc/0x350
  [15541.630948] [c000201a1da96ff0] [c0000000000248f0] 
__timer_interrupt+0x90/0x260
  [15541.631013] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
  [15541.631069] [c000201a1da97070] [c000000000009014] 
decrementer_common+0x114/0x120
  [15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180
  [15541.631135]     LR = smp_call_function_single+0x110/0x180
  [15541.631230] [c000201a1da973d0] [c0000000001d55e0] 
smp_call_function_any+0x180/0x250
  [15541.631294] [c000201a1da97430] [c000000000acd3e8] 
gpstate_timer_handler+0x1e8/0x580
  [15541.631359] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
  [15541.631433] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
  [15541.631488] [c000201a1da975d0] [c0000000001b4bf8] 
run_timer_softirq+0x1e8/0x270
  [15541.631553] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
  [15541.631608] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
  [15541.631663] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
  [15541.631718] [c000201a1da977a0] [c000000000009014] 
decrementer_common+0x114/0x120
  [15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450
  [15541.631784]     LR = smp_call_function_many+0x324/0x450
  [15541.631879] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
  [15541.631935] [c000201a1da97b30] [c0000000003a1120] 
change_huge_pmd+0xe0/0x270
  [15541.632000] [c000201a1da97ba0] [c000000000349278] 
change_protection_range+0xb88/0xe40
  [15541.632065] [c000201a1da97cf0] [c0000000003496c0] 
mprotect_fixup+0x140/0x340
  [15541.632129] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
  [15541.632185] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
  [15579.001651] watchdog: BUG: soft lockup - CPU#52 stuck for 23s! [grep:69263]
  [15579.001738] Modules linked in: vhost_net vhost tap xt_CHECKSUM 
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT 
nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter 
ip6_tables iptable_filter devlink input_leds joydev mac_hid idt_89hpesx ofpart 
cmdlinepart powernv_flash ipmi_powernv ipmi_devintf opal_prd mtd 
ipmi_msghandler ibmpowernv at24 uio_pdrv_genirq uio vmx_crypto kvm_hv kvm 
sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast
  [15579.002363]  i2c_algo_bit hid_generic ttm drm_kms_helper mpt3sas 
syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid raid_class 
crct10dif_vpmsum crc32c_vpmsum drm i40e aacraid scsi_transport_sas
  [15579.002524] CPU: 52 PID: 69263 Comm: grep Tainted: G             L   
4.15.0-15-generic #16-Ubuntu
  [15579.002598] NIP:  c0000000001d5368 LR: c0000000001d5340 CTR: 
c000000000acc7f0
  [15579.002664] REGS: c000003e84eff7e0 TRAP: 0901   Tainted: G             L   
 (4.15.0-15-generic)
  [15579.002735] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48044222 
 XER: 00000000
- [15579.002810] CFAR: c01721ed8 
- [15579.002810] GPR08: c000000001721ed8 0000000000000001 c009e006592e0960 
0000000000000000 
- [15579.002810] GPR12: c000000000acc7f0 c00000000faa3c00 
+ [15579.002810] CFAR: c01721ed8
+ [15579.002810] GPR08: c000000001721ed8 0000000000000001 c009e006592e0960 
0000000000000000
+ [15579.002810] GPR12: c000000000acc7f0 c00000000faa3c00
  [15579.003084] NIP [c0000000001d5368] smp_call_function_single+0x138/0x180
  [15579.003139] LR [c0000000001d5340] smp_call_function_single+0x110/0x180
  [15579.003191] Call Trace:
  [15579.003217] [c000003e84effa60] [c0000000001d5340] 
smp_call_function_single+0x110/0x180 (unreliable)
  [15579.003298] [c000003e84effad0] [c0000000001d55e0] 
smp_call_function_any+0x180/0x250
  [15579.003381] [c000003e84effb30] [c000000000acc840] 
powernv_cpufreq_get+0x50/0x70
  [15579.003447] [c000003e84effb60] [c000000000ac2b8c] __cpufreq_get+0x5c/0x140
  [15579.003503] [c000003e84effba0] [c000000000ac2d18] cpufreq_get+0xa8/0xb0
  [15579.003560] [c000003e84effbe0] [c00000000009da50] 
pnv_get_proc_freq+0x20/0x50
  [15579.003625] [c000003e84effc00] [c0000000000283bc] show_cpuinfo+0x11c/0x400
  [15579.003680] [c000003e84effca0] [c00000000040c738] seq_read+0x138/0x610
  [15579.003737] [c000003e84effd40] [c00000000047fa38] proc_reg_read+0x88/0xd0
  [15579.003794] [c000003e84effd70] [c0000000003d293c] __vfs_read+0x3c/0x70
  [15579.003849] [c000003e84effd90] [c0000000003d2a2c] vfs_read+0xbc/0x1b0
  [15579.003905] [c000003e84effde0] [c0000000003d3028] SyS_read+0x68/0x110
  [15579.003962] [c000003e84effe30] [c00000000000b184] system_call+0x58/0x6c
  [15579.004016] Instruction dump:
- [15579.004051] 7fe4fb78 4bfffd4d 813f0018 71290001 4182002c 48000014 60000000 
60000000 
- [15579.004121] 60000000 60420000 7c210b78 7c421378 <813f0018> 71290001 
4082fff0 7c2004ac 
+ [15579.004051] 7fe4fb78 4bfffd4d 813f0018 71290001 4182002c 48000014 60000000 
60000000
+ [15579.004121] 60000000 60420000 7c210b78 7c421378 <813f0018> 71290001 
4082fff0 7c2004ac
  [15604.648202] INFO: rcu_sched self-detected stall on CPU
- [15604.648260]        60-....: (2 GPs behind) idle=146/140000000000002/0 
softirq=300022/300022 fqs=1005652 
+ [15604.648260]        60-....: (2 GPs behind) idle=146/140000000000002/0 
softirq=300022/300022 fqs=1005652
  [15604.648332]         (t=2431300 jiffies g=184827 c=184826 q=57308)
  [15604.648385] NMI backtrace for cpu 60
  [15604.648419] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G             L   
4.15.0-15-generic #16-Ubuntu
  [15604.648491] Call Trace:
  [15604.648515] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 
(unreliable)
  [15604.648581] [c000201a1da96b40] [c000000000cf4d48] 
nmi_cpu_backtrace+0x1f8/0x200
  [15604.648647] [c000201a1da96bd0] [c000000000cf4ee8] 
nmi_trigger_cpumask_backtrace+0x198/0x1f0
  [15604.648728] [c000201a1da96c60] [c00000000002f2d8] 
arch_trigger_cpumask_backtrace+0x28/0x40
  [15604.648793] [c000201a1da96c80] [c0000000001a913c] 
rcu_dump_cpu_stacks+0xf4/0x158
  [15604.648858] [c000201a1da96cd0] [c0000000001a81e8] 
rcu_check_callbacks+0x8e8/0xb40
  [15604.648924] [c000201a1da96e00] [c0000000001b64a8] 
update_process_times+0x48/0x90
  [15604.648988] [c000201a1da96e30] [c0000000001ce1f4] 
tick_sched_handle.isra.5+0x34/0xd0
  [15604.649052] [c000201a1da96e60] [c0000000001ce2f0] 
tick_sched_timer+0x60/0xe0
  [15604.649118] [c000201a1da96ea0] [c0000000001b7054] 
__hrtimer_run_queues+0x144/0x370
  [15604.649183] [c000201a1da96f20] [c0000000001b7fac] 
hrtimer_interrupt+0xfc/0x350
  [15604.649248] [c000201a1da96ff0] [c0000000000248f0] 
__timer_interrupt+0x90/0x260
  [15604.649313] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
  [15604.649369] [c000201a1da97070] [c000000000009014] 
decrementer_common+0x114/0x120
  [15604.649435] --- interrupt: 901 at smp_call_function_single+0x138/0x180
  [15604.649435]     LR = smp_call_function_single+0x110/0x180
  [15604.649530] [c000201a1da973d0] [c0000000001d55e0] 
smp_call_function_any+0x180/0x250
  [15604.649595] [c000201a1da97430] [c000000000acd3e8] 
gpstate_timer_handler+0x1e8/0x580
  [15604.649660] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
  [15604.649715] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
  [15604.649770] [c000201a1da975d0] [c0000000001b4bf8] 
run_timer_softirq+0x1e8/0x270
  [15604.649835] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
  [15604.649891] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
  [15604.649946] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
  [15604.650002] [c000201a1da977a0] [c000000000009014] 
decrementer_common+0x114/0x120
  [15604.650084] --- interrupt: 901 at smp_call_function_many+0x330/0x450
  [15604.650084]     LR = smp_call_function_many+0x324/0x450
  [15604.650179] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
  [15604.650235] [c000201a1da97b30] [c0000000003a1120] 
change_huge_pmd+0xe0/0x270
  [15604.650301] [c000201a1da97ba0] [c000000000349278] 
change_protection_range+0xb88/0xe40
  [15604.650366] [c000201a1da97cf0] [c0000000003496c0] 
mprotect_fixup+0x140/0x340
  [15604.650430] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
  [15604.650486] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
  [15667.666494] INFO: rcu_sched self-detected stall on CPU
- [15667.666550]        60-....: (2 GPs behind) idle=146/140000000000002/0 
softirq=300022/300022 fqs=1012258 
+ [15667.666550]        60-....: (2 GPs behind) idle=146/140000000000002/0 
softirq=300022/300022 fqs=1012258
  [15667.666622]         (t=2447054 jiffies g=184827 c=184826 q=57457)
  [15667.666675] NMI backtrace for cpu 60
  [15667.666709] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G             L   
4.15.0-15-generic #16-Ubuntu
  [15667.666781] Call Trace:
  [15667.666805] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 
(unreliable)
  [15667.666871] [c000201a1da96b40] [c000000000cf4d48] 
nmi_cpu_backtrace+0x1f8/0x200
  [15667.666937] [c000201a1da96bd0] [c000000000cf4ee8] 
nmi_trigger_cpumask_backtrace+0x198/0x1f0
  [15667.667002] [c000201a1da96c60] [c00000000002f2d8] 
arch_trigger_cpumask_backtrace+0x28/0x40
  [15667.667086] [c000201a1da96c80] [c0000000001a913c] 
rcu_dump_cpu_stacks+0xf4/0x158
  [15667.667151] [c000201a1da96cd0] [c0000000001a81e8] 
rcu_check_callbacks+0x8e8/0xb40
  [15667.667216] [c000201a1da96e00] [c0000000001b64a8] 
update_process_times+0x48/0x90
  [15667.667280] [c000201a1da96e30] [c0000000001ce1f4] 
tick_sched_handle.isra.5+0x34/0xd0
  [15667.667344] [c000201a1da96e60] [c0000000001ce2f0] 
tick_sched_timer+0x60/0xe0
  [15667.667409] [c000201a1da96ea0] [c0000000001b7054] 
__hrtimer_run_queues+0x144/0x370
  [15667.667474] [c000201a1da96f20] [c0000000001b7fac] 
hrtimer_interrupt+0xfc/0x350
  [15667.667539] [c000201a1da96ff0] [c0000000000248f0] 
__timer_interrupt+0x90/0x260
  [15667.667604] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
  [15667.667660] [c000201a1da97070] [c000000000009014] 
decrementer_common+0x114/0x120
  [15667.667727] --- interrupt: 901 at smp_call_function_single+0x130/0x180
  [15667.667727]     LR = smp_call_function_single+0x110/0x180
  [15667.667821] [c000201a1da973d0] [c0000000001d55e0] 
smp_call_function_any+0x180/0x250
  [15667.667886] [c000201a1da97430] [c000000000acd3e8] 
gpstate_timer_handler+0x1e8/0x580
  [15667.667951] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
  [15667.668006] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
  [15667.668061] [c000201a1da975d0] [c0000000001b4bf8] 
run_timer_softirq+0x1e8/0x270
  [15667.668126] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
  [15667.668181] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
  [15667.668236] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
  [15667.668292] [c000201a1da977a0] [c000000000009014] 
decrementer_common+0x114/0x120
  [15667.668358] --- interrupt: 901 at smp_call_function_many+0x330/0x450
  [15667.668358]     LR = smp_call_function_many+0x324/0x450
  [15667.668469] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
  [15667.668524] [c000201a1da97b30] [c0000000003a1120] 
change_huge_pmd+0xe0/0x270
  [15667.668589] [c000201a1da97ba0] [c000000000349278] 
change_protection_range+0xb88/0xe40
  [15667.668654] [c000201a1da97cf0] [c0000000003496c0] 
mprotect_fixup+0x140/0x340
  [15667.668719] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
  [15667.668775] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
  
  ```
  
  Per feedback from Vaidy, this currently appears to NOT be a firmware
  problem. This seems to be a kernel synchronization issue leading to a
  dead lock.
  
  -------
  Fix identified by Shilpa as per Nick Piggin's recommendation.  Kernel fix is 
currently being tested.
  
-  -------
+  -------
  Fix upstream in 4.17-rc3
  
  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.17-rc3&id=c0f7f5b6c69107ca92909512533e70258ee19188
  cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer 
interrupt
  
  Posted to stable as well.
  
  Mirroring to Launchpad for Canonical to pull in commit.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1768898

Title:
  smp_call_function_single/many core hangs with stop4 alone

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1768898/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to