We have been seeing occasional soft lockups on a P2040 (e500mc) CPU.
They occur very rarely.
The stack trace is below:
watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [systemd:1]
CPU: 0 UID: 0 PID: 1 Comm: systemd Tainted: G O 6.12.85 #1
Tainted: [O]=OOT_MODULE
Hardware name: x930 e500mc 0x80230032 CoreNet Generic
NIP: 800f6e88 LR: 8001cae8 CTR: 80028e50
REGS: 81029c40 TRAP: 0900 Tainted: G O (6.12.85)
MSR: 00029002 <CE,EE,ME> CR: 48042842 XER: 20000000
GPR00: 8001cae8 81029d30 81080000 0000000c abc2db70 abc2b380 00000001 80db697c
GPR08: abc11b70 00000001 00000001 0000000f 48042842 10038020 100a0000 00000000
GPR16: 00000000 7fcff900 00000001 00000000 abc013c0 00000000 10013db0 77fb8ff0
GPR24: 00000001 ae3a1e10 81029f10 8001c790 81029dac 00000000 00000000 00000000
NIP [800f6e88] smp_call_function_many_cond+0x29c/0x4ac
LR [8001cae8] __flush_tlb_page+0xe4/0x108
Call Trace:
[81029d30] [00000008] 0x8 (unreliable)
[81029d90] [8001cae8] __flush_tlb_page+0xe4/0x108
[81029dd0] [80019c54] ptep_set_access_flags+0xcc/0x120
[81029df0] [8020e8bc] do_wp_page+0x160/0xe28
[81029e40] [80211308] handle_mm_fault+0x75c/0xe78
[81029ed0] [800190a0] do_page_fault+0x154/0x694
[81029f00] [8000091c] DataStorage+0x15c/0x180
I suspect that more than one CPU is running this code path at the same
time, and that they are deadlocked waiting for a response from each
other. My reasoning is that the following patch fixes the issue:
diff --git a/kernel/smp.c b/kernel/smp.c
index fa6faf50fb43..d986b5075eb7 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -875,7 +875,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		csd_do_func(func, info, NULL);
 		local_irq_restore(flags);
 	}
-
+	flush_smp_call_function_queue();
 	if (run_remote && wait) {
 		for_each_cpu(cpu, cfd->cpumask) {
 			call_single_data_t *csd;
That is, after making a request to the other CPU(s), process our own list.
I suspect I shouldn't have to do this, and the problem lies elsewhere;
perhaps an interrupt not being enabled? Does anyone have any suggestions?
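
To make the suspected failure mode concrete, here is a minimal userspace sketch of it. It is a toy model, not kernel code: `struct cpu`, `flush_queue()` and `cross_call_completes()` are hypothetical names invented for this illustration, and the model assumes exactly two CPUs that each send one request to the other and then wait. It only shows why draining our own queue before waiting would hide a missing or masked IPI; it says nothing about why the IPI would go missing.

```c
#include <stdbool.h>

#define NCPUS 2

/* Hypothetical per-CPU state for the toy model (not a kernel structure). */
struct cpu {
	int pending;	/* requests queued for this CPU by the other CPU */
	int unacked;	/* requests this CPU sent that have not completed */
};

/* Run everything queued for 'self' and complete it for the sender. */
static void flush_queue(struct cpu *cpus, int self)
{
	int other = 1 - self;

	cpus[other].unacked -= cpus[self].pending;
	cpus[self].pending = 0;
}

/*
 * Both CPUs send a request to each other, then wait for completion.
 * ipi_delivered:     the waiting CPU takes the IPI and runs its queue.
 * flush_before_wait: the CPU drains its own queue before waiting,
 *                    which is what the patch above does.
 * Returns true if both waits finish within max_steps, false if stuck.
 */
bool cross_call_completes(bool ipi_delivered, bool flush_before_wait,
			  int max_steps)
{
	struct cpu cpus[NCPUS] = { { 0, 0 }, { 0, 0 } };
	int cpu, step;

	/* Each CPU queues one request on the other at the same time. */
	for (cpu = 0; cpu < NCPUS; cpu++) {
		cpus[1 - cpu].pending++;
		cpus[cpu].unacked++;
	}

	if (flush_before_wait)
		for (cpu = 0; cpu < NCPUS; cpu++)
			flush_queue(cpus, cpu);

	/* Wait loop: progress is only made if the IPI is actually taken. */
	for (step = 0; step < max_steps; step++) {
		if (ipi_delivered)
			for (cpu = 0; cpu < NCPUS; cpu++)
				flush_queue(cpus, cpu);
		if (cpus[0].unacked == 0 && cpus[1].unacked == 0)
			return true;
	}
	return false;
}
```

In this model, `cross_call_completes(true, false, 100)` returns true (the normal case), `cross_call_completes(false, false, 100)` returns false (both CPUs spin forever, as in the lockup), and `cross_call_completes(false, true, 100)` returns true (the workaround hides the lost IPI).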