On 2024-03-08 19:08:50 Fri, Michael Ellerman wrote:
> Aneesh Kumar K V <aneesh.ku...@linux.ibm.com> writes:
> > On 3/7/24 5:13 PM, Michael Ellerman wrote:
> >> Mahesh Salgaonkar <mah...@linux.ibm.com> writes:
> >>> nmi_enter()/nmi_exit() touches per cpu variables which can lead to kernel
> >>> crash when invoked during real mode interrupt handling (e.g. early HMI/MCE
> >>> interrupt handler) if percpu allocation comes from vmalloc area.
> >>>
> >>> Early HMI/MCE handlers are called through DEFINE_INTERRUPT_HANDLER_NMI()
> >>> wrapper which invokes nmi_enter/nmi_exit calls. We don't see any issue when
> >>> percpu allocation is from the embedded first chunk. However with
> >>> CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK enabled there are chances where percpu
> >>> allocation can come from the vmalloc area.
> >>>
> >>> With kernel command line "percpu_alloc=page" we can force percpu allocation
> >>> to come from vmalloc area and can see kernel crash in machine_check_early:
> >>>
> >>> [ 1.215714] NIP [c000000000e49eb4] rcu_nmi_enter+0x24/0x110
> >>> [ 1.215717] LR [c0000000000461a0] machine_check_early+0xf0/0x2c0
> >>> [ 1.215719] --- interrupt: 200
> >>> [ 1.215720] [c000000fffd73180] [0000000000000000] 0x0 (unreliable)
> >>> [ 1.215722] [c000000fffd731b0] [0000000000000000] 0x0
> >>> [ 1.215724] [c000000fffd73210] [c000000000008364] machine_check_early_common+0x134/0x1f8
> >>>
> >>> Fix this by avoiding use of nmi_enter()/nmi_exit() in real mode if percpu
> >>> first chunk is not embedded.
> >>
> >> My system (powernv) doesn't even boot with percpu_alloc=page.
> >
> >
> > Can you share the crash details?
>
> Yes but it's not pretty :)
>
> [ 1.725257][ T714] systemd-journald[714]: Collecting audit messages is disabled.
> [ 1.729401][ T1] systemd[1]: Finished systemd-tmpfiles-setup-dev.service - Create Static Device Nodes in /dev.
> [ OK ] Finished systemd-tmpfiles-…reate Static Device Nodes in /dev.
> [ 1.773902][ C22] Disabling lock debugging due to kernel taint
> [ 1.773905][ C23] Oops: Machine check, sig: 7 [#1]
> [ 1.773911][ C23] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
> [ 1.773916][ C23] Modules linked in:
> [ 1.773920][ C23] CPU: 23 PID: 0 Comm: swapper/23 Tainted: G M 6.8.0-rc7-02500-g23515c370cbb #1
> [ 1.773924][ C23] Hardware name: 8335-GTH POWER9 0x4e1202 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV
> [ 1.773926][ C23] NIP: 0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
> [ 1.773929][ C23] REGS: c000000fffa6ef50 TRAP: 0000 Tainted: G M (6.8.0-rc7-02500-g23515c370cbb)
> [ 1.773932][ C23] MSR: 0000000000000000 <> CR: 00000000 XER: 00000000
> [ 1.773937][ C23] CFAR: 0000000000000000 IRQMASK: 3
> [ 1.773937][ C23] GPR00: 0000000000000000 c000000fffa6efe0 c000000fffa6efb0 0000000000000000
> [ 1.773937][ C23] GPR04: c00000000003d8c0 c000000001f5f000 0000000000000000 0000000000000103
> [ 1.773937][ C23] GPR08: 0000000000000003 653a0d962a590300 0000000000000000 0000000000000000
> [ 1.773937][ C23] GPR12: c000000fffa6f280 0000000000000000 c0000000000084a4 0000000000000000
> [ 1.773937][ C23] GPR16: 0000000053474552 0000000000000000 c00000000003d8c0 c000000fffa6f280
> [ 1.773937][ C23] GPR20: c000000001f5f000 c000000fffa6f340 c000000fffa6f2e8 0000000000000000
> [ 1.773937][ C23] GPR24: 0007fffffecf0000 c0000000065bbb80 0000000000550102 c000000002172b20
> [ 1.773937][ C23] GPR28: 0000000000000000 0000000053474552 0000000000000000 c000000ffffc6d80
> [ 1.773982][ C23] NIP [0000000000000000] 0x0
> [ 1.773988][ C23] LR [0000000000000000] 0x0
> [ 1.773990][ C23] Call Trace:
> [ 1.773991][ C23] [c000000fffa6efe0] [c000000001f5f000] .TOC.+0x0/0xa1000 (unreliable)
> [ 1.773999][ C23] Code: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> [ 1.774021][ C23] ---[ end trace 0000000000000000 ]---
>
> Something has gone badly wrong.
>
> That was a test kernel with some other commits, but nothing that should
> cause that. Removing percpu_alloc=page fix it.

So, when I try this without my patch "Avoid nmi_enter/nmi_exit in real mode interrupt", I can reproduce this crash. However, I was not able to reproduce it even once with my changes applied. Are you able to see this crash with my patch?

Thanks,
-Mahesh.
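For reference, a minimal user-space sketch of the condition the patch description proposes: skip nmi_enter()/nmi_exit() only when the early handler runs in real mode and the percpu first chunk is not embedded. The function name nmi_enter_is_safe() and its two flags are hypothetical, not kernel APIs. The sketch assumes, as the patch description implies, that percpu data placed in the vmalloc area cannot be accessed with MMU translation off, while an embedded first chunk can.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the check described in the patch; not kernel code.
 *
 * real_mode:            the early NMI handler (e.g. machine_check_early)
 *                       is running with MMU translation off.
 * first_chunk_embedded: the percpu first chunk was set up by the embed
 *                       allocator rather than from vmalloc-mapped pages
 *                       (as happens when booting with percpu_alloc=page).
 *
 * Returns true when calling nmi_enter()/nmi_exit() is assumed to be safe.
 */
static bool nmi_enter_is_safe(bool real_mode, bool first_chunk_embedded)
{
    /* With translation on, vmalloc-backed percpu data is reachable. */
    if (!real_mode)
        return true;

    /* In real mode, only an embedded first chunk is assumed reachable. */
    return first_chunk_embedded;
}
```

With real_mode = true and first_chunk_embedded = false, which corresponds to the percpu_alloc=page case in the machine_check_early trace quoted above, the sketch returns false, so a wrapper following it would skip nmi_enter()/nmi_exit() rather than touch percpu data from rcu_nmi_enter().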