On Thu, 24 Aug 2023 at 06:42, Michael Ellerman <m...@ellerman.id.au> wrote: > > A thread started via eg. user_mode_thread() runs in the kernel to begin > with and then may later return to userspace. While it's running in the > kernel it has a pt_regs at the base of its kernel stack, but that > pt_regs is all zeroes. > > If the thread oopses in that state, it leads to an ugly stack trace with > a big block of zero GPRs, as reported by Joel: > > Kernel panic - not syncing: VFS: Unable to mount root fs on > unknown-block(0,0) > CPU: 0 PID: 1 Comm: swapper/0 Not tainted > 6.5.0-rc7-00004-gf7757129e3de-dirty #3 > Hardware name: IBM PowerNV (emulated by qemu) POWER9 0x4e1200 opal:v7.0 > PowerNV > Call Trace: > [c0000000036afb00] [c0000000010dd058] dump_stack_lvl+0x6c/0x9c (unreliable) > [c0000000036afb30] [c00000000013c524] panic+0x178/0x424 > [c0000000036afbd0] [c000000002005100] mount_root_generic+0x250/0x324 > [c0000000036afca0] [c0000000020057d0] prepare_namespace+0x2d4/0x344 > [c0000000036afd20] [c0000000020049c0] kernel_init_freeable+0x358/0x3ac > [c0000000036afdf0] [c0000000000111b0] kernel_init+0x30/0x1a0 > [c0000000036afe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c > --- interrupt: 0 at 0x0 > NIP: 0000000000000000 LR: 0000000000000000 CTR: 0000000000000000 > REGS: c0000000036afe80 TRAP: 0000 Not tainted > (6.5.0-rc7-00004-gf7757129e3de-dirty) > MSR: 0000000000000000 <> CR: 00000000 XER: 00000000 > CFAR: 0000000000000000 IRQMASK: 0 > GPR00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > NIP [0000000000000000] 0x0 > LR [0000000000000000] 0x0 > --- interrupt: 0 > > The all-zero pt_regs looks ugly and conveys no useful information, other > than its presence. So detect that case and just show the presence of the > frame by printing the interrupt marker, eg: > > Kernel panic - not syncing: VFS: Unable to mount root fs on > unknown-block(0,0) > CPU: 0 PID: 1 Comm: swapper/0 Not tainted > 6.5.0-rc3-00126-g18e9506562a0-dirty #301 > Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 > 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries > Call Trace: > [c000000003aabb00] [c000000001143db8] dump_stack_lvl+0x6c/0x9c (unreliable) > [c000000003aabb30] [c00000000014c624] panic+0x178/0x424 > [c000000003aabbd0] [c0000000020050fc] mount_root_generic+0x250/0x324 > [c000000003aabca0] [c0000000020057cc] prepare_namespace+0x2d4/0x344 > [c000000003aabd20] [c0000000020049bc] kernel_init_freeable+0x358/0x3ac > [c000000003aabdf0] [c0000000000111b0] kernel_init+0x30/0x1a0 > [c000000003aabe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c > --- interrupt: 0 at 0x0 > > To avoid ever suppressing a valid pt_regs make sure the pt_regs has a > zero MSR and TRAP value, and is located at the very base of the stack. > > Reported-by: Joel Stanley <j...@jms.id.au> > Reported-by: Nicholas Piggin <npig...@gmail.com> > Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
Reviewed-by: Joel Stanley <j...@jms.id.au> Thanks for the explanation and the patch. > --- > arch/powerpc/kernel/process.c | 26 +++++++++++++++++++++++--- > 1 file changed, 23 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index b68898ac07e1..392404688cec 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -2258,6 +2258,22 @@ unsigned long __get_wchan(struct task_struct *p) > return ret; > } > > +static bool empty_user_regs(struct pt_regs *regs, struct task_struct *tsk) > +{ > + unsigned long stack_page; > + > + // A non-empty pt_regs should never have a zero MSR or TRAP value. > + if (regs->msr || regs->trap) > + return false; > + > + // Check it sits at the very base of the stack > + stack_page = (unsigned long)task_stack_page(tsk); > + if ((unsigned long)(regs + 1) != stack_page + THREAD_SIZE) > + return false; > + > + return true; > +} > + > static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH; > > void __no_sanitize_address show_stack(struct task_struct *tsk, > @@ -2322,9 +2338,13 @@ void __no_sanitize_address show_stack(struct > task_struct *tsk, > lr = regs->link; > printk("%s--- interrupt: %lx at %pS\n", > loglvl, regs->trap, (void *)regs->nip); > - __show_regs(regs); > - printk("%s--- interrupt: %lx\n", > - loglvl, regs->trap); > + > + // Detect the case of an empty pt_regs at the very > base > + // of the stack and suppress showing it in full. > + if (!empty_user_regs(regs, tsk)) { > + __show_regs(regs); > + printk("%s--- interrupt: %lx\n", loglvl, > regs->trap); > + } > > firstframe = 1; > } > -- > 2.41.0 >