On Thu, Mar 04, 2021 at 11:05:57AM -0800, Andy Lutomirski wrote: > For kernel threads, task_pt_regs is currently all zeros, a valid user state > (if kernel_execve() has been called), or some combination thereof during > execution of kernel_execve(). If a stack trace is printed, the unwinder > might get confused and treat task_pt_regs as a kernel state. Indeed, > forcing a stack dump results in an attempt to display _kernel_ code bytes > from a bogus address at the very top of kernel memory. > > Fix the confusion by setting cs=3 so that user_mode(task_pt_regs(...)) == > true for kernel threads. > > Also improve the error when failing to fetch code bytes to show which type > of fetch failed. This makes it much easier to understand what's happening. > > Cc: Josh Poimboeuf <jpoim...@redhat.com> > Signed-off-by: Andy Lutomirski <l...@kernel.org> > --- > arch/x86/kernel/dumpstack.c | 4 ++-- > arch/x86/kernel/process.c | 13 +++++++++++++ > 2 files changed, 15 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c > index 55cf3c8325c6..9b7b69bb12e5 100644 > --- a/arch/x86/kernel/dumpstack.c > +++ b/arch/x86/kernel/dumpstack.c > @@ -128,8 +128,8 @@ void show_opcodes(struct pt_regs *regs, const char > *loglvl) > /* No access to the user space stack of other tasks. Ignore. */ > break; > default: > - printk("%sCode: Unable to access opcode bytes at RIP 0x%lx.\n", > - loglvl, prologue); > + printk("%sCode: Unable to access %s opcode bytes at RIP > 0x%lx.\n", > + loglvl, user_mode(regs) ? "user" : "kernel", prologue); > break; > } > } > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > index 145a7ac0c19a..f6f16df04cb9 100644 > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c > @@ -163,6 +163,19 @@ int copy_thread(unsigned long clone_flags, unsigned long > sp, unsigned long arg, > /* Kernel thread ? */ > if (unlikely(p->flags & PF_KTHREAD)) { > memset(childregs, 0, sizeof(struct pt_regs)); > + > + /* > + * Even though there is no real user state here, these > + * are were user regs belong, and kernel_execve() will ^^^^^ where?
Ira > + * overwrite them with user regs. Put an obviously > + * invalid value that nonetheless causes user_mode(regs) > + * to return true in CS. > + * > + * This also prevents the unwinder from thinking that there > + * is invalid kernel state at the top of the stack. > + */ > + childregs->cs = 3; > + > kthread_frame_init(frame, sp, arg); > return 0; > } > -- > 2.29.2 >