On Wed 2016-10-26 15:55:00, Laura Abbott wrote:
> Hi,
>
> I was playing around with overflowing stacks and I managed to generate a test
> case that hung the kernel with vmapped stacks. The test case is just
>
> static void noinline foo1(void)
> {
> 	pr_info("%p\n", (void *)current_stack_pointer());
> 	foo2();
> }
>
> where foo$n is the same function with the name changed. I'm super
> creative. I have a couple thousand of these for testing with the final
> one doing a WARN. The kernel eventually hangs in printk on logbuf_lock
>
> (gdb) bt
> #0  __read_once_size (size=<optimized out>, res=<optimized out>, p=<optimized out>)
>     at ./include/linux/compiler.h:243
> #1  queued_spin_lock_slowpath (lock=0xffffffff82078e6c <logbuf_lock>, val=1)
>     at kernel/locking/qspinlock.c:478
> #2  0xffffffff8191611b in queued_spin_lock (lock=<optimized out>)
>     at ./include/asm-generic/qspinlock.h:103
> #3  do_raw_spin_lock (lock=<optimized out>) at ./include/linux/spinlock.h:148
> #4  __raw_spin_lock (lock=<optimized out>)
>     at ./include/linux/spinlock_api_smp.h:145
> #5  _raw_spin_lock (lock=<optimized out>) at kernel/locking/spinlock.c:151
> #6  0xffffffff810a4244 in vprintk_emit (facility=-2113434004, level=1,
>     dict=<optimized out>, dictlen=<optimized out>,
>     fmt=0x101 <irq_stack_union+257> <error: Cannot access memory at address 0x101>,
>     args=0xffff880011804eb0) at kernel/printk/printk.c:1835
> #7  0xffffffff810a476a in vprintk_default (fmt=<optimized out>,
>     args=<optimized out>) at kernel/printk/printk.c:1953
> #8  0xffffffff81128152 in vprintk_func (args=<optimized out>, fmt=<optimized out>)
>     at kernel/printk/internal.h:36
> #9  printk (fmt=<optimized out>) at kernel/printk/printk.c:1986
> #10 0xffffffff8101d590 in handle_stack_overflow (
>     message=0xffffffff81ba3560 "kernel stack overflow (double-fault)",
>     regs=0xffff880011804f58, fault_address=<optimized out>)
>     at arch/x86/kernel/traps.c:300
> #11 0xffffffff8101d67f in do_double_fault (regs=0xffff880011804f58,
>     error_code=0) at arch/x86/kernel/traps.c:393
> #12 0xffffffff81917c32 in double_fault () at arch/x86/entry/entry_64.S:854
> #13 0xffffc90000178038 in ?? ()
> #14 0x0000000000ffff0a in ?? ()
> #15 0x0000000000000000 in ?? ()
>
> handle_stack_overflow does
>
> 	printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n",
> 	       (void *)fault_address, current->stack,
> 	       (char *)current->stack + THREAD_SIZE - 1);
> 	die(message, regs, 0);
>
> so there is a printk before the die and bust_spinlocks there. Just doing a
> bust_spinlock before the printk doesn't help though and if the printk is
> removed the kernel still hangs in the printk in __die
>
> gdb shows logbuf_cpu as unlocked
>
> (gdb) print /x logbuf_cpu
> $1 = 0xffffffff
>
> and walking back up the stack it looks like this finally ran out of stack
> space in console_unlock from the end of vprintk_emit. console_unlock takes
> logbuf_lock but doesn't update logbuf_cpu to possibly check for recursion in
> a panic case, probably because nobody every considered it would be possible
> to die there before.

Yeah, logbuf_lock is taken in many places but logbuf_cpu is set only
in vprintk_emit(). This means that the other places, including
console_unlock(), are not protected against this type of recursion.

There is actually a whole bunch of possible printk-related deadlocks.
There are several approaches to handling some of them, for example:

  + printk_save(), see
    https://lkml.kernel.org/r/20161018154045.7364-1-sergey.senozhat...@gmail.com

  + async printk, see
    https://lkml.kernel.org/r/1459789048-1337-1-git-send-email-sergey.senozhat...@gmail.com

  + early console, see
    https://lkml.kernel.org/r/20161018170830.405990...@infradead.org

The more we try to fix them, the more problems we see. Sergey probably
has the best overview of them at the moment. We are going to discuss
possible ways forward at Plumbers next week.

Best Regards,
Petr
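
PS: To make the gap concrete, here is a rough user-space sketch of why the recursion check cannot see a lock taken in console_unlock(). It is not the kernel code: struct toy_lock, lock_tracked(), lock_untracked(), toy_unlock(), try_reenter() and NO_CPU are all made-up names for illustration. The only things taken from the thread are that vprintk_emit() records the owner in logbuf_cpu, that console_unlock() takes logbuf_lock without doing so, and that an unowned logbuf_cpu reads as 0xffffffff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "no owner" value (0xffffffff in the gdb dump). */
#define NO_CPU (-1)

/* Toy model: 'locked' plays the role of logbuf_lock, 'owner' of logbuf_cpu. */
struct toy_lock {
	bool locked;
	int owner;
};

/* Result of trying to take the lock again from a fault handler. */
enum reentry {
	REENTRY_OK,                 /* lock is free, just take it */
	REENTRY_RECURSION_DETECTED, /* same CPU holds it and the owner was recorded */
	REENTRY_SPIN                /* lock is held, owner unknown: spins, forever
	                               if the holder is this CPU's interrupted context */
};

/* Takes the lock and records the owner, like the vprintk_emit() path. */
static void lock_tracked(struct toy_lock *l, int cpu)
{
	assert(!l->locked);
	l->locked = true;
	l->owner = cpu;
}

/* Takes the lock without recording the owner, like the console_unlock() path. */
static void lock_untracked(struct toy_lock *l)
{
	assert(!l->locked);
	l->locked = true;
}

static void toy_unlock(struct toy_lock *l)
{
	assert(l->locked);
	l->owner = NO_CPU;
	l->locked = false;
}

/* The recursion check can only fire if the owner was recorded. */
static enum reentry try_reenter(const struct toy_lock *l, int cpu)
{
	if (!l->locked)
		return REENTRY_OK;
	if (l->owner == cpu)
		return REENTRY_RECURSION_DETECTED;
	return REENTRY_SPIN;
}
```

With the lock taken via lock_tracked(&l, 0), try_reenter(&l, 0) reports the recursion. With lock_untracked(&l), the owner still reads NO_CPU and the same call falls through to REENTRY_SPIN, which matches the hang in the backtrace above.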