On (Fri) 22 Aug 2014 [07:48:19], Paul E. McKenney wrote:
> On Fri, Aug 22, 2014 at 06:26:49PM +0530, Amit Shah wrote:
> > On (Fri) 22 Aug 2014 [18:06:51], Amit Shah wrote:
> > > On (Fri) 22 Aug 2014 [17:54:53], Amit Shah wrote:
> > > > On (Mon) 18 Aug 2014 [21:01:49], Paul E. McKenney wrote:
> > > > 
> > > > > The odds are low over the next few days.  I am adding nastier
> > > > > rcutorture testing, however.  It would still be very good to get
> > > > > debug information from your setup.  One approach would be to
> > > > > convert the trace function calls into printk(), if that would
> > > > > help.
> > > > 
> > > > I added a few printks on the lines of the traces in cases where
> > > > rcu_nocb_poll was checked -- since that reproduces the hang.  Are
> > > > the following traces sufficient, or should I keep adding more
> > > > printks?
> > > > 
> > > > In the case of rcu-trace-nopoll.txt, the messages stop after a
> > > > while (when the guest locks up hard).  That's when I kill the
> > > > qemu process.
> > > 
> > > And this is bt from gdb when the endless
> > > 
> > >   RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot
> > > 
> > > messages are being spewed.
> > > 
> > > I can't time it, but hope it gives some indication along with the
> > > printks.
> > 
> > ... and after the system 'locks up', this is the state it's in:
> > 
> > ^C
> > Program received signal SIGINT, Interrupt.
> > native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > 50      }
> > (gdb) bt
> > #0  native_safe_halt () at ./arch/x86/include/asm/irqflags.h:50
> > #1  0xffffffff8100b9c1 in arch_safe_halt ()
> >     at ./arch/x86/include/asm/paravirt.h:111
> > #2  default_idle () at arch/x86/kernel/process.c:311
> > #3  0xffffffff8100c107 in arch_cpu_idle () at arch/x86/kernel/process.c:302
> > #4  0xffffffff8106a25a in cpuidle_idle_call () at kernel/sched/idle.c:120
> > #5  cpu_idle_loop () at kernel/sched/idle.c:220
> > #6  cpu_startup_entry (state=<optimized out>) at kernel/sched/idle.c:268
> > #7  0xffffffff813e068b in rest_init () at init/main.c:418
> > #8  0xffffffff81a8cf5a in start_kernel () at init/main.c:680
> > #9  0xffffffff81a8c4ba in x86_64_start_reservations
> >     (real_mode_data=<optimized out>) at arch/x86/kernel/head64.c:193
> > #10 0xffffffff81a8c607 in x86_64_start_kernel (real_mode_data=0x13f90
> >     <cpu_lock_stats+29184> <error: Cannot access memory at address 0x13f90>)
> >     at arch/x86/kernel/head64.c:182
> > #11 0x0000000000000000 in ?? ()
> > 
> > Wondering why it's doing this.  Am stepping through
> > cpu_startup_entry() to see if I get any clues.
> 
> This looks to me like normal behavior in the x86 ACPI idle loop.
> My guess is that the lockup is caused by indefinite blocking, in
> which case we would expect all the CPUs to be in the idle loop.
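(For reference, the RCUDEBUG lines quoted above come from debug
printk()s of roughly this shape, placed next to the existing
trace_rcu_nocb_wake() calls where rcu_nocb_poll is consulted.  This is
a sketch reconstructed from the log format -- function name, line
number, rcu_state name, CPU, wake reason -- not the exact patch, and
its placement inside __call_rcu_nocb_enqueue() is inferred from the
messages:)

	/*
	 * Sketch only: a printk mirroring the trace_rcu_nocb_wake()
	 * tracepoint arguments, reconstructed from the
	 * "RCUDEBUG __call_rcu_nocb_enqueue 2146 rcu_preempt 0 WakeNot"
	 * lines above.
	 */
	if (!rcu_nocb_poll)
		printk(KERN_ERR "RCUDEBUG %s %d %s %d %s\n",
		       __func__, __LINE__, rdp->rsp->name, rdp->cpu,
		       "WakeNot");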
Hm, found it: the stall happens in do_initcalls().  pm_sysrq_init() is
the initcall that causes the hang.  When I #if 0 out the

  register_sysrq_key('o', &sysrq_poweroff_op);

line in pm_sysrq_init(), the boot proceeds normally.

Now to figure out what this is, and what relation it has to rcu and
that patch in particular...

		Amit
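(For completeness: the hack described above amounts to roughly the
following change in kernel/power/poweroff.c -- a sketch reconstructed
from the description, not copied from an actual diff:)

	/*
	 * kernel/power/poweroff.c -- sketch of the debugging hack
	 * described above: compile out the sysrq 'o' (poweroff) key
	 * registration so that pm_sysrq_init() becomes a no-op
	 * initcall.
	 */
	static int pm_sysrq_init(void)
	{
	#if 0
		register_sysrq_key('o', &sysrq_poweroff_op);
	#endif
		return 0;
	}

	subsys_initcall(pm_sysrq_init);

(Nothing else in the initcall changes; with that one line compiled
out, the sysrq-o poweroff handler is simply never registered.)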