On Tue, Aug 25, 2015 at 7:40 PM, Peter Weilbacher <newss...@weilbacher.org> wrote: > Hi Alexander, > > On Sun, 23 Aug 2015, Alexander Kapshuk wrote: > >> On Sun, Aug 23, 2015 at 4:08 PM, Peter Weilbacher <newss...@weilbacher.org> >> wrote: >> > >> > after successfully using kernel 4.0.5 (vanilla-sources) for a while, I >> > upgraded to 4.1.5 last week and 4.1.6 today. I cannot boot either of >> > them. On the screen I see >> > >> > Decompressing Linux... Parsing ELF... done. >> > Booting the kernel. >> > >> > as the last thing, then it just sits there. >> >> I am running vanilla-sources 4.1.6, and so far I have not had any >> trouble booting it. >> >> Are you able to boot some of your previous kernels? If so, what does >> your '/boot/grub/grub.cfg' look like? >> What is the output of 'cat /etc/fstab' and 'ls -1 /boot'? > > I can still boot 4.0.5 fine, with the same setup. I use lilo, and I > checked that I changed the two/four digits correctly in /etc/lilo.conf. > > By chance I left the boot sit there for more than the typical minute, > and got multiple messages like > > INFO: rcu_sched self-detected stall on CPU { 3} (t=60000 jiffies g=-256 > c=-257 q=193) > rcu_sched kthread starved for 50027 jiffies! > > right after the above "Booting the kernel." line. > > Do I need to activate a different kind of clocking or a CPU feature in > 4.1.x? > > Peter. >
I've never experienced this particular kernel trouble myself, so I'm not sure if my input would be of much help. Here's what the kernel documentation has to say about this kind of issue: /usr/src/linux/Documentation/RCU/stallwarn.txt:29,33 CONFIG_RCU_CPU_STALL_INFO This kernel configuration parameter causes the stall warning to print out additional per-CPU diagnostic information, including information on scheduling-clock ticks and RCU's idle-CPU tracking. /usr/src/linux/Documentation/RCU/stallwarn.txt:104,109 If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set, more information is printed with the stall-warning message, for example: INFO: rcu_preempt detected stall on CPU 0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543 (t=65000 jiffies) /usr/src/linux/Documentation/RCU/stallwarn.txt:240,250 To diagnose the cause of the stall, inspect the stack traces. The offending function will usually be near the top of the stack. If you have a series of stall warnings from a single extended stall, comparing the stack traces can often help determine where the stall is occurring, which will usually be in the function nearest the top of that portion of the stack which remains the same from trace to trace. If you can reliably trigger the stall, ftrace can be quite helpful. RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE and with RCU's event tracing. For information on RCU's event tracing, see include/trace/events/rcu.h. Have a look for possibly stack traces in these log files: /var/log/{messages,dmesg}. Hopefully, someone else with more kernel debugging experience will have something more substantial to say about this.