On Tue, Aug 25, 2015 at 7:40 PM, Peter Weilbacher
<newss...@weilbacher.org> wrote:
> Hi Alexander,
>
> On Sun, 23 Aug 2015, Alexander Kapshuk wrote:
>
>> On Sun, Aug 23, 2015 at 4:08 PM, Peter Weilbacher <newss...@weilbacher.org> 
>> wrote:
>> >
>> > after successfully using kernel 4.0.5 (vanilla-sources) for a while, I
>> > upgraded to 4.1.5 last week and 4.1.6 today. I cannot boot either of
>> > them. On the screen I see
>> >
>> >    Decompressing Linux... Parsing ELF... done.
>> >    Booting the kernel.
>> >
>> > as the last thing, then it just sits there.
>>
>> I am running vanilla-sources 4.1.6, and so far I have not had any
>> trouble booting it.
>>
>> Are you able to boot some of your previous kernels? If so, what does
>> your '/boot/grub/grub.cfg' look like?
>> What is the output of 'cat /etc/fstab' and 'ls -1 /boot'?
>
> I can still boot 4.0.5 fine, with the same setup. I use lilo, and I
> checked that I changed the two/four digits correctly in /etc/lilo.conf.
>
> By chance I left the boot sit there for more than the typical minute,
> and got multiple messages like
>
>   INFO: rcu_sched self-detected stall on CPU { 3}  (t=60000 jiffies g=-256 
> c=-257 q=193)
>   rcu_sched kthread starved for 50027 jiffies!
>
> right after the above "Booting the kernel." line.
>
> Do I need to activate a different kind of clocking or a CPU feature in
> 4.1.x?
>
>    Peter.
>

I've never experienced this particular kernel trouble myself, so I'm
not sure if my input would be of much help.
Here's what the kernel documentation has to say about this kind of issue:

/usr/src/linux/Documentation/RCU/stallwarn.txt:29,33
CONFIG_RCU_CPU_STALL_INFO

    This kernel configuration parameter causes the stall warning to
    print out additional per-CPU diagnostic information, including
    information on scheduling-clock ticks and RCU's idle-CPU tracking.

/usr/src/linux/Documentation/RCU/stallwarn.txt:104,109
If the CONFIG_RCU_CPU_STALL_INFO kernel configuration parameter is set,
more information is printed with the stall-warning message, for example:

    INFO: rcu_preempt detected stall on CPU
    0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543
       (t=65000 jiffies)

/usr/src/linux/Documentation/RCU/stallwarn.txt:240,250
To diagnose the cause of the stall, inspect the stack traces.
The offending function will usually be near the top of the stack.
If you have a series of stall warnings from a single extended stall,
comparing the stack traces can often help determine where the stall
is occurring, which will usually be in the function nearest the top of
that portion of the stack which remains the same from trace to trace.
If you can reliably trigger the stall, ftrace can be quite helpful.

RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE
and with RCU's event tracing.  For information on RCU's event tracing,
see include/trace/events/rcu.h.

Have a look for possibly stack traces in these log files:
/var/log/{messages,dmesg}.

Hopefully, someone else with more kernel debugging experience will
have something more substantial to say about this.

Reply via email to