Re: RCU stall

Paul E. McKenney Tue, 22 Mar 2016 19:00:12 -0700

On Tue, Mar 22, 2016 at 04:53:26PM -0700, Bart Van Assche wrote:
> On 03/22/2016 01:45 PM, Paul E. McKenney wrote:
> >You are getting a soft lockup as well as an RCU CPU stall warning, so
> >it looks like something is taking a very long time in blk_done_softirq().
> >
> >You have multiple occurrences at different times, so it looks to be
> >a long time as opposed to an infinite time.  Are you perhaps doing
> >something that would make a huge amount of work for blk_done_softirq()?
> >
> >See Documentation/RCU/stallwarn.txt in the kernel source tree for more
> >info on how to debug this sort of thing.
> 
> Hello Paul,
> 
> None of the drivers involved in the test I ran contain RCU code that
> has been changed recently. The block and SCSI subsystems process
> I/O completions in softirq context but until last week I hadn't seen
> any RCU lockup complaints when I ran an SRP test against a kernel
> with lockdep and several other kernel debugging options enabled.
> This is why I sent an e-mail to you. I have read
> Documentation/RCU/stallwarn.txt after I received your reply but this
> didn't provide me any clue about where to look for the root cause.
> Any further help would be appreciated.


My suggestion would be to check the block/SCSI softirq handler for
event traces.  If there are some, enable them and see what the loop
is doing.  Documentation/trace/ftrace.txt describes how to enable
existing event tracing.
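
As a rough sketch of the ftrace suggestion above, the following Python
helper enables every event in one event group by writing to the tracefs
control files. The mount point and the choice of the "block" group are
assumptions, and the function names are hypothetical; look in the events/
directory on your own kernel to see which groups actually exist.

```python
import os

# Assumed mount point; some systems mount tracefs at /sys/kernel/tracing.
TRACEFS = "/sys/kernel/debug/tracing"

def write_control(root, relpath, value):
    """Write a value to one tracefs control file, e.g. events/block/enable."""
    path = os.path.join(root, relpath)
    with open(path, "w") as f:
        f.write(value)
    return path

def enable_events(subsystem, root=TRACEFS):
    """Enable all events under events/<subsystem>/ and switch tracing on."""
    relpath = os.path.join("events", subsystem, "enable")
    enable = os.path.join(root, relpath)
    if not os.path.exists(enable):
        # Hypothetical group name, or tracefs is mounted somewhere else.
        raise FileNotFoundError("no event group at " + enable)
    write_control(root, relpath, "1")
    write_control(root, "tracing_on", "1")
    return enable
```

Run as root, a call such as enable_events("block") followed by
reproducing the stall should leave the recorded events in the "trace"
file in the same directory.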

If there is no event tracing, consider adding some in your local
view.  Failing that, there is always printk().  ;-)

Or perhaps you have some sort of debug setup.

Either way, the next step is to work out why that CPU is spending
so much time in that loop.
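
One possible way to put numbers on "so much time" is to enable the
softirq_entry and softirq_exit events in the irq event group and pair
them up per CPU. The sketch below does that for lines read from the trace
file. The line layout assumed by the regular expression differs between
kernel versions, and the function name is hypothetical, so treat this as
a starting point and not a finished tool.

```python
import re

# Assumed line shape (adjust to match your own trace output):
#   <idle>-0     [001] ..s.  1234.567890: softirq_entry: vec=4 [action=BLOCK]
LINE = re.compile(
    r"\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>\d+\.\d+):\s+"
    r"softirq_(?P<kind>entry|exit):\s+vec=\d+\s+\[action=(?P<action>\w+)\]"
)

def softirq_durations(lines, action="BLOCK"):
    """Return (cpu, seconds) for each entry/exit pair of the given softirq."""
    start = {}      # cpu -> timestamp of the most recent unmatched entry
    durations = []
    for line in lines:
        m = LINE.search(line)
        if not m or m.group("action") != action:
            continue
        cpu = int(m.group("cpu"))
        ts = float(m.group("ts"))
        if m.group("kind") == "entry":
            start[cpu] = ts
        elif cpu in start:
            # An exit with no recorded entry is ignored.
            durations.append((cpu, ts - start.pop(cpu)))
    return durations
```

Sorting the result by duration shows which invocations ran longest and
on which CPU, which can then be lined up against the timestamps in the
stall warnings.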

                                                        Thanx, Paul
